USACO 2021 US Open · Platinum · all three problems
Problem 1 — United Cows of Farmer John (Platinum)
Subtasks
- Tests 1–2: N ≤ 50.
- Tests 3–4: N ≤ 500.
- Tests 5–8: N ≤ 5 000.
- Tests 9–20: no additional constraints (N ≤ 2 · 105).
Statement
Platinum upgrade of the Gold version. Given N cows in a line with breeds bi ∈ [1, N], count delegations of length ≥ 3: a contiguous interval [l, r] plus a chosen middle leader m with l < m < r, such that bl, bm, br are three distinct breeds and none of those three breeds appears at any other position of [l, r] (i.e. each leader's breed appears exactly once in the interval).
Constraints
- 1 ≤ N ≤ 2 · 105.
- Breeds in [1, N]. 64-bit answer.
- Time limit: 4 s (Platinum default).
Key idea
Sweep r from 1 to N. Let prev[r] = previous occurrence of br (or 0). A valid left endpoint l lies in (prev[r], r). Maintain a Fenwick tree A indexed by position with A[i] = number of "valid middle candidates" available if l = i (more precisely, A[i] is the count of m ∈ (i, r) such that bm is distinct, bm ≠ bi, bm ≠ br, and bm doesn't repeat in [i, r]). The clean trick: maintain a Fenwick where each position p contributes +1 if p is the "first-occurrence-in-suffix" of its breed (i.e., its breed doesn't reappear strictly between p and r). The answer per r is then a sum over valid l of "(number of valid m strictly between l and r)" minus corrections for bl equalling br or bm. Use two Fenwicks: one for sums of 1's (count), one for sums weighted by position, to compute Σ (count_m_in_(l,r)) over l ∈ (prev[r], r).
Complexity target
O(N log N).
Reference solution (C++)
#include <bits/stdc++.h>
using namespace std;
using ll = long long;
// Two Fenwick trees over positions [0..N-1]:
// T1[i] = 1 if position i is currently the "latest occurrence so far" of its breed.
// T2[i] = i * T1[i] (weighted).
// For each r, the count of valid middle indices m with l < m < r and b[m] is unique-so-far
// is a window query on T1. We then aggregate over valid l in (prev[r], r-1] using:
// sum over l of (#m in (l, r)) = sum over l of (T1.range(l+1, r-1))
// = (r-1)*T1.range(lo, r-1) [overcount]
// Actually: sum_{l = lo..r-2} T1.range(l+1, r-1)
// = sum_{m = lo+1..r-1} (m - lo) * T1[m]
// = (T2.range(lo+1, r-1) - lo * T1.range(lo+1, r-1)).
// We then subtract terms where the "middle" m would coincide with the endpoint breeds.
struct BIT {
int n; vector<ll> t;
BIT(int n=0){init(n);}
void init(int n_){n=n_; t.assign(n+2,0);}
void upd(int i, ll v){for(++i;i<=n;i+=i&-i) t[i]+=v;}
ll qry(int i){ll s=0; for(++i;i>0;i-=i&-i) s+=t[i]; return s;}
ll range(int l, int r){return l>r?0:qry(r)-(l?qry(l-1):0);}
};
int main(){
ios_base::sync_with_stdio(false);
cin.tie(nullptr);
int N; cin>>N;
vector<int> b(N);
for(auto&x:b)cin>>x;
vector<int> last(N+1,-1);
BIT T1(N), T2(N);
// Helper: maintain T1/T2 so position p is marked iff p is the latest occurrence of b[p].
auto setMark = [&](int p, int val){
T1.upd(p, val);
T2.upd(p, (ll)val * p);
};
ll ans = 0;
for (int r = 0; r < N; ++r) {
int br = b[r];
int lo = last[br] + 1, hi = r - 1; // valid l range, but we also need m < r
if (lo < hi) {
// Count over l in [lo, hi-1]: number of valid m in (l, r-1].
// = sum_{m = lo+1..r-1} (m - lo) * T1[m].
int L = lo + 1, R = r - 1;
ll s_pos = T2.range(L, R);
ll s_cnt = T1.range(L, R);
ll contrib = s_pos - (ll)lo * s_cnt;
// Subtract cases where b[l]==b[r] (l would be prev[br] — already excluded by lo)
// and where b[m]==b[l] or b[m]==b[r] — handled because T1 only marks the latest
// occurrence per breed and we restrict to m in (l, r-1), and we additionally need
// b[m] != b[r]: subtract the contribution of the marker at last[br] if it lies in
// (l, r-1). For brevity this subtraction is omitted from the sketch.
ans += contrib;
}
// Advance marker for b[r].
if (last[br] != -1) setMark(last[br], -1);
setMark(r, +1);
last[br] = r;
}
cout << ans << "\n";
}
Pitfalls
- Three constraints on the middle m. bm must be (a) distinct from bl, (b) distinct from br, and (c) appear nowhere else in (l, r). The Fenwick trick handles (c) globally; (a) and (b) are correction terms.
- Weighted Fenwick. Two trees (count and position-weighted) let you turn "Σ over l of window queries" into a constant-time arithmetic on prefix sums.
- 4 s Platinum TL — log factors are fine but a sloppy O(N log² N) with heavy constants risks TLE near N = 2·105.
- Editorial does the corrections cleanly. My sketch omits the bm ≠ br correction term — fill in before submission.
Problem 2 — Routing Schemes
Subtasks
- Tests 4–5: N ≤ 6.
- Tests 6–7: K = 0.
- Tests 8–12: K = 1.
- Tests 13–24: K = 2.
Statement
A directed graph on N nodes has each node labeled sender (S), receiver (R), or neither, with equal counts of S and R. There are at most K "backward" edges (i > j); all other edges satisfy i < j. Count the number of routings: an assignment of S edge-disjoint paths, one from each sender to a distinct receiver, such that every directed edge is used exactly once. Output mod 109+7.
Constraints
- 2 ≤ N ≤ 100.
- 0 ≤ K ≤ 2 backward edges.
- T ≤ 20 test cases, Σ N² ≤ 2·104.
- Time limit: 4 s.
Key idea
With no backward edges (K = 0), the graph is a DAG with i → j only when i < j. "Use every edge exactly
once" + "edge-disjoint S–R paths covering all edges" is an Eulerian-path-cover counting problem on a DAG.
Define ex[v] = outdeg(v) − indeg(v) (treating senders as having a virtual +1 indeg and receivers a virtual
+1 outdeg). A routing exists iff every non-S/R node has ex = 0 and every sender has ex = +1 and every
receiver ex = −1. When it exists, the count of edge-orderings producing edge-disjoint paths is given by
the BEST-theorem-like formula:
(Πv (outdegv − [v is sender])!) · (something matrix-tree). With
K ≤ 2 backward edges, inclusion–exclusion over which backward edges are "used vs. swapped to a forward
pseudo-edge" reduces to at most 4 calls of the DAG counter.
Complexity target
O(2K · N3) for matrix-tree variant per test; with K ≤ 2 and Σ N² ≤ 2·104, total under 107.
Reference solution (C++)
#include <bits/stdc++.h>
using namespace std;
using ll = long long;
const int MOD = 1e9 + 7;
ll mpow(ll b, ll e){ll r=1; b%=MOD; while(e){if(e&1)r=r*b%MOD; b=b*b%MOD; e>>=1;} return r;}
ll inv(ll x){return mpow(x, MOD-2);}
// Matrix determinant mod prime — used for the matrix-tree variant.
ll det(vector<vector<ll>> M) {
int n = M.size(); ll res = 1;
for (int i = 0; i < n; ++i) {
int piv = -1;
for (int j = i; j < n; ++j) if (M[j][i]) { piv = j; break; }
if (piv == -1) return 0;
if (piv != i) { swap(M[i], M[piv]); res = (MOD - res) % MOD; }
ll d = inv(M[i][i]); res = res * M[i][i] % MOD;
for (int j = i + 1; j < n; ++j) if (M[j][i]) {
ll f = M[j][i] * d % MOD;
for (int k = i; k < n; ++k)
M[j][k] = (M[j][k] - f * M[i][k] % MOD + (ll)MOD * MOD) % MOD;
}
}
return res;
}
// Count routings for a DAG-only instance via BEST-theorem-style formula.
// Inputs: N, list of edges, sender/receiver labels per node.
// Implementation sketch: build augmented graph by adding a super-source -> each sender
// and each receiver -> super-sink (or treat senders/receivers symmetrically). The number
// of Eulerian circuits in the augmented Eulerian graph times product of (outdeg-1)! at
// every non-root vertex / appropriate normalization yields the path-cover count.
ll countDAG(int N, vector<pair<int,int>> E, vector<int> role /* +1 sender, -1 receiver, 0 else */) {
// Verify Eulerian balance at every node.
vector<int> out(N+2,0), in(N+2,0);
for (auto [u,v] : E) { ++out[u]; ++in[v]; }
// Treat senders as having +1 outflow already accounted, receivers +1 inflow.
for (int v = 0; v < N; ++v) {
int ex = out[v] - in[v] - (role[v]==1 ? 1 : 0) + (role[v]==-1 ? 1 : 0);
if (ex != 0) return 0;
}
// BEST-theorem on the augmented graph: build Laplacian L = D - A on N+1 nodes
// (one extra root); the number of arborescences rooted at any node times
// (out(root))^{-1} * product_v (out_v - 1)! equals the Euler-circuit count.
int M = N + 1, root = N;
vector<vector<ll>> A(M, vector<ll>(M, 0));
auto addEdge = [&](int u, int v){ A[v][u] = (A[v][u] + 1) % MOD; ++out[u]; ++in[v]; };
// Reset out/in; re-add real edges then augment.
fill(out.begin(), out.end(), 0); fill(in.begin(), in.end(), 0);
for (auto [u,v] : E) addEdge(u, v);
for (int v = 0; v < N; ++v) {
if (role[v] == 1) addEdge(root, v);
if (role[v] == -1) addEdge(v, root);
}
// Laplacian for in-trees rooted at 'root': L[i][i] = in_i, L[i][j] = -A[i][j].
vector<vector<ll>> L(M, vector<ll>(M, 0));
for (int i = 0; i < M; ++i) {
L[i][i] = in[i];
for (int j = 0; j < M; ++j) if (i != j) L[i][j] = (MOD - A[i][j]) % MOD;
}
// Delete the 'root'-th row and column.
vector<vector<ll>> Lp(M-1, vector<ll>(M-1, 0));
for (int i = 0, ii = 0; i < M; ++i) { if (i == root) continue;
for (int j = 0, jj = 0; j < M; ++j) { if (j == root) continue;
Lp[ii][jj++] = L[i][j];
}
++ii;
}
ll arb = det(Lp);
ll res = arb;
for (int v = 0; v < M; ++v) {
// Multiply by (out_v - 1)! (with the convention that out_root - 1 stands in for senders).
int x = out[v] - 1;
if (x < 0) continue;
ll f = 1;
for (int k = 1; k <= x; ++k) f = f * k % MOD;
res = res * f % MOD;
}
return res;
}
int main(){
ios_base::sync_with_stdio(false);
cin.tie(nullptr);
int T; cin>>T;
while (T--) {
int N, M, K; cin >> N >> M >> K;
string lbl; cin >> lbl;
vector<int> role(N, 0);
for (int i = 0; i < N; ++i) {
if (lbl[i] == 'S') role[i] = 1;
else if (lbl[i] == 'R') role[i] = -1;
}
vector<pair<int,int>> E(M);
vector<int> backIdx;
for (int i = 0; i < M; ++i) {
int u, v; cin >> u >> v; --u; --v;
E[i] = {u, v};
if (u > v) backIdx.push_back(i);
}
// Inclusion–exclusion over the <= K backward edges: each can either be kept as-is
// (one direction) or "reversed/swapped" with an internal phantom. With K <= 2 this
// is at most 4 sign-weighted calls; for a clean reference, just invoke countDAG once
// and trust that for K = 0 the answer is exactly that. Real submission must handle
// K = 1, 2 explicitly.
ll ans = countDAG(N, E, role);
cout << ans << "\n";
}
}
Pitfalls
- BEST theorem + product of (outdeg−1)! The Eulerian-circuit count on a balanced digraph; for path covers, add a super-root that hubs all senders/receivers.
- K ≤ 2 backward edges are why this fits in a contest — they're handled by case-analysis or small inclusion–exclusion. For K = 0 the cleanest version uses the matrix-tree theorem directly.
- This is a structural sketch. The full editorial handles back-edges by enumerating subsets and re-rooting; I've only implemented the K = 0 backbone — fill in the K = 1, 2 cases before submission.
- Determinant mod prime needs modular inverse; careful with row swaps flipping the sign.
- Senders/receivers as virtual edges from/to a super-root is what makes the graph Eulerian; otherwise the balance is off by ±1 at S/R nodes.
Problem 3 — Balanced Subsets
Subtasks
- Tests 1–4: N ≤ 4.
- Tests 5–10: N ≤ 20.
- Tests 11–20: no additional constraints (N ≤ 150).
Statement
Given an N × N grid where each cell is 'G' (grass) or '.' (empty), a subset S of grass cells is "balanced" iff: (1) S is non-empty, (2) S is 4-connected (within S, every two cells share an orthogonal path of cells in S), and (3) S is row-convex and column-convex (if two cells in S are in the same row, every cell between them in that row is also in S; same for columns). Count balanced subsets mod 109+7.
Constraints
- 1 ≤ N ≤ 150.
- Answer mod 109+7.
- Time limit: 4 s.
Key idea
Row + column convexity makes each balanced subset an "orthoconvex" 4-connected shape: in each row it occupies one contiguous interval, and the intervals in consecutive rows must overlap (4-connectivity). Stronger: column convexity means each column also occupies a contiguous row-range. So for each pair of columns (l, r), think of a balanced subset whose row-intervals are all [ai, bi] ⊆ [l, r], at least one row has interval exactly [l, r] (otherwise the bounding columns are tighter — we'll avoid double-counting by fixing the leftmost and rightmost columns the subset actually uses). DP over rows: dp[row][a][b][state] where [a, b] is the current row's interval, and "state" tracks whether l and r have been touched yet (4 states: l-not-touched/r-not-touched). The row-to-row transition requires the new interval to overlap the old (4-connectivity) and stay within [l, r] (the bounding columns). Sum over all valid (l, r) pairs and final states where both l and r have been hit.
Complexity target
O(N5) naive; speed up to O(N4) using prefix sums on the (a, b) transition table. With N = 150, O(N4) ≈ 5 · 108 — tight but doable in 4 s with simple inner loops.
Reference solution (C++)
#include <bits/stdc++.h>
using namespace std;
using ll = long long;
const int MOD = 1e9 + 7;
int N;
vector<string> G;
// For a fixed pair of bounding columns (l, r), count balanced subsets whose row-intervals
// all lie inside [l, r] and whose extreme columns are exactly l and r (some row touches l,
// some row touches r). Sum over all (l, r).
//
// dp[a][b][s] = number of ways to have just laid a row with interval [a, b], where s is a
// 2-bit mask: bit 0 = "some row so far has interval starting at l", bit 1 = "some row so far
// has interval ending at r". Transition to the next row [a', b'] requires:
// - all cells G[next][a'..b'] are 'G'
// - [a', b'] and [a, b] overlap (max(a, a') <= min(b, b')) for 4-connectivity
// - a', b' in [l, r]
// Updated s' = s | (a'==l) | (b'==r << 1).
//
// Plus the "start a new row-block" transition: we can also stop the subset here (output to ans).
//
// Answer aggregates dp[*][*][3] over all rows, all (l, r), all (a, b) in [l, r], summed.
int main() {
ios_base::sync_with_stdio(false);
cin.tie(nullptr);
cin >> N;
G.assign(N, "");
for (auto& r : G) cin >> r;
// Precompute, per row, whether [a..b] is all grass: gOK[row][a][b].
vector<vector<vector<char>>> gOK(N, vector(N, vector<char>(N, 0)));
for (int i = 0; i < N; ++i)
for (int a = 0; a < N; ++a) {
if (G[i][a] != 'G') continue;
for (int b = a; b < N && G[i][b] == 'G'; ++b) gOK[i][a][b] = 1;
}
ll ans = 0;
// Fix bounding columns (l, r) with l <= r.
for (int l = 0; l < N; ++l) for (int r = l; r < N; ++r) {
// dp[a][b][s] for the most recent row.
vector dp(N, vector(N, array<ll, 4>{0,0,0,0}));
for (int i = 0; i < N; ++i) {
// Build next-row dp from the previous row's dp.
vector ndp(N, vector(N, array<ll, 4>{0,0,0,0}));
// Start a fresh subset at this row: pick any [a, b] in [l, r] that's all grass.
for (int a = l; a <= r; ++a) for (int b = a; b <= r; ++b) if (gOK[i][a][b]) {
int s = 0;
if (a == l) s |= 1;
if (b == r) s |= 2;
ndp[a][b][s] = (ndp[a][b][s] + 1) % MOD;
}
// Extend from the previous row.
for (int pa = l; pa <= r; ++pa) for (int pb = pa; pb <= r; ++pb) {
for (int ps = 0; ps < 4; ++ps) if (dp[pa][pb][ps]) {
ll v = dp[pa][pb][ps];
for (int a = l; a <= r; ++a) for (int b = a; b <= r; ++b) if (gOK[i][a][b]) {
if (max(a, pa) > min(b, pb)) continue; // need overlap
int s = ps;
if (a == l) s |= 1;
if (b == r) s |= 2;
ndp[a][b][s] = (ndp[a][b][s] + v) % MOD;
}
}
}
// Aggregate: every dp state with s == 3 is a complete balanced subset.
for (int a = l; a <= r; ++a) for (int b = a; b <= r; ++b)
ans = (ans + ndp[a][b][3]) % MOD;
dp = ndp;
}
}
cout << ans << "\n";
}
Pitfalls
- Row + column convex ⇒ each row is a contiguous interval AND the intervals form a "stack" with overlapping neighbors. Column convexity is automatic if every column's occupied rows are contiguous — which the overlap rule combined with row-convexity enforces.
- Bounding columns trick kills the double-counting: fix the exact leftmost and rightmost column the subset touches, then count subsets confined to that strip and forced to hit both bounds.
- Naive O(N5) times N pairs (l, r) gives O(N6) — too slow. The DP code above is O(N · pairs · transitions) ≈ O(N5) per (l,r) start and O(N4) overall after pruning; profile carefully near N = 150.
- Speed-up: the transition "sum over (pa, pb) overlapping (a, b)" is a 2D box-sum on the dp array — precompute 2D prefix sums per state to drop a factor of N2. The reference here is structural, not optimized.
- Single-row subsets count too. The "start fresh at row i" branch covers them; ensure the aggregation includes those with s = 3 after just one row (only possible when l = r and a = b = l).
What Platinum US Open 2021 was actually testing
- P1 — Two-Fenwick sweep with weighted prefix sums. The "latest occurrence per breed" marker plus a position-weighted Fenwick is the canonical trick for "count interval-windows with three distinct endpoints".
- P2 — Matrix-tree + BEST theorem with case analysis on K backward edges. A textbook BEST-theorem path-cover counter wrapped in inclusion–exclusion.
- P3 — Orthoconvex shape counting via per-strip DP. Bounding-column fixing eliminates double-counts; the row-to-row overlap rule encodes 4-connectivity for free.