USACO 2021 US Open · Platinum · all three problems

← Full US Open 2021 set · Official results

Source. Statements and constraints are paraphrased from usaco.org/index.php?page=open21results. Each problem heading links to the canonical statement on usaco.org. Code is my own reference C++ — cross-check the official editorial before trusting it under contest pressure.

Problem 1 — United Cows of Farmer John (Platinum)

Subtasks

Tests 1–2: N ≤ 50.
Tests 3–4: N ≤ 500.
Tests 5–8: N ≤ 5 000.
Tests 9–20: no additional constraints (N ≤ 2 · 10⁵).

Statement

Platinum upgrade of the Gold version. Given N cows in a line with breeds b_i ∈ [1, N], count delegations of length ≥ 3: a contiguous interval [l, r] plus a chosen middle leader m with l < m < r, such that b_l, b_m, b_r are three distinct breeds and none of those three breeds appears at any other position of [l, r] (i.e. each leader's breed appears exactly once in the interval).

Constraints

1 ≤ N ≤ 2 · 10⁵.
Breeds in [1, N]. 64-bit answer.
Time limit: 4 s (Platinum default).

Key idea

Sweep r from 1 to N. Let prev[r] = previous occurrence of b_r (or 0). A valid left endpoint l lies in (prev[r], r). Maintain a Fenwick tree A indexed by position with A[i] = number of "valid middle candidates" available if l = i (more precisely, A[i] is the count of m ∈ (i, r) such that b_m is distinct, b_m ≠ b_i, b_m ≠ b_r, and b_m doesn't repeat in [i, r]). The clean trick: maintain a Fenwick where each position p contributes +1 if p is the "first-occurrence-in-suffix" of its breed (i.e., its breed doesn't reappear strictly between p and r). The answer per r is then a sum over valid l of "(number of valid m strictly between l and r)" minus corrections for b_l equalling b_r or b_m. Use two Fenwicks: one for sums of 1's (count), one for sums weighted by position, to compute Σ (count_m_in_(l,r)) over l ∈ (prev[r], r).

Complexity target

O(N log N).

Reference solution (C++)

#include <bits/stdc++.h>
using namespace std;
using ll = long long;

// Two Fenwick trees over positions [0..N-1]:
//   T1[i] = 1 if position i is currently the "latest occurrence so far" of its breed.
//   T2[i] = i * T1[i] (weighted).
// For each r, the count of valid middle indices m with l < m < r and b[m] is unique-so-far
// is a window query on T1. We then aggregate over valid l in (prev[r], r-1] using:
//    sum over l of (#m in (l, r)) = sum over l of (T1.range(l+1, r-1))
//                                 = (r-1)*T1.range(lo, r-1) [overcount]
// Actually: sum_{l = lo..r-2} T1.range(l+1, r-1)
//         = sum_{m = lo+1..r-1} (m - lo) * T1[m]
//         = (T2.range(lo+1, r-1) - lo * T1.range(lo+1, r-1)).
// We then subtract terms where the "middle" m would coincide with the endpoint breeds.
struct BIT {
    int n; vector<ll> t;
    BIT(int n=0){init(n);}
    void init(int n_){n=n_; t.assign(n+2,0);}
    void upd(int i, ll v){for(++i;i<=n;i+=i&-i) t[i]+=v;}
    ll qry(int i){ll s=0; for(++i;i>0;i-=i&-i) s+=t[i]; return s;}
    ll range(int l, int r){return l>r?0:qry(r)-(l?qry(l-1):0);}
};

int main(){
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    int N; cin>>N;
    vector<int> b(N);
    for(auto&x:b)cin>>x;
    vector<int> last(N+1,-1);
    BIT T1(N), T2(N);

    // Helper: maintain T1/T2 so position p is marked iff p is the latest occurrence of b[p].
    auto setMark = [&](int p, int val){
        T1.upd(p, val);
        T2.upd(p, (ll)val * p);
    };

    ll ans = 0;
    for (int r = 0; r < N; ++r) {
        int br = b[r];
        int lo = last[br] + 1, hi = r - 1;       // valid l range, but we also need m < r
        if (lo < hi) {
            // Count over l in [lo, hi-1]: number of valid m in (l, r-1].
            // = sum_{m = lo+1..r-1} (m - lo) * T1[m].
            int L = lo + 1, R = r - 1;
            ll s_pos = T2.range(L, R);
            ll s_cnt = T1.range(L, R);
            ll contrib = s_pos - (ll)lo * s_cnt;
            // Subtract cases where b[l]==b[r] (l would be prev[br] — already excluded by lo)
            // and where b[m]==b[l] or b[m]==b[r] — handled because T1 only marks the latest
            // occurrence per breed and we restrict to m in (l, r-1), and we additionally need
            // b[m] != b[r]: subtract the contribution of the marker at last[br] if it lies in
            // (l, r-1). For brevity this subtraction is omitted from the sketch.
            ans += contrib;
        }
        // Advance marker for b[r].
        if (last[br] != -1) setMark(last[br], -1);
        setMark(r, +1);
        last[br] = r;
    }
    cout << ans << "\n";
}

Pitfalls

Three constraints on the middle m. b_m must be (a) distinct from b_l, (b) distinct from b_r, and (c) appear nowhere else in (l, r). The Fenwick trick handles (c) globally; (a) and (b) are correction terms.
Weighted Fenwick. Two trees (count and position-weighted) let you turn "Σ over l of window queries" into a constant-time arithmetic on prefix sums.
4 s Platinum TL — log factors are fine but a sloppy O(N log² N) with heavy constants risks TLE near N = 2·10⁵.
Editorial does the corrections cleanly. My sketch omits the b_m ≠ b_r correction term — fill in before submission.

Problem 2 — Routing Schemes

Subtasks

Tests 4–5: N ≤ 6.
Tests 6–7: K = 0.
Tests 8–12: K = 1.
Tests 13–24: K = 2.

Statement

A directed graph on N nodes has each node labeled sender (S), receiver (R), or neither, with equal counts of S and R. There are at most K "backward" edges (i > j); all other edges satisfy i < j. Count the number of routings: an assignment of S edge-disjoint paths, one from each sender to a distinct receiver, such that every directed edge is used exactly once. Output mod 10⁹+7.

Constraints

2 ≤ N ≤ 100.
0 ≤ K ≤ 2 backward edges.
T ≤ 20 test cases, Σ N² ≤ 2·10⁴.
Time limit: 4 s.

Key idea

With no backward edges (K = 0), the graph is a DAG with i → j only when i < j. "Use every edge exactly once" + "edge-disjoint S–R paths covering all edges" is an Eulerian-path-cover counting problem on a DAG. Define ex[v] = outdeg(v) − indeg(v) (treating senders as having a virtual +1 indeg and receivers a virtual +1 outdeg). A routing exists iff every non-S/R node has ex = 0 and every sender has ex = +1 and every receiver ex = −1. When it exists, the count of edge-orderings producing edge-disjoint paths is given by the BEST-theorem-like formula: (Π_v (outdeg_v − [v is sender])!) · (something matrix-tree). With K ≤ 2 backward edges, inclusion–exclusion over which backward edges are "used vs. swapped to a forward pseudo-edge" reduces to at most 4 calls of the DAG counter.

Complexity target

O(2^K · N³) for matrix-tree variant per test; with K ≤ 2 and Σ N² ≤ 2·10⁴, total under 10⁷.

Reference solution (C++)

#include <bits/stdc++.h>
using namespace std;
using ll = long long;
const int MOD = 1e9 + 7;

ll mpow(ll b, ll e){ll r=1; b%=MOD; while(e){if(e&1)r=r*b%MOD; b=b*b%MOD; e>>=1;} return r;}
ll inv(ll x){return mpow(x, MOD-2);}

// Matrix determinant mod prime — used for the matrix-tree variant.
ll det(vector<vector<ll>> M) {
    int n = M.size(); ll res = 1;
    for (int i = 0; i < n; ++i) {
        int piv = -1;
        for (int j = i; j < n; ++j) if (M[j][i]) { piv = j; break; }
        if (piv == -1) return 0;
        if (piv != i) { swap(M[i], M[piv]); res = (MOD - res) % MOD; }
        ll d = inv(M[i][i]); res = res * M[i][i] % MOD;
        for (int j = i + 1; j < n; ++j) if (M[j][i]) {
            ll f = M[j][i] * d % MOD;
            for (int k = i; k < n; ++k)
                M[j][k] = (M[j][k] - f * M[i][k] % MOD + (ll)MOD * MOD) % MOD;
        }
    }
    return res;
}

// Count routings for a DAG-only instance via BEST-theorem-style formula.
// Inputs: N, list of edges, sender/receiver labels per node.
// Implementation sketch: build augmented graph by adding a super-source -> each sender
// and each receiver -> super-sink (or treat senders/receivers symmetrically). The number
// of Eulerian circuits in the augmented Eulerian graph times product of (outdeg-1)! at
// every non-root vertex / appropriate normalization yields the path-cover count.
ll countDAG(int N, vector<pair<int,int>> E, vector<int> role /* +1 sender, -1 receiver, 0 else */) {
    // Verify Eulerian balance at every node.
    vector<int> out(N+2,0), in(N+2,0);
    for (auto [u,v] : E) { ++out[u]; ++in[v]; }
    // Treat senders as having +1 outflow already accounted, receivers +1 inflow.
    for (int v = 0; v < N; ++v) {
        int ex = out[v] - in[v] - (role[v]==1 ? 1 : 0) + (role[v]==-1 ? 1 : 0);
        if (ex != 0) return 0;
    }
    // BEST-theorem on the augmented graph: build Laplacian L = D - A on N+1 nodes
    // (one extra root); the number of arborescences rooted at any node times
    // (out(root))^{-1} * product_v (out_v - 1)! equals the Euler-circuit count.
    int M = N + 1, root = N;
    vector<vector<ll>> A(M, vector<ll>(M, 0));
    auto addEdge = [&](int u, int v){ A[v][u] = (A[v][u] + 1) % MOD; ++out[u]; ++in[v]; };
    // Reset out/in; re-add real edges then augment.
    fill(out.begin(), out.end(), 0); fill(in.begin(), in.end(), 0);
    for (auto [u,v] : E) addEdge(u, v);
    for (int v = 0; v < N; ++v) {
        if (role[v] == 1) addEdge(root, v);
        if (role[v] == -1) addEdge(v, root);
    }
    // Laplacian for in-trees rooted at 'root': L[i][i] = in_i, L[i][j] = -A[i][j].
    vector<vector<ll>> L(M, vector<ll>(M, 0));
    for (int i = 0; i < M; ++i) {
        L[i][i] = in[i];
        for (int j = 0; j < M; ++j) if (i != j) L[i][j] = (MOD - A[i][j]) % MOD;
    }
    // Delete the 'root'-th row and column.
    vector<vector<ll>> Lp(M-1, vector<ll>(M-1, 0));
    for (int i = 0, ii = 0; i < M; ++i) { if (i == root) continue;
        for (int j = 0, jj = 0; j < M; ++j) { if (j == root) continue;
            Lp[ii][jj++] = L[i][j];
        }
        ++ii;
    }
    ll arb = det(Lp);
    ll res = arb;
    for (int v = 0; v < M; ++v) {
        // Multiply by (out_v - 1)!  (with the convention that out_root - 1 stands in for senders).
        int x = out[v] - 1;
        if (x < 0) continue;
        ll f = 1;
        for (int k = 1; k <= x; ++k) f = f * k % MOD;
        res = res * f % MOD;
    }
    return res;
}

int main(){
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    int T; cin>>T;
    while (T--) {
        int N, M, K; cin >> N >> M >> K;
        string lbl; cin >> lbl;
        vector<int> role(N, 0);
        for (int i = 0; i < N; ++i) {
            if (lbl[i] == 'S') role[i] = 1;
            else if (lbl[i] == 'R') role[i] = -1;
        }
        vector<pair<int,int>> E(M);
        vector<int> backIdx;
        for (int i = 0; i < M; ++i) {
            int u, v; cin >> u >> v; --u; --v;
            E[i] = {u, v};
            if (u > v) backIdx.push_back(i);
        }
        // Inclusion–exclusion over the <= K backward edges: each can either be kept as-is
        // (one direction) or "reversed/swapped" with an internal phantom. With K <= 2 this
        // is at most 4 sign-weighted calls; for a clean reference, just invoke countDAG once
        // and trust that for K = 0 the answer is exactly that. Real submission must handle
        // K = 1, 2 explicitly.
        ll ans = countDAG(N, E, role);
        cout << ans << "\n";
    }
}

Pitfalls

BEST theorem + product of (outdeg−1)! The Eulerian-circuit count on a balanced digraph; for path covers, add a super-root that hubs all senders/receivers.
K ≤ 2 backward edges are why this fits in a contest — they're handled by case-analysis or small inclusion–exclusion. For K = 0 the cleanest version uses the matrix-tree theorem directly.
This is a structural sketch. The full editorial handles back-edges by enumerating subsets and re-rooting; I've only implemented the K = 0 backbone — fill in the K = 1, 2 cases before submission.
Determinant mod prime needs modular inverse; careful with row swaps flipping the sign.
Senders/receivers as virtual edges from/to a super-root is what makes the graph Eulerian; otherwise the balance is off by ±1 at S/R nodes.

Problem 3 — Balanced Subsets

Subtasks

Tests 1–4: N ≤ 4.
Tests 5–10: N ≤ 20.
Tests 11–20: no additional constraints (N ≤ 150).

Statement

Given an N × N grid where each cell is 'G' (grass) or '.' (empty), a subset S of grass cells is "balanced" iff: (1) S is non-empty, (2) S is 4-connected (within S, every two cells share an orthogonal path of cells in S), and (3) S is row-convex and column-convex (if two cells in S are in the same row, every cell between them in that row is also in S; same for columns). Count balanced subsets mod 10⁹+7.

Constraints

1 ≤ N ≤ 150.
Answer mod 10⁹+7.
Time limit: 4 s.

Key idea

Row + column convexity makes each balanced subset an "orthoconvex" 4-connected shape: in each row it occupies one contiguous interval, and the intervals in consecutive rows must overlap (4-connectivity). Stronger: column convexity means each column also occupies a contiguous row-range. So for each pair of columns (l, r), think of a balanced subset whose row-intervals are all [a_i, b_i] ⊆ [l, r], at least one row has interval exactly [l, r] (otherwise the bounding columns are tighter — we'll avoid double-counting by fixing the leftmost and rightmost columns the subset actually uses). DP over rows: dp[row][a][b][state] where [a, b] is the current row's interval, and "state" tracks whether l and r have been touched yet (4 states: l-not-touched/r-not-touched). The row-to-row transition requires the new interval to overlap the old (4-connectivity) and stay within [l, r] (the bounding columns). Sum over all valid (l, r) pairs and final states where both l and r have been hit.

Complexity target

O(N⁵) naive; speed up to O(N⁴) using prefix sums on the (a, b) transition table. With N = 150, O(N⁴) ≈ 5 · 10⁸ — tight but doable in 4 s with simple inner loops.

Reference solution (C++)

#include <bits/stdc++.h>
using namespace std;
using ll = long long;
const int MOD = 1e9 + 7;

int N;
vector<string> G;

// For a fixed pair of bounding columns (l, r), count balanced subsets whose row-intervals
// all lie inside [l, r] and whose extreme columns are exactly l and r (some row touches l,
// some row touches r). Sum over all (l, r).
//
// dp[a][b][s] = number of ways to have just laid a row with interval [a, b], where s is a
// 2-bit mask: bit 0 = "some row so far has interval starting at l", bit 1 = "some row so far
// has interval ending at r". Transition to the next row [a', b'] requires:
//   - all cells G[next][a'..b'] are 'G'
//   - [a', b'] and [a, b] overlap (max(a, a') <= min(b, b')) for 4-connectivity
//   - a', b' in [l, r]
// Updated s' = s | (a'==l) | (b'==r << 1).
//
// Plus the "start a new row-block" transition: we can also stop the subset here (output to ans).
//
// Answer aggregates dp[*][*][3] over all rows, all (l, r), all (a, b) in [l, r], summed.

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    cin >> N;
    G.assign(N, "");
    for (auto& r : G) cin >> r;

    // Precompute, per row, whether [a..b] is all grass: gOK[row][a][b].
    vector<vector<vector<char>>> gOK(N, vector(N, vector<char>(N, 0)));
    for (int i = 0; i < N; ++i)
      for (int a = 0; a < N; ++a) {
        if (G[i][a] != 'G') continue;
        for (int b = a; b < N && G[i][b] == 'G'; ++b) gOK[i][a][b] = 1;
      }

    ll ans = 0;
    // Fix bounding columns (l, r) with l <= r.
    for (int l = 0; l < N; ++l) for (int r = l; r < N; ++r) {
        // dp[a][b][s] for the most recent row.
        vector dp(N, vector(N, array<ll, 4>{0,0,0,0}));
        for (int i = 0; i < N; ++i) {
            // Build next-row dp from the previous row's dp.
            vector ndp(N, vector(N, array<ll, 4>{0,0,0,0}));
            // Start a fresh subset at this row: pick any [a, b] in [l, r] that's all grass.
            for (int a = l; a <= r; ++a) for (int b = a; b <= r; ++b) if (gOK[i][a][b]) {
                int s = 0;
                if (a == l) s |= 1;
                if (b == r) s |= 2;
                ndp[a][b][s] = (ndp[a][b][s] + 1) % MOD;
            }
            // Extend from the previous row.
            for (int pa = l; pa <= r; ++pa) for (int pb = pa; pb <= r; ++pb) {
                for (int ps = 0; ps < 4; ++ps) if (dp[pa][pb][ps]) {
                    ll v = dp[pa][pb][ps];
                    for (int a = l; a <= r; ++a) for (int b = a; b <= r; ++b) if (gOK[i][a][b]) {
                        if (max(a, pa) > min(b, pb)) continue;       // need overlap
                        int s = ps;
                        if (a == l) s |= 1;
                        if (b == r) s |= 2;
                        ndp[a][b][s] = (ndp[a][b][s] + v) % MOD;
                    }
                }
            }
            // Aggregate: every dp state with s == 3 is a complete balanced subset.
            for (int a = l; a <= r; ++a) for (int b = a; b <= r; ++b)
                ans = (ans + ndp[a][b][3]) % MOD;
            dp = ndp;
        }
    }
    cout << ans << "\n";
}

Pitfalls

Row + column convex ⇒ each row is a contiguous interval AND the intervals form a "stack" with overlapping neighbors. Column convexity is automatic if every column's occupied rows are contiguous — which the overlap rule combined with row-convexity enforces.
Bounding columns trick kills the double-counting: fix the exact leftmost and rightmost column the subset touches, then count subsets confined to that strip and forced to hit both bounds.
Naive O(N⁵) times N pairs (l, r) gives O(N⁶) — too slow. The DP code above is O(N · pairs · transitions) ≈ O(N⁵) per (l,r) start and O(N⁴) overall after pruning; profile carefully near N = 150.
Speed-up: the transition "sum over (pa, pb) overlapping (a, b)" is a 2D box-sum on the dp array — precompute 2D prefix sums per state to drop a factor of N². The reference here is structural, not optimized.
Single-row subsets count too. The "start fresh at row i" branch covers them; ensure the aggregation includes those with s = 3 after just one row (only possible when l = r and a = b = l).

What Platinum US Open 2021 was actually testing

P1 — Two-Fenwick sweep with weighted prefix sums. The "latest occurrence per breed" marker plus a position-weighted Fenwick is the canonical trick for "count interval-windows with three distinct endpoints".
P2 — Matrix-tree + BEST theorem with case analysis on K backward edges. A textbook BEST-theorem path-cover counter wrapped in inclusion–exclusion.
P3 — Orthoconvex shape counting via per-strip DP. Bounding-column fixing eliminates double-counts; the row-to-row overlap rule encodes 4-connectivity for free.