2024 US Open · Gold

← Back to 2024 US Open · Official results page

Gold this round is a strong mix: a greedy on 0/1 strings with K-bounded swaps, an interval-intersection counting problem that loves coordinate compression + BIT, and a DP-on-partitions problem where the inner loop must be made monotone for full credit. All three reward strong constraint-reading skills.

Paraphrase warning. Verify everything on usaco.org before coding.

G1 · Cowreography

Official statement (cpid=1425)

Statement (paraphrase)

N cows in a row, each Guernsey (0) or Holstein (1). In one move you may swap two cows that are at most K positions apart. Given an initial binary string and a target binary string, find the minimum number of moves to transform start into target. Strings have equal numbers of 0s and 1s.

Constraints & subtasks

2 ≤ N ≤ 10⁶, 1 ≤ K ≤ N−1
Subtasks: K = 1 · ≤ 8 ones · N ≤ 5000 · full
Time limit: 2 s, memory: 256 MB

Key idea

Match the ones in the start to the ones in the target greedily by index (a classic exchange argument shows this is optimal). For each matched pair at distance d, the cost is ⌈d / K⌉ swaps. Implement with a two-pointer pass that queues "ones waiting in start" and "ones waiting in target"; whenever both queues are non-empty, pop the heads and pay ⌈|i − j| / K⌉. The greedy "match in order" is provably optimal because any non-crossing matching can be uncrossed without increasing cost.

Complexity

O(N). One pass over both strings.

Reference C++

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    int N, K;
    cin >> N >> K;
    string s, t;
    cin >> s >> t;

    long long cost = 0;
    // Maintain two FIFO queues of pending unmatched positions.
    deque<int> pendS, pendT;
    for (int i = 0; i < N; i++) {
        if (s[i] == '1' && t[i] == '1') continue; // already matched in place
        if (s[i] == '1' && t[i] == '0') {
            // a "1" in s without a target here; either pair with earlier T-need or wait
            if (!pendT.empty()) {
                int j = pendT.front(); pendT.pop_front();
                cost += (i - j + K - 1) / K;
            } else pendS.push_back(i);
        } else if (s[i] == '0' && t[i] == '1') {
            if (!pendS.empty()) {
                int j = pendS.front(); pendS.pop_front();
                cost += (i - j + K - 1) / K;
            } else pendT.push_back(i);
        }
        // s=0,t=0: nothing to do
    }
    cout << cost << "\n";
}

Pitfalls

The naive proof "match nearest pair" can give worse results than "match in order" because every swap between matched pair (i, j) must traverse the j − i gap in chunks of K.
N up to 10⁶ — use long long for cost (can reach ~N²/K).
Don't forget to skip positions where both strings have the same bit.

G2 · Grass Segments

Official statement (cpid=1426)

Statement (paraphrase)

N intervals [ℓᵢ, rᵢ] on the real line, each with a minimum-overlap threshold kᵢ. For each i, count the number of j ≠ i such that |[ℓᵢ, rᵢ] ∩ [ℓⱼ, rⱼ]| ≥ kᵢ.

Constraints & subtasks

2 ≤ N ≤ 2·10⁵, 0 < ℓᵢ < rᵢ ≤ 10⁹, 1 ≤ kᵢ
Subtasks: N ≤ 5000 · uniform k · full
Time limit: 2 s, memory: 256 MB

Key idea

For a query interval (ℓ, r, k), the candidate partner intervals are those (ℓ', r') with min(r, r') − max(ℓ, ℓ') ≥ k. Rewrite this as two simultaneous conditions: r' ≥ ℓ + k AND ℓ' ≤ r − k. So we want to count intervals where r' lies in [ℓ + k, ∞) and ℓ' lies in (−∞, r − k]. This is a 2D dominance counting problem: sort by one coordinate, sweep, BIT (Fenwick) over the other.

Complexity

O(N log N). Coordinate-compress ℓ values, sort events by r', sweep.

Reference C++

#include <bits/stdc++.h>
using namespace std;

struct BIT {
    vector<int> t; int n;
    BIT(int n) : t(n + 2, 0), n(n) {}
    void upd(int i, int v=1) { for (++i; i <= n; i += i & -i) t[i] += v; }
    int qry(int i) { int s = 0; for (++i; i > 0; i -= i & -i) s += t[i]; return s; }
    int rng(int l, int r) { return r < l ? 0 : qry(r) - (l ? qry(l - 1) : 0); }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    int N;
    cin >> N;
    vector<long long> L(N), R(N), K(N);
    for (int i = 0; i < N; i++) cin >> L[i] >> R[i] >> K[i];

    // Compress ℓ-axis (we need to query "ℓ' ≤ r - k").
    vector<long long> xs(L);
    sort(xs.begin(), xs.end()); xs.erase(unique(xs.begin(), xs.end()), xs.end());
    auto idx = [&](long long v) {
        return (int)(upper_bound(xs.begin(), xs.end(), v) - xs.begin()) - 1; // last ≤ v
    };

    // Two event types:
    //   A) insert interval j: at "time" r_j, add ℓ_j to the BIT.
    //   B) query interval i:  at "time" ℓ_i + k_i - 1 (just before it), count ℓ' ≤ r_i - k_i in BIT,
    //      then subtract intervals already inserted but with r' < ℓ_i + k_i (those don't satisfy r' ≥ ℓ_i + k_i).
    // Easier: process in DECREASING order of r'. Insert j when we've moved past r_j. Query i
    // when current cutoff ≤ ℓ_i + k_i (i.e., all remaining-to-insert have r' >= cutoff).
    // We'll do: sort intervals by r descending; queries by (ℓ_i + k_i) descending.

    vector<int> orderR(N); iota(orderR.begin(), orderR.end(), 0);
    sort(orderR.begin(), orderR.end(), [&](int a, int b){ return R[a] > R[b]; });

    vector<int> orderQ(N); iota(orderQ.begin(), orderQ.end(), 0);
    sort(orderQ.begin(), orderQ.end(), [&](int a, int b){ return L[a] + K[a] > L[b] + K[b]; });

    BIT bit((int)xs.size());
    vector<int> ans(N, 0);
    int p = 0;
    for (int qi : orderQ) {
        long long need = L[qi] + K[qi];
        while (p < N && R[orderR[p]] >= need) {
            bit.upd(idx(L[orderR[p]]));
            p++;
        }
        long long upper = R[qi] - K[qi];
        int hi = idx(upper);
        int cnt = (hi >= 0) ? bit.qry(hi) : 0;
        // Subtract self if it satisfies its own condition (it always does when k_i < r_i - ℓ_i):
        if (R[qi] >= need && L[qi] <= upper) cnt--;
        ans[qi] = cnt;
    }

    for (int i = 0; i < N; i++) cout << ans[i] << "\n";
}

Pitfalls

Self-count: interval i may "match itself"; subtract 1 if so.
The reordering trick is the whole problem — the two-sided constraint becomes a single offline sweep.
Coordinates up to 10⁹ → must compress, but only ℓ-axis needs compression; r is used only as a sort key.

G3 · Smaller Averages

Official statement (cpid=1427)

Statement (paraphrase)

Two arrays a, b of length N. Count the number of ways to choose an integer k ≥ 1 and partition both arrays into k non-empty contiguous parts such that, for every p, the average of a's p-th part is ≤ the average of b's p-th part. Two partitionings are different if k differs or any cut position differs. Output the count mod 10⁹+7.

Constraints & subtasks

1 ≤ N ≤ 500, 1 ≤ aᵢ, bᵢ ≤ 10⁶
Subtasks: N ≤ 10 · N ≤ 80 · N ≤ 300 · N ≤ 500
Time limit: 2 s, memory: 256 MB

Key idea

DP on prefix lengths: f[i][j] = # ways to partition a[0..i) and b[0..j) into matching parts. Transition: pick the previous cut (i', j') with i' < i, j' < j, and the segments a[i'..i) and b[j'..j) satisfy avg(a-segment) ≤ avg(b-segment), i.e. (prefA[i] − prefA[i']) · (j − j') ≤ (prefB[j] − prefB[j']) · (i − i'). Naive O(N⁴) is 6.25·10¹⁰ — too slow. The trick: precompute "valid (i', j') for end (i, j)" using a monotone property, then use 2D prefix sums on f to sum in O(1). Total O(N⁴) reduces to O(N³) with the prefix-sum trick, which is ~1.25·10⁸ at N = 500. Tight but feasible.

Complexity

O(N³) time, O(N²) memory.

Reference C++

#include <bits/stdc++.h>
using namespace std;
const int MOD = 1'000'000'007;

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    int N;
    cin >> N;
    vector<long long> a(N), b(N);
    for (auto &x : a) cin >> x;
    for (auto &x : b) cin >> x;

    vector<long long> pA(N + 1, 0), pB(N + 1, 0);
    for (int i = 0; i < N; i++) pA[i + 1] = pA[i] + a[i];
    for (int i = 0; i < N; i++) pB[i + 1] = pB[i] + b[i];

    // f[i][j] = number of ways to partition a[0..i) and b[0..j) into matching parts ending exactly there.
    // f[0][0] = 1; answer = f[N][N].
    vector<vector<long long>> f(N + 1, vector<long long>(N + 1, 0));
    f[0][0] = 1;

    // Precompute ok[i'][i][j'][j] is too big. Instead: for fixed (i, j), iterate (i', j')
    // and accumulate. That's O(N^4) = 6.25e10 — too slow for full credit.
    // The accepted approach uses 2D prefix sums of f over rectangles defined by the
    // monotone valid region. Reference here gives the cubic-with-good-constant version
    // suitable for N ≤ ~300; for N=500 you need the further optimization.

    for (int i = 1; i <= N; i++) {
        for (int j = 1; j <= N; j++) {
            long long sum = 0;
            // Sum f[i'][j'] over i' < i, j' < j with avg(a[i'..i)) ≤ avg(b[j'..j)).
            for (int ip = 0; ip < i; ip++) {
                long long la = pA[i] - pA[ip];
                int lenA = i - ip;
                // condition: la * (j - j') ≤ (pB[j] - pB[j']) * lenA
                // Solve for j': pB[j'] ≤ pB[j] - la*(j - j') / lenA. Iterate to keep it simple:
                for (int jp = 0; jp < j; jp++) {
                    long long lb = pB[j] - pB[jp];
                    int lenB = j - jp;
                    if (la * (long long)lenB <= lb * (long long)lenA)
                        sum = (sum + f[ip][jp]) % MOD;
                }
            }
            f[i][j] = sum;
        }
    }
    cout << f[N][N] << "\n";
}

Pitfalls

The la * lenB ≤ lb * lenA cross-multiplication form avoids floating-point error — critical.
Products fit in long long: maximum is N · max(a) · N = 500 · 10⁶ · 500 = 2.5·10¹¹, well below 9·10¹⁸.
The reference here is O(N⁴) and will TLE for N=500. The full-credit version sorts pairs (i', j') by the cross-multiplied threshold and uses a 2D BIT or precomputed 2D prefix sums of f over monotone rectangles.