USACO 2024 December · Platinum · all three problems

← Full December 2024 set · Official results

Source. Titles, constraints, and subtask structures are from usaco.org/index.php?page=dec24results. Platinum approaches below are sketches at the level needed to plan a solve — the official editorials on usaco.org are the canonical reference.

Problem 1 — All Pairs Similarity

Statement

Each of N cows owns a K-bit string (interpret as a subset of a K-element universe). For every cow i, compute the sum over j of the Jaccard similarity J(i, j) = |s_i ∩ s_j| / |s_i ∪ s_j|, then output that sum modulo a prime. K is small (≤ 20).

Constraints

1 ≤ N ≤ 5×10⁵; 1 ≤ K ≤ 20; bitstrings are non-zero.
Memory: 512 MB (twice default).
Time limit: 2s (standard Platinum).

Subtask structure

Two test cases per K ∈ {10, 15, 16, 17, 18, 19, 20} (inputs 2–15).
Partial scoring rewards solutions that handle small K but degrade on K=20.

Key idea

There are at most 2^K distinct masks. Group cows by mask: c[m] = number of cows with mask m. For each cow with mask m, the answer is Σ_m' c[m'] · popcount(m ∧ m') / popcount(m ∨ m'). Naively this is 4^K per query. Trick: for each unordered pair (a = popcount(m ∩ m'), b = popcount(m ∪ m')), the contribution factor a / b is a fixed rational — so split by (a, b). Use subset-sum (SOS) DP to compute, for each mask m and each target intersection size a, Σ_{m' ⊇ a-bit subset of m} c[m']; do the analogous super-set DP for union size. Combine with modular inverses of b. Total: O(K² · 2^K) which at K=20 is ~4·10⁸ — tight but possible inside 2s with bit tricks; the 512 MB memory budget allows two K-indexed arrays of size 2^K.

Complexity target

O(K² · 2^K) time, O(K · 2^K) memory.

Reference solution (C++, sketch)

#include <bits/stdc++.h>
using namespace std;
using ll = long long;
const ll MOD = 1'000'000'007;

ll modpow(ll a, ll e, ll m) {
    ll r = 1 % m; a %= m;
    while (e) { if (e & 1) r = r * a % m; a = a * a % m; e >>= 1; }
    return r;
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    int N, K; cin >> N >> K;
    int M = 1 << K;
    vector<int> cnt(M, 0);
    vector<int> masks(N);
    for (int i = 0; i < N; ++i) { cin >> masks[i]; cnt[masks[i]]++; }

    // SOS subset-sum DP: subCnt[m] = sum of cnt[m'] over m' subset of m
    vector<ll> subCnt(cnt.begin(), cnt.end());
    for (int bit = 0; bit < K; ++bit)
        for (int m = 0; m < M; ++m)
            if (m & (1 << bit)) subCnt[m] += subCnt[m ^ (1 << bit)];

    // Super-set DP: supCnt[m] = sum of cnt[m'] over m' superset of m
    vector<ll> supCnt(cnt.begin(), cnt.end());
    for (int bit = 0; bit < K; ++bit)
        for (int m = 0; m < M; ++m)
            if (!(m & (1 << bit))) supCnt[m] += supCnt[m | (1 << bit)];

    // Precompute modular inverses of 1..K
    vector<ll> inv(K + 1, 1);
    for (int i = 2; i <= K; ++i) inv[i] = modpow(i, MOD - 2, MOD);

    // For each cow i with mask m, accumulate answer.
    // ans(m) = sum over j of popcount(m & m_j) / popcount(m | m_j) (mod p).
    // Decompose by enumerating intersection mask t subset of m and union
    // mask u superset of m. Number of cows with mask exactly r such that
    // t = m & r and u = m | r is determined by r = t | (u \ m).
    // Implementation outline (full version requires careful inclusion-exclusion):
    for (int i = 0; i < N; ++i) {
        int m = masks[i];
        ll s = 0;
        // Iterate over possible intersection sizes a and union sizes b.
        // For each (a, b), the number of j's contributing equals
        // (# cows with mask r where |r & m| = a, |r | m| = b),
        // which is computed via inclusion-exclusion on subCnt/supCnt.
        // [verify] — exact derivation depends on the editorial; this sketch
        // is the right algorithmic skeleton, not a finished submission.
        cout << s << "\n";
    }
}

[verify] This is a deliberate sketch — the inclusion-exclusion step over (a, b) is the heart of the problem and the official editorial at usaco.org gives the exact recurrence. Treat the SOS / super-set scaffolding as confirmed; the inner accounting needs the editorial.

Pitfalls

Memory. Two 2²⁰ longs = 16 MB — fits in 512 MB easily, but careless K-indexed tables of size K · 2^K can blow the budget at K=20.
Modular inverse of denominators. The denominator b can be 1..K; precompute inverses once.
SOS direction. Subset DP iterates bits outer, masks inner; reversing the loops is wrong.

Problem 2 — It's Mooin' Time (Platinum)

Statement

A string of M's and O's of length N is given, with a flip cost c_i per position. A "moo of length L" is an M followed by L−1 O's. For each k from 1 to ⌊N / L⌋, compute the minimum total flip cost needed so the resulting string contains at least k non-overlapping moo occurrences.

Constraints

N ≤ 3×10⁵; L ∈ {1, 2, 3}; 1 ≤ c_i ≤ 10⁸.
Time limit: 3s (1.5× default).

Subtask structure

Input 5: L=3, N ≤ 5000 (brute DP allowed).
Input 6: L=1 (trivial — every M counts as a moo, sort costs).
Inputs 7–10: L=2.
Inputs 11–18: L=3, full constraints.

Key idea

Fix L. Define f(k) = minimum cost for ≥ k moos. The function k ↦ f(k) is convex. Standard tool: Aliens / Lagrangian binary search — binary search λ ≥ 0, solve the "place as many non-overlapping moos as you want, paying their cost minus a bonus of λ per moo" DP in O(N), and extract k(λ) and value(λ). The unconstrained DP is a simple 1D recurrence with state "last position covered." Recover f(k) for all k by sweeping λ and using the standard slope trick.

Complexity target

O(N log V) where V is the value range; subtask inputs allow O(N · k) brute-force.

Reference solution (C++, L=3 framework)

#include <bits/stdc++.h>
using namespace std;
using ll = long long;

int N, L;
string s;
vector<ll> c;

// cost(i) = cost to make a moo of length L starting at i.
ll mooCost(int i) {
    ll x = 0;
    if (s[i] != 'M') x += c[i];
    for (int k = 1; k < L; ++k) if (s[i+k] != 'O') x += c[i+k];
    return x;
}

// Given a Lagrange penalty 'lambda' subtracted per placed moo, find the
// minimum value of (totalFlipCost - lambda * numMoos) using O(N) DP.
// Returns (bestValue, numMoosUsedAtThatValue).
pair<ll,int> solve(ll lambda) {
    // dp[i] = (best value over prefix of length i, moos used to achieve it)
    vector<pair<ll,int>> dp(N + 1);
    dp[0] = {0, 0};
    for (int i = 1; i <= N; ++i) {
        dp[i] = dp[i - 1]; // skip position i-1
        if (i >= L) {
            ll cand = dp[i - L].first + mooCost(i - L) - lambda;
            int mcnt = dp[i - L].second + 1;
            if (cand < dp[i].first ||
                (cand == dp[i].first && mcnt > dp[i].second))
                dp[i] = {cand, mcnt};
        }
    }
    return dp[N];
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    cin >> N >> L >> s;
    c.assign(N, 0);
    for (auto& v : c) cin >> v;

    int maxK = N / L;
    vector<ll> f(maxK + 1, 0);

    // Binary search the smallest lambda such that solve(lambda).second < k.
    // For each k, recover f(k) = bestValue + lambda * k.
    for (int k = 1; k <= maxK; ++k) {
        ll lo = 0, hi = (ll)1e15;
        while (lo < hi) {
            ll mid = lo + (hi - lo) / 2;
            auto [val, used] = solve(mid);
            if (used >= k) lo = mid + 1; else hi = mid;
        }
        // At lambda = lo - 1 (or lo), recover f(k).
        auto [val, used] = solve(lo);
        f[k] = val + lo * k;
        cout << f[k] << (k == maxK ? '\n' : ' ');
    }
}

[verify] The Aliens-trick details (whether to break ties on used moos high or low, whether λ ranges over integers or reals) need the editorial; the per-k binary search above is O(N log V · k), which is O(N² log V) total and too slow for N=3·10⁵. The intended solution computes f(·) for all k in a single O(N log V) sweep using the convex-hull / slope-trick recovery. The DP body and Lagrange framing are correct.

Pitfalls

Non-overlap. The DP must transition from i − L, not from i − 1, when placing.
Tie-breaking in the Lagrangian DP. Always pick the transition that uses more moos at equal value — this is what makes the binary search converge to f(k).
L = 1 is degenerate. Every M is a moo; answer is the smallest k flip-costs among non-M positions.
Output format. "Output ⌊N/L⌋ values" — emit all of them, not just f(N/L).

Problem 3 — Maximize Minimum Difference

Statement

Among permutations of 1..N, let f(π) = min_i |π_i+1 − π_i|. Let M* be the maximum of f over all permutations. K positional pins are given of the form "position p must hold value v." Count the number of permutations that simultaneously achieve f = M* and obey all pins, modulo 10⁹+7.

Constraints

2 ≤ N ≤ 2000; 0 ≤ K ≤ N; sum across tests T·N ≤ 2×10⁴.
Time limit: 4s (2× default).

Subtask structure

Tier on small N (≤ 15) → brute permutation enumeration.
Tier with no pins or with pins only at endpoints.
Full N ≤ 2000.

Key idea

Step 1: M* = ⌈(N − 1) / 2⌉ — the optimum is achieved by interleaving the small half and the large half (e.g. for N = 6, permutations like 1, 4, 2, 5, 3, 6 hit gap = 3). Step 2: the set of optimal permutations is a controlled "weaving" family parameterised by where the split point sits and a few binary choices at the ends. Step 3: counting with pins reduces to a DP over positions 1..N where the state tracks (a) which half (lower or upper) the next value comes from and (b) progress through the lower / upper run. Pins act as forced state transitions; reject if any pin is inconsistent with any optimum.

Complexity target

O(N²) per test (states × transitions), fine at N = 2000.

Reference solution (C++, framework)

#include <bits/stdc++.h>
using namespace std;
using ll = long long;
const ll MOD = 1'000'000'007;

int N, K;
vector<int> pinPos, pinVal;
// pinAt[p] = v if position p is pinned to v, else 0
vector<int> pinAt;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    int T; cin >> T;
    while (T--) {
        cin >> N >> K;
        pinAt.assign(N + 1, 0);
        bool feasible = true;
        for (int i = 0; i < K; ++i) {
            int p, v; cin >> p >> v;
            if (pinAt[p] && pinAt[p] != v) feasible = false;
            pinAt[p] = v;
        }
        if (!feasible) { cout << 0 << "\n"; continue; }

        int Mstar = (N - 1 + 1) / 2; // = ceil((N-1)/2)
        int lowHalf = (N + 1) / 2;   // values 1..lowHalf are the "low" group

        // dp[i][lo][hi] = ways to fill positions 1..i where 'lo' low values
        // and 'hi' high values have been placed, last placed = low/high
        // implicit from (lo + hi - 1) parity in optimal weave.
        // For N=2000 we need O(N^2) states; collapse the parity to a single bit.
        // dp[lo][hi][lastSide] is O(N^2 * 2) ~ 1.6e7.
        vector<vector<array<ll,2>>> dp(
            lowHalf + 1, vector<array<ll,2>>(N - lowHalf + 1, {0, 0}));
        // Start with first position empty; the two starting cases are
        // "first is low" or "first is high".
        auto place = [&](int pos, int lo, int hi, int side, ll ways) {
            int val = (side == 0) ? lo : (lowHalf + hi);
            if (pinAt[pos] && pinAt[pos] != val) return ll(0);
            return ways;
        };
        // initial state: position 1 placed.
        dp[1][0][0] = place(1, 1, 0, 0, 1);
        dp[0][1][1] = place(1, 0, 1, 1, 1);

        for (int pos = 2; pos <= N; ++pos) {
            vector<vector<array<ll,2>>> nx(
                lowHalf + 1, vector<array<ll,2>>(N - lowHalf + 1, {0, 0}));
            for (int lo = 0; lo <= lowHalf; ++lo)
                for (int hi = 0; hi <= N - lowHalf; ++hi)
                    for (int last = 0; last < 2; ++last)
                        if (dp[lo][hi][last]) {
                            // Place the next value: must come from the OTHER
                            // side to maintain the gap = ceil((N-1)/2) weave.
                            int side = 1 - last;
                            int nlo = lo + (side == 0 ? 1 : 0);
                            int nhi = hi + (side == 1 ? 1 : 0);
                            if (nlo > lowHalf || nhi > N - lowHalf) continue;
                            ll w = place(pos, nlo, nhi, side, dp[lo][hi][last]);
                            nx[nlo][nhi][side] = (nx[nlo][nhi][side] + w) % MOD;
                        }
            dp = move(nx);
        }
        ll ans = (dp[lowHalf][N - lowHalf][0] + dp[lowHalf][N - lowHalf][1]) % MOD;
        cout << ans << "\n";
    }
}

[verify] The "strict alternation between low and high halves" claim characterises one family of optima but the actual optimal family includes additional reflections, especially when N is even vs odd. Use this as a working skeleton; cross-check the editorial for the exact structural theorem (M* and the count of optimal permutations without pins).

Pitfalls

Identifying M*. Off-by-one in ⌈(N−1)/2⌉ — verify on N = 2, 3, 4 by hand.
Pins outside the optimal family. If any pin violates the alternation rule, the answer is 0; check eagerly.
Memory. N² · 2 longs at N = 2000 is 32 MB — comfortable; don't redundantly allocate.
Edge case N = 2. Both permutations achieve f = 1 = M*; pins just filter.

What Platinum December 2024 was actually testing

P1 — SOS / subset-sum DP plus inclusion-exclusion. 2^K universe with K ≤ 20.
P2 — Aliens / Lagrangian trick on a convex DP. Recover all k from a single sweep.
P3 — Structural characterisation of optima, then DP. Identify the family before counting.