USACO 2024 December · Platinum · all three problems
Problem 1 — All Pairs Similarity
Statement
Each of N cows owns a K-bit string (interpret as a subset of a K-element universe). For every cow i, compute the sum over j of the Jaccard similarity J(i, j) = |si ∩ sj| / |si ∪ sj|, then output that sum modulo a prime. K is small (≤ 20).
Constraints
- 1 ≤ N ≤ 5×105; 1 ≤ K ≤ 20; bitstrings are non-zero.
- Memory: 512 MB (twice default).
- Time limit: 2s (standard Platinum).
Subtask structure
- Two test cases per K ∈ {10, 15, 16, 17, 18, 19, 20} (inputs 2–15).
- Partial scoring rewards solutions that handle small K but degrade on K=20.
Key idea
There are at most 2K distinct masks. Group cows by mask: c[m] = number of cows with mask m. For each cow with mask m, the answer is Σm' c[m'] · popcount(m ∧ m') / popcount(m ∨ m'). Naively this is 4K per query. Trick: for each unordered pair (a = popcount(m ∩ m'), b = popcount(m ∪ m')), the contribution factor a / b is a fixed rational — so split by (a, b). Use subset-sum (SOS) DP to compute, for each mask m and each target intersection size a, Σm' ⊇ a-bit subset of m c[m']; do the analogous super-set DP for union size. Combine with modular inverses of b. Total: O(K² · 2K) which at K=20 is ~4·108 — tight but possible inside 2s with bit tricks; the 512 MB memory budget allows two K-indexed arrays of size 2K.
Complexity target
O(K² · 2K) time, O(K · 2K) memory.
Reference solution (C++, sketch)
#include <bits/stdc++.h>
using namespace std;
using ll = long long;
const ll MOD = 1'000'000'007;
ll modpow(ll a, ll e, ll m) {
ll r = 1 % m; a %= m;
while (e) { if (e & 1) r = r * a % m; a = a * a % m; e >>= 1; }
return r;
}
int main() {
ios_base::sync_with_stdio(false);
cin.tie(nullptr);
int N, K; cin >> N >> K;
int M = 1 << K;
vector<int> cnt(M, 0);
vector<int> masks(N);
for (int i = 0; i < N; ++i) { cin >> masks[i]; cnt[masks[i]]++; }
// SOS subset-sum DP: subCnt[m] = sum of cnt[m'] over m' subset of m
vector<ll> subCnt(cnt.begin(), cnt.end());
for (int bit = 0; bit < K; ++bit)
for (int m = 0; m < M; ++m)
if (m & (1 << bit)) subCnt[m] += subCnt[m ^ (1 << bit)];
// Super-set DP: supCnt[m] = sum of cnt[m'] over m' superset of m
vector<ll> supCnt(cnt.begin(), cnt.end());
for (int bit = 0; bit < K; ++bit)
for (int m = 0; m < M; ++m)
if (!(m & (1 << bit))) supCnt[m] += supCnt[m | (1 << bit)];
// Precompute modular inverses of 1..K
vector<ll> inv(K + 1, 1);
for (int i = 2; i <= K; ++i) inv[i] = modpow(i, MOD - 2, MOD);
// For each cow i with mask m, accumulate answer.
// ans(m) = sum over j of popcount(m & m_j) / popcount(m | m_j) (mod p).
// Decompose by enumerating intersection mask t subset of m and union
// mask u superset of m. Number of cows with mask exactly r such that
// t = m & r and u = m | r is determined by r = t | (u \ m).
// Implementation outline (full version requires careful inclusion-exclusion):
for (int i = 0; i < N; ++i) {
int m = masks[i];
ll s = 0;
// Iterate over possible intersection sizes a and union sizes b.
// For each (a, b), the number of j's contributing equals
// (# cows with mask r where |r & m| = a, |r | m| = b),
// which is computed via inclusion-exclusion on subCnt/supCnt.
// [verify] — exact derivation depends on the editorial; this sketch
// is the right algorithmic skeleton, not a finished submission.
cout << s << "\n";
}
}
[verify] This is a deliberate sketch — the inclusion-exclusion step over (a, b) is the heart of the problem and the official editorial at usaco.org gives the exact recurrence. Treat the SOS / super-set scaffolding as confirmed; the inner accounting needs the editorial.
Pitfalls
- Memory. Two 220 longs = 16 MB — fits in 512 MB easily, but careless K-indexed tables of size K · 2K can blow the budget at K=20.
- Modular inverse of denominators. The denominator b can be 1..K; precompute inverses once.
- SOS direction. Subset DP iterates bits outer, masks inner; reversing the loops is wrong.
Problem 2 — It's Mooin' Time (Platinum)
Statement
A string of M's and O's of length N is given, with a flip cost ci per position. A "moo of length L" is an M followed by L−1 O's. For each k from 1 to ⌊N / L⌋, compute the minimum total flip cost needed so the resulting string contains at least k non-overlapping moo occurrences.
Constraints
- N ≤ 3×105; L ∈ {1, 2, 3}; 1 ≤ ci ≤ 108.
- Time limit: 3s (1.5× default).
Subtask structure
- Input 5: L=3, N ≤ 5000 (brute DP allowed).
- Input 6: L=1 (trivial — every M counts as a moo, sort costs).
- Inputs 7–10: L=2.
- Inputs 11–18: L=3, full constraints.
Key idea
Fix L. Define f(k) = minimum cost for ≥ k moos. The function k ↦ f(k) is convex. Standard tool: Aliens / Lagrangian binary search — binary search λ ≥ 0, solve the "place as many non-overlapping moos as you want, paying their cost minus a bonus of λ per moo" DP in O(N), and extract k(λ) and value(λ). The unconstrained DP is a simple 1D recurrence with state "last position covered." Recover f(k) for all k by sweeping λ and using the standard slope trick.
Complexity target
O(N log V) where V is the value range; subtask inputs allow O(N · k) brute-force.
Reference solution (C++, L=3 framework)
#include <bits/stdc++.h>
using namespace std;
using ll = long long;
int N, L;
string s;
vector<ll> c;
// cost(i) = cost to make a moo of length L starting at i.
ll mooCost(int i) {
ll x = 0;
if (s[i] != 'M') x += c[i];
for (int k = 1; k < L; ++k) if (s[i+k] != 'O') x += c[i+k];
return x;
}
// Given a Lagrange penalty 'lambda' subtracted per placed moo, find the
// minimum value of (totalFlipCost - lambda * numMoos) using O(N) DP.
// Returns (bestValue, numMoosUsedAtThatValue).
pair<ll,int> solve(ll lambda) {
// dp[i] = (best value over prefix of length i, moos used to achieve it)
vector<pair<ll,int>> dp(N + 1);
dp[0] = {0, 0};
for (int i = 1; i <= N; ++i) {
dp[i] = dp[i - 1]; // skip position i-1
if (i >= L) {
ll cand = dp[i - L].first + mooCost(i - L) - lambda;
int mcnt = dp[i - L].second + 1;
if (cand < dp[i].first ||
(cand == dp[i].first && mcnt > dp[i].second))
dp[i] = {cand, mcnt};
}
}
return dp[N];
}
int main() {
ios_base::sync_with_stdio(false);
cin.tie(nullptr);
cin >> N >> L >> s;
c.assign(N, 0);
for (auto& v : c) cin >> v;
int maxK = N / L;
vector<ll> f(maxK + 1, 0);
// Binary search the smallest lambda such that solve(lambda).second < k.
// For each k, recover f(k) = bestValue + lambda * k.
for (int k = 1; k <= maxK; ++k) {
ll lo = 0, hi = (ll)1e15;
while (lo < hi) {
ll mid = lo + (hi - lo) / 2;
auto [val, used] = solve(mid);
if (used >= k) lo = mid + 1; else hi = mid;
}
// At lambda = lo - 1 (or lo), recover f(k).
auto [val, used] = solve(lo);
f[k] = val + lo * k;
cout << f[k] << (k == maxK ? '\n' : ' ');
}
}
[verify] The Aliens-trick details (whether to break ties on used moos high or low, whether λ ranges over integers or reals) need the editorial; the per-k binary search above is O(N log V · k), which is O(N² log V) total and too slow for N=3·105. The intended solution computes f(·) for all k in a single O(N log V) sweep using the convex-hull / slope-trick recovery. The DP body and Lagrange framing are correct.
Pitfalls
- Non-overlap. The DP must transition from i − L, not from i − 1, when placing.
- Tie-breaking in the Lagrangian DP. Always pick the transition that uses more moos at equal value — this is what makes the binary search converge to f(k).
- L = 1 is degenerate. Every M is a moo; answer is the smallest k flip-costs among non-M positions.
- Output format. "Output ⌊N/L⌋ values" — emit all of them, not just f(N/L).
Problem 3 — Maximize Minimum Difference
Statement
Among permutations of 1..N, let f(π) = mini |πi+1 − πi|. Let M* be the maximum of f over all permutations. K positional pins are given of the form "position p must hold value v." Count the number of permutations that simultaneously achieve f = M* and obey all pins, modulo 109+7.
Constraints
- 2 ≤ N ≤ 2000; 0 ≤ K ≤ N; sum across tests T·N ≤ 2×104.
- Time limit: 4s (2× default).
Subtask structure
- Tier on small N (≤ 15) → brute permutation enumeration.
- Tier with no pins or with pins only at endpoints.
- Full N ≤ 2000.
Key idea
Step 1: M* = ⌈(N − 1) / 2⌉ — the optimum is achieved by interleaving the small half and the large half (e.g. for N = 6, permutations like 1, 4, 2, 5, 3, 6 hit gap = 3). Step 2: the set of optimal permutations is a controlled "weaving" family parameterised by where the split point sits and a few binary choices at the ends. Step 3: counting with pins reduces to a DP over positions 1..N where the state tracks (a) which half (lower or upper) the next value comes from and (b) progress through the lower / upper run. Pins act as forced state transitions; reject if any pin is inconsistent with any optimum.
Complexity target
O(N²) per test (states × transitions), fine at N = 2000.
Reference solution (C++, framework)
#include <bits/stdc++.h>
using namespace std;
using ll = long long;
const ll MOD = 1'000'000'007;
int N, K;
vector<int> pinPos, pinVal;
// pinAt[p] = v if position p is pinned to v, else 0
vector<int> pinAt;
int main() {
ios_base::sync_with_stdio(false);
cin.tie(nullptr);
int T; cin >> T;
while (T--) {
cin >> N >> K;
pinAt.assign(N + 1, 0);
bool feasible = true;
for (int i = 0; i < K; ++i) {
int p, v; cin >> p >> v;
if (pinAt[p] && pinAt[p] != v) feasible = false;
pinAt[p] = v;
}
if (!feasible) { cout << 0 << "\n"; continue; }
int Mstar = (N - 1 + 1) / 2; // = ceil((N-1)/2)
int lowHalf = (N + 1) / 2; // values 1..lowHalf are the "low" group
// dp[i][lo][hi] = ways to fill positions 1..i where 'lo' low values
// and 'hi' high values have been placed, last placed = low/high
// implicit from (lo + hi - 1) parity in optimal weave.
// For N=2000 we need O(N^2) states; collapse the parity to a single bit.
// dp[lo][hi][lastSide] is O(N^2 * 2) ~ 1.6e7.
vector<vector<array<ll,2>>> dp(
lowHalf + 1, vector<array<ll,2>>(N - lowHalf + 1, {0, 0}));
// Start with first position empty; the two starting cases are
// "first is low" or "first is high".
auto place = [&](int pos, int lo, int hi, int side, ll ways) {
int val = (side == 0) ? lo : (lowHalf + hi);
if (pinAt[pos] && pinAt[pos] != val) return ll(0);
return ways;
};
// initial state: position 1 placed.
dp[1][0][0] = place(1, 1, 0, 0, 1);
dp[0][1][1] = place(1, 0, 1, 1, 1);
for (int pos = 2; pos <= N; ++pos) {
vector<vector<array<ll,2>>> nx(
lowHalf + 1, vector<array<ll,2>>(N - lowHalf + 1, {0, 0}));
for (int lo = 0; lo <= lowHalf; ++lo)
for (int hi = 0; hi <= N - lowHalf; ++hi)
for (int last = 0; last < 2; ++last)
if (dp[lo][hi][last]) {
// Place the next value: must come from the OTHER
// side to maintain the gap = ceil((N-1)/2) weave.
int side = 1 - last;
int nlo = lo + (side == 0 ? 1 : 0);
int nhi = hi + (side == 1 ? 1 : 0);
if (nlo > lowHalf || nhi > N - lowHalf) continue;
ll w = place(pos, nlo, nhi, side, dp[lo][hi][last]);
nx[nlo][nhi][side] = (nx[nlo][nhi][side] + w) % MOD;
}
dp = move(nx);
}
ll ans = (dp[lowHalf][N - lowHalf][0] + dp[lowHalf][N - lowHalf][1]) % MOD;
cout << ans << "\n";
}
}
[verify] The "strict alternation between low and high halves" claim characterises one family of optima but the actual optimal family includes additional reflections, especially when N is even vs odd. Use this as a working skeleton; cross-check the editorial for the exact structural theorem (M* and the count of optimal permutations without pins).
Pitfalls
- Identifying M*. Off-by-one in ⌈(N−1)/2⌉ — verify on N = 2, 3, 4 by hand.
- Pins outside the optimal family. If any pin violates the alternation rule, the answer is 0; check eagerly.
- Memory. N² · 2 longs at N = 2000 is 32 MB — comfortable; don't redundantly allocate.
- Edge case N = 2. Both permutations achieve f = 1 = M*; pins just filter.
What Platinum December 2024 was actually testing
- P1 — SOS / subset-sum DP plus inclusion-exclusion. 2K universe with K ≤ 20.
- P2 — Aliens / Lagrangian trick on a convex DP. Recover all k from a single sweep.
- P3 — Structural characterisation of optima, then DP. Identify the family before counting.