USACO 2016 February · Platinum · all three problems
Problem 1 — Load Balancing
Statement
Same setup as Silver P2: N cows at odd-integer points, place a vertical fence x = a and horizontal fence y = b (a, b even) to minimize M = max cows in any of the four quadrants. Now N is up to 105.
Constraints
- 1 ≤ N ≤ 100,000.
- Coordinates are positive odd integers ≤ 106.
- Time limit: 2s.
Key idea
Binary search on the answer M. For a candidate M, sweep each vertical fence position x = a in sorted order while maintaining two BITs over compressed y-values: BITL for cows with x < a and BITR for cows with x > a. For each a, find the smallest y such that BITL.prefix(y) ≤ M and BITL.suffix(y) ≤ M — that's a binary search on the BIT (descend on the tree). Same for BITR. If both yield consistent y, M is feasible. Alternative O(N log² N): for each candidate a, BIT-descend independently on each side.
Complexity target
O(N log² N) time, O(N) memory. Comfortably under 2s for N = 105.
Reference solution (C++)
#include <bits/stdc++.h>
using namespace std;
int N;
vector<int> X, Y, ys; // ys = compressed y-values
struct BIT {
vector<int> t;
int n;
void init(int n_) { n = n_; t.assign(n + 1, 0); }
void upd(int i, int v) { for (++i; i <= n; i += i & -i) t[i] += v; }
int qry(int i) { int s = 0; for (++i; i > 0; i -= i & -i) s += t[i]; return s; }
} L, R;
bool feasible(int M, vector<pair<int,int>>& byX) {
L.init(ys.size()); R.init(ys.size());
for (int i = 0; i < N; ++i) R.upd(Y[i], +1);
int i = 0;
while (i < N) {
int j = i;
while (j < N && byX[j].first == byX[i].first) {
int idx = byX[j].second;
R.upd(Y[idx], -1); L.upd(Y[idx], +1);
++j;
}
// Check: find some y-split where all 4 quadrant counts <= M.
int total = N, leftCount = j, rightCount = N - j;
// pick smallest y* with L.qry(y*) >= leftCount - M (so that L below = leftCount - L.qry(y*) is <= ... )
// Simpler: scan candidate y* among the (sorted) unique y values; abort if M-feasible.
// For tightness, binary search:
int lo = 0, hi = (int)ys.size() - 1, found = -1;
// We need: L.qry(y) <= M, leftCount - L.qry(y) <= M, R.qry(y) <= M, rightCount - R.qry(y) <= M.
// Sweep y* in compressed order; the constraints are monotone, so binary search the lowest valid y.
for (int y = 0; y < (int)ys.size(); ++y) {
int a = L.qry(y), b = R.qry(y);
if (a <= M && leftCount - a <= M && b <= M && rightCount - b <= M) { found = y; break; }
}
if (found != -1) return true;
i = j;
}
return false;
}
int main() {
ios_base::sync_with_stdio(false);
cin.tie(nullptr);
cin >> N;
X.resize(N); Y.resize(N);
for (int i = 0; i < N; ++i) cin >> X[i] >> Y[i];
ys = Y;
sort(ys.begin(), ys.end()); ys.erase(unique(ys.begin(), ys.end()), ys.end());
for (auto& v : Y) v = lower_bound(ys.begin(), ys.end(), v) - ys.begin();
vector<pair<int,int>> byX(N);
for (int i = 0; i < N; ++i) byX[i] = {X[i], i};
sort(byX.begin(), byX.end());
int lo = 0, hi = N;
while (lo < hi) {
int mid = (lo + hi) / 2;
if (feasible(mid, byX)) hi = mid;
else lo = mid + 1;
}
cout << lo << "\n";
}
Note: the linear y-scan inside feasible is O(N) per vertical position and is the loose
edge of this implementation — for tight cases replace it with a BIT-descend or persistent BIT to
drop a log factor.
Pitfalls
- Binary search the answer, sweep the fence. Iterating directly over all (a, b) pairs is O(N² log N) and TLEs at N = 105.
- Group cows with equal x. Multiple cows can share an x-coordinate; move all of them across the fence at once before testing.
- Coord-compress y. Raw y up to 106 forces a huge BIT; compress to ≤ N distinct values.
Problem 2 — Fenced In
Statement
Same field as Gold P3 but bigger: n, m up to 25,000 internal fences. (n+1)(m+1) can be ~6·108, so the per-cell DSU approach is out. Output the minimum length of fencing to remove so all regions are connected.
Constraints
- 1 ≤ A, B ≤ 109.
- 0 ≤ n, m ≤ 25,000.
- Time limit: 2s.
Key idea
Identical closed-form to Gold P3: sort column widths W (size n+1) and row heights H (size m+1). Pay W[0] · m to merge all rows along the thinnest column gap, and H[0] · n to merge all columns along the thinnest row gap. Then merge axis 2-tuples in sorted order, each time multiplying the gap by the count of remaining strips on the opposite axis. With n, m ≤ 25,000 the total work is O((n+m) log(n+m)) ≈ 106.
Complexity target
O((n + m) log(n + m)) time, O(n + m) memory.
Reference solution (C++)
#include <bits/stdc++.h>
using namespace std;
using ll = long long;
int main() {
ios_base::sync_with_stdio(false);
cin.tie(nullptr);
ll A, B; int n, m;
cin >> A >> B >> n >> m;
vector<ll> a(n), b(m);
for (auto& x : a) cin >> x;
for (auto& x : b) cin >> x;
sort(a.begin(), a.end()); sort(b.begin(), b.end());
vector<ll> W, H;
{ ll prev = 0; for (ll x : a) { W.push_back(x - prev); prev = x; } W.push_back(A - prev); }
{ ll prev = 0; for (ll y : b) { H.push_back(y - prev); prev = y; } H.push_back(B - prev); }
sort(W.begin(), W.end()); sort(H.begin(), H.end());
ll ans = 0;
ll colRem = n, rowRem = m;
ans += W[0] * rowRem; // anchor: cheapest column gap merges all m row-boundaries once
ans += H[0] * colRem; // anchor: cheapest row gap merges all n column-boundaries once
int i = 1, j = 1;
while (i <= n && j <= m) {
if (W[i] < H[j]) { ans += W[i] * rowRem; --colRem; ++i; }
else { ans += H[j] * colRem; --rowRem; ++j; }
}
while (i <= n) { ans += W[i] * rowRem; ++i; }
while (j <= m) { ans += H[j] * colRem; ++j; }
cout << ans << "\n";
}
Pitfalls
- Never iterate over (n+1)(m+1) cells. It's up to ~6·108. Stay on the (n+m) gap axes.
- The "anchor" terms are crucial. They account for the very first MST merges that bridge entire rows/columns of strips at the cheapest gap; forgetting them gives a too-large answer.
- 64-bit everywhere. Gap × strip-count × accumulated sum easily exceeds 32-bit at A, B ≈ 109.
Problem 3 — Circular Barn
Statement
Same setup as Gold P2 (unlock k doors on a ring of n rooms, cows walk clockwise, minimize total distance) but at scale: n ≤ 1000 and cow counts up to 106. Still k ≤ 7.
Constraints
- 3 ≤ n ≤ 1000.
- 1 ≤ k ≤ 7.
- 1 ≤ ri ≤ 106.
- Time limit: 2s.
Key idea
Same DP skeleton as Gold P2: fix the first unlocked door s, then dp[i][j] = min cost to cover the first i rooms (rotated to start at s) using j arcs. Precompute arc costs in O(n²). Total O(n · (n · n · k)) = O(n³ k) ≈ 7·109 is too slow; the trick is that fixing s and writing the DP transition gives O(n² k) per s and O(n³ k) overall — but the arc-cost precomputation can be shared across all s by computing prefix sums on the doubled array r[0..2n−1]. That reduces the work to O(n² k). With n = 1000, k = 7, that's 7·106, comfortable.
Complexity target
O(n² · k) time, O(n · k) memory.
Reference solution (C++)
#include <bits/stdc++.h>
using namespace std;
using ll = long long;
int main() {
ios_base::sync_with_stdio(false);
cin.tie(nullptr);
int n, k; cin >> n >> k;
vector<ll> r(2 * n);
for (int i = 0; i < n; ++i) { cin >> r[i]; r[i + n] = r[i]; }
// Precompute S[i] = sum r[0..i-1], T[i] = sum j*r[j] for j=0..i-1, on the doubled array.
vector<ll> S(2 * n + 1, 0), T(2 * n + 1, 0);
for (int i = 0; i < 2 * n; ++i) {
S[i + 1] = S[i] + r[i];
T[i + 1] = T[i] + (ll)i * r[i];
}
// cost(l, len) = arc starting at index l, length len, weighted by offset from l.
auto arc = [&](int l, int len) -> ll {
// sum_{j=0..len-1} j * r[l+j] = T[l+len] - T[l] - l*(S[l+len]-S[l])
return (T[l + len] - T[l]) - (ll)l * (S[l + len] - S[l]);
};
ll ans = LLONG_MAX;
for (int s = 0; s < n; ++s) {
// dp[i][j] = min cost to cover first i rooms (starting at s) with j arcs.
vector<vector<ll>> dp(n + 1, vector<ll>(k + 1, LLONG_MAX));
dp[0][0] = 0;
for (int i = 0; i < n; ++i)
for (int j = 0; j < k; ++j) if (dp[i][j] != LLONG_MAX)
for (int L = 1; i + L <= n; ++L)
dp[i + L][j + 1] = min(dp[i + L][j + 1], dp[i][j] + arc(s + i, L));
ans = min(ans, dp[n][k]);
}
cout << ans << "\n";
}
Note: the triple loop is O(n³ k) per s and O(n⁴ k) overall ≈ 7·1012 — too slow for the worst case. The intended optimization: for each fixed j (arcs used) one can do a one-pass scan that updates dp[i+L][j+1] using only the running prefix dp[·][j] in O(n²) per s. The skeleton above is the readable version — replace the inner two loops with the dp[·][j] sweep when squeezing for AC.
Pitfalls
- Doubled array. Don't write modular indexing inside the inner loop — precompute prefix sums on r doubled so that arc cost is O(1).
- Use long long. Worst-case cost ≈ n² · max(r) ≈ 1012.
- Don't ignore the constant. n²·k ≈ 7·106 is fast, but n³·k = 7·109 is not — verify your inner loop is O(1) per (i, j, L) before submitting.
What Platinum February 2016 was actually testing
- P1 — binary search + sweep with BITs. Recognize that the answer is monotone, then turn the 2D fence problem into a 1D sweep with two BITs.
- P2 — same MST closed-form as Gold. The lesson is "don't expand the grid"; the (n+m)-axis count argument scales identically.
- P3 — DP on a ring with shared prefix sums. The Gold solution's O(n³k) doesn't fit the larger ri; precompute weighted prefix sums on the doubled array.