USACO 2016 February · Platinum · all three problems

← Full February 2016 set · Official results

Source. Statements and constraints below are paraphrased from usaco.org/index.php?page=feb16results. Each problem heading links to the canonical statement on usaco.org. Code is my own reference C++ — cross-check the official editorial before trusting it under contest pressure.

Problem 1 — Load Balancing

Statement

Same setup as Silver P2: N cows at odd-integer points, place a vertical fence x = a and horizontal fence y = b (a, b even) to minimize M = max cows in any of the four quadrants. Now N is up to 10⁵.

Constraints

1 ≤ N ≤ 100,000.
Coordinates are positive odd integers ≤ 10⁶.
Time limit: 2s.

Key idea

Binary search on the answer M. For a candidate M, sweep each vertical fence position x = a in sorted order while maintaining two BITs over compressed y-values: BIT_L for cows with x < a and BIT_R for cows with x > a. For each a, find the smallest y such that BIT_L.prefix(y) ≤ M and BIT_L.suffix(y) ≤ M — that's a binary search on the BIT (descend on the tree). Same for BIT_R. If both yield consistent y, M is feasible. Alternative O(N log² N): for each candidate a, BIT-descend independently on each side.

Complexity target

O(N log² N) time, O(N) memory. Comfortably under 2s for N = 10⁵.

Reference solution (C++)

#include <bits/stdc++.h>
using namespace std;

int N;
vector<int> X, Y, ys;  // ys = compressed y-values

struct BIT {
    vector<int> t;
    int n;
    void init(int n_) { n = n_; t.assign(n + 1, 0); }
    void upd(int i, int v) { for (++i; i <= n; i += i & -i) t[i] += v; }
    int qry(int i) { int s = 0; for (++i; i > 0; i -= i & -i) s += t[i]; return s; }
} L, R;

bool feasible(int M, vector<pair<int,int>>& byX) {
    L.init(ys.size()); R.init(ys.size());
    for (int i = 0; i < N; ++i) R.upd(Y[i], +1);
    int i = 0;
    while (i < N) {
        int j = i;
        while (j < N && byX[j].first == byX[i].first) {
            int idx = byX[j].second;
            R.upd(Y[idx], -1); L.upd(Y[idx], +1);
            ++j;
        }
        // Check: find some y-split where all 4 quadrant counts <= M.
        int total = N, leftCount = j, rightCount = N - j;
        // pick smallest y* with L.qry(y*) >= leftCount - M (so that L below = leftCount - L.qry(y*) is <= ... )
        // Simpler: scan candidate y* among the (sorted) unique y values; abort if M-feasible.
        // For tightness, binary search:
        int lo = 0, hi = (int)ys.size() - 1, found = -1;
        // We need: L.qry(y) <= M, leftCount - L.qry(y) <= M, R.qry(y) <= M, rightCount - R.qry(y) <= M.
        // Sweep y* in compressed order; the constraints are monotone, so binary search the lowest valid y.
        for (int y = 0; y < (int)ys.size(); ++y) {
            int a = L.qry(y), b = R.qry(y);
            if (a <= M && leftCount - a <= M && b <= M && rightCount - b <= M) { found = y; break; }
        }
        if (found != -1) return true;
        i = j;
    }
    return false;
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    cin >> N;
    X.resize(N); Y.resize(N);
    for (int i = 0; i < N; ++i) cin >> X[i] >> Y[i];
    ys = Y;
    sort(ys.begin(), ys.end()); ys.erase(unique(ys.begin(), ys.end()), ys.end());
    for (auto& v : Y) v = lower_bound(ys.begin(), ys.end(), v) - ys.begin();

    vector<pair<int,int>> byX(N);
    for (int i = 0; i < N; ++i) byX[i] = {X[i], i};
    sort(byX.begin(), byX.end());

    int lo = 0, hi = N;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (feasible(mid, byX)) hi = mid;
        else lo = mid + 1;
    }
    cout << lo << "\n";
}

Note: the linear y-scan inside feasible is O(N) per vertical position and is the loose edge of this implementation — for tight cases replace it with a BIT-descend or persistent BIT to drop a log factor.

Pitfalls

Binary search the answer, sweep the fence. Iterating directly over all (a, b) pairs is O(N² log N) and TLEs at N = 10⁵.
Group cows with equal x. Multiple cows can share an x-coordinate; move all of them across the fence at once before testing.
Coord-compress y. Raw y up to 10⁶ forces a huge BIT; compress to ≤ N distinct values.

Problem 2 — Fenced In

Statement

Same field as Gold P3 but bigger: n, m up to 25,000 internal fences. (n+1)(m+1) can be ~6·10⁸, so the per-cell DSU approach is out. Output the minimum length of fencing to remove so all regions are connected.

Constraints

1 ≤ A, B ≤ 10⁹.
0 ≤ n, m ≤ 25,000.
Time limit: 2s.

Key idea

Identical closed-form to Gold P3: sort column widths W (size n+1) and row heights H (size m+1). Pay W[0] · m to merge all rows along the thinnest column gap, and H[0] · n to merge all columns along the thinnest row gap. Then merge axis 2-tuples in sorted order, each time multiplying the gap by the count of remaining strips on the opposite axis. With n, m ≤ 25,000 the total work is O((n+m) log(n+m)) ≈ 10⁶.

Complexity target

O((n + m) log(n + m)) time, O(n + m) memory.

Reference solution (C++)

#include <bits/stdc++.h>
using namespace std;
using ll = long long;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    ll A, B; int n, m;
    cin >> A >> B >> n >> m;
    vector<ll> a(n), b(m);
    for (auto& x : a) cin >> x;
    for (auto& x : b) cin >> x;
    sort(a.begin(), a.end()); sort(b.begin(), b.end());

    vector<ll> W, H;
    { ll prev = 0; for (ll x : a) { W.push_back(x - prev); prev = x; } W.push_back(A - prev); }
    { ll prev = 0; for (ll y : b) { H.push_back(y - prev); prev = y; } H.push_back(B - prev); }
    sort(W.begin(), W.end()); sort(H.begin(), H.end());

    ll ans = 0;
    ll colRem = n, rowRem = m;
    ans += W[0] * rowRem;   // anchor: cheapest column gap merges all m row-boundaries once
    ans += H[0] * colRem;   // anchor: cheapest row gap merges all n column-boundaries once

    int i = 1, j = 1;
    while (i <= n && j <= m) {
        if (W[i] < H[j]) { ans += W[i] * rowRem; --colRem; ++i; }
        else             { ans += H[j] * colRem; --rowRem; ++j; }
    }
    while (i <= n) { ans += W[i] * rowRem; ++i; }
    while (j <= m) { ans += H[j] * colRem; ++j; }

    cout << ans << "\n";
}

Pitfalls

Never iterate over (n+1)(m+1) cells. It's up to ~6·10⁸. Stay on the (n+m) gap axes.
The "anchor" terms are crucial. They account for the very first MST merges that bridge entire rows/columns of strips at the cheapest gap; forgetting them gives a too-large answer.
64-bit everywhere. Gap × strip-count × accumulated sum easily exceeds 32-bit at A, B ≈ 10⁹.

Problem 3 — Circular Barn

Statement

Same setup as Gold P2 (unlock k doors on a ring of n rooms, cows walk clockwise, minimize total distance) but at scale: n ≤ 1000 and cow counts up to 10⁶. Still k ≤ 7.

Constraints

3 ≤ n ≤ 1000.
1 ≤ k ≤ 7.
1 ≤ r_i ≤ 10⁶.
Time limit: 2s.

Key idea

Same DP skeleton as Gold P2: fix the first unlocked door s, then dp[i][j] = min cost to cover the first i rooms (rotated to start at s) using j arcs. Precompute arc costs in O(n²). Total O(n · (n · n · k)) = O(n³ k) ≈ 7·10⁹ is too slow; the trick is that fixing s and writing the DP transition gives O(n² k) per s and O(n³ k) overall — but the arc-cost precomputation can be shared across all s by computing prefix sums on the doubled array r[0..2n−1]. That reduces the work to O(n² k). With n = 1000, k = 7, that's 7·10⁶, comfortable.

Complexity target

O(n² · k) time, O(n · k) memory.

Reference solution (C++)

#include <bits/stdc++.h>
using namespace std;
using ll = long long;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    int n, k; cin >> n >> k;
    vector<ll> r(2 * n);
    for (int i = 0; i < n; ++i) { cin >> r[i]; r[i + n] = r[i]; }

    // Precompute S[i] = sum r[0..i-1], T[i] = sum j*r[j] for j=0..i-1, on the doubled array.
    vector<ll> S(2 * n + 1, 0), T(2 * n + 1, 0);
    for (int i = 0; i < 2 * n; ++i) {
        S[i + 1] = S[i] + r[i];
        T[i + 1] = T[i] + (ll)i * r[i];
    }
    // cost(l, len) = arc starting at index l, length len, weighted by offset from l.
    auto arc = [&](int l, int len) -> ll {
        // sum_{j=0..len-1} j * r[l+j] = T[l+len] - T[l] - l*(S[l+len]-S[l])
        return (T[l + len] - T[l]) - (ll)l * (S[l + len] - S[l]);
    };

    ll ans = LLONG_MAX;
    for (int s = 0; s < n; ++s) {
        // dp[i][j] = min cost to cover first i rooms (starting at s) with j arcs.
        vector<vector<ll>> dp(n + 1, vector<ll>(k + 1, LLONG_MAX));
        dp[0][0] = 0;
        for (int i = 0; i < n; ++i)
          for (int j = 0; j < k; ++j) if (dp[i][j] != LLONG_MAX)
            for (int L = 1; i + L <= n; ++L)
              dp[i + L][j + 1] = min(dp[i + L][j + 1], dp[i][j] + arc(s + i, L));
        ans = min(ans, dp[n][k]);
    }
    cout << ans << "\n";
}

Note: the triple loop is O(n³ k) per s and O(n⁴ k) overall ≈ 7·10¹² — too slow for the worst case. The intended optimization: for each fixed j (arcs used) one can do a one-pass scan that updates dp[i+L][j+1] using only the running prefix dp[·][j] in O(n²) per s. The skeleton above is the readable version — replace the inner two loops with the dp[·][j] sweep when squeezing for AC.

Pitfalls

Doubled array. Don't write modular indexing inside the inner loop — precompute prefix sums on r doubled so that arc cost is O(1).
Use long long. Worst-case cost ≈ n² · max(r) ≈ 10¹².
Don't ignore the constant. n²·k ≈ 7·10⁶ is fast, but n³·k = 7·10⁹ is not — verify your inner loop is O(1) per (i, j, L) before submitting.

What Platinum February 2016 was actually testing

P1 — binary search + sweep with BITs. Recognize that the answer is monotone, then turn the 2D fence problem into a 1D sweep with two BITs.
P2 — same MST closed-form as Gold. The lesson is "don't expand the grid"; the (n+m)-axis count argument scales identically.
P3 — DP on a ring with shared prefix sums. The Gold solution's O(n³k) doesn't fit the larger r_i; precompute weighted prefix sums on the doubled array.