2024 January Gold — three problems

← 2024 January contest index · Official results page

Gold is stretch territory: binary lifting / sparse tables on grids, counting with combinatorics, and a partition optimisation. I'm reading these for exposure rather than racing the clock.

Stretch division. The C++ below is a reference implementation following the standard editorial idea. I'd expect to need a full clean re-implementation pass after reading the official editorial.

Problem 1 · Walking in Manhattan

Official statement (cpid=1377)

Statement (paraphrased)

A Manhattan grid has N axis-aligned roads. Q cows each start at a point on the road network at time 0 and walk at 1 unit/second. Direction is deterministic: at every intersection a cow checks total elapsed walk time; if it's even, she turns north (+y), else east (+x). Each cow has its own T seconds elapsed. Output each cow's position after T seconds.

Constraints

1 ≤ N, Q ≤ 2·10⁵; coordinates up to 10⁹; T up to 10⁹
Time/memory: USACO Gold default (2 s / 256 MB)

Key idea

From any intersection the cow's path is forced. Precompute, for each intersection on each direction, a binary-lifting table: jump[k][v] = (intersection reached after 2^k direction-flips, time used). Each cow first walks straight in her current direction until hitting the next intersection (a binary search along the road), then jumps as far as the time budget allows via binary lifting, then walks the residual.

Complexity

O((N + Q) log T) with sorted road endpoints per row/column.

C++ reference

#include <bits/stdc++.h>
using namespace std;
typedef long long ll;
// Reference sketch — full implementation needs careful endpoint bookkeeping.
// [verify] coordinate-compression and road-adjacency model against editorial cpid=1377

struct Node { int x, y; };          // intersection
static const int LOG = 30;
vector<Node> nodes;                 // all intersections
int jumpTo[LOG][1 << 18];           // jumpTo[k][v]
ll  jumpDt[LOG][1 << 18];           // time spent in those 2^k flips

ll cowTime, cowX, cowY;
int cowDir;                         // 0 = N (+y), 1 = E (+x)

// build binary lifting from per-intersection "next intersection given parity"
void build(int V) {
    for (int k = 1; k < LOG; k++)
        for (int v = 0; v < V; v++) {
            int m = jumpTo[k - 1][v];
            jumpTo[k][v] = jumpTo[k - 1][m];
            jumpDt[k][v] = jumpDt[k - 1][v] + jumpDt[k - 1][m];
        }
}

pair<ll, ll> advance(int v, ll t) { // returns final (x, y) reached within time t
    for (int k = LOG - 1; k >= 0; k--)
        if (jumpDt[k][v] <= t) { t -= jumpDt[k][v]; v = jumpTo[k][v]; }
    // walk linear residual in current direction
    if (cowDir == 0) return { nodes[v].x, nodes[v].y + t };
    else             return { nodes[v].x + t, nodes[v].y };
}

int main() {
    // read input, build intersections, fill jumpTo[0]/jumpDt[0], call build(),
    // then for each query locate the starting segment, snap to the next
    // intersection, run advance(). Skeleton only.
    return 0;
}

Pitfalls

Two parity classes (T even / T odd at the intersection) — easy to off-by-one on which direction the cow takes at a given intersection.
A cow may start mid-segment, not at an intersection. Handle that linear pre-walk separately from the binary lifting.
Coordinates up to 10⁹ — definitely long long for times and positions.

Problem 2 · Cowmpetency (Gold version)

Official statement (cpid=1378)

Statement (paraphrased)

Same setup as Silver Cowmpetency, but now N can be huge and you want to count the number of valid score sequences in [1, C]^N consistent with the Q constraints, modulo 10⁹+7.

Constraints

2 ≤ N ≤ 10⁹; 1 ≤ C ≤ 10⁴; 1 ≤ Q ≤ min(N−1, 100)
All a_j distinct
Time/memory: USACO Gold default (2 s / 256 MB)

Key idea

Sort constraints by a. They partition [1, N] into segments where the running max is fixed. Within each segment of length ℓ with current-max m, the number of sequences is the number of length-ℓ sequences whose values lie in [1, m] (standard product). At each constraint boundary a_j→h_j, cow h_j must take a value strictly greater than the running max m and cows strictly between a_j and h_j must stay ≤ m. Sum over the new max value v ∈ [m+1, C].

This gives a polynomial in C of degree ≤ Q, handled with prefix-sum polynomials or a small DP over Q stages with explicit summations over v.

Complexity

O(Q · C) or O(Q² · log MOD) depending on the formulation; with C ≤ 10⁴ and Q ≤ 100 both are comfortable.

C++ reference

#include <bits/stdc++.h>
using namespace std;
typedef long long ll;
const ll MOD = 1'000'000'007LL;

ll pw(ll a, ll b) { ll r = 1; a %= MOD; while (b){ if(b&1) r = r*a%MOD; a = a*a%MOD; b >>= 1; } return r; }

int main() {
    int N, C, Q; scanf("%d %d %d", &N, &C, &Q);
    vector<pair<int,int>> cons(Q);
    for (auto& [a, h] : cons) scanf("%d %d", &a, &h);
    sort(cons.begin(), cons.end());

    // dp[m] = number of ways the running max equals m after processing some prefix of constraints
    // [verify] DP definition against editorial cpid=1378
    vector<ll> dp(C + 2, 0);
    dp[1] = 1;     // a vacuous start — score 1 for cow 1
    int prevA = 1;
    for (auto [a, h] : cons) {
        // cows prevA+1 .. a stay ≤ current max
        int gap1 = a - prevA;
        vector<ll> nx(C + 2, 0);
        for (int m = 1; m <= C; m++) if (dp[m]) {
            ll keep = pw(m, gap1) * dp[m] % MOD;
            // strictly between a and h: h - a - 1 cows also ≤ m
            int gap2 = h - a - 1;
            ll mid = keep * pw(m, gap2) % MOD;
            // cow h takes value v > m
            for (int v = m + 1; v <= C; v++) nx[v] = (nx[v] + mid) % MOD;
        }
        dp = nx;
        prevA = h;
    }
    // remaining tail: cows prevA+1 .. N stay ≤ current max
    ll ans = 0;
    for (int m = 1; m <= C; m++) ans = (ans + dp[m] * pw(m, N - prevA)) % MOD;
    printf("%lld\n", ans);
}

Pitfalls

N is 10⁹ — exponents go through fast power, never a linear loop.
The Σ_v=m+1..C step is O(C) per state; with C = 10⁴ and Q = 100 that's 10⁸·C → too slow naïvely. Replace with a prefix-sum trick over m.
Edge case Q = 0 (no constraints): answer is C^N mod p.

Problem 3 · Nap Sort

Official statement (cpid=1379)

Statement (paraphrased)

Bessie has N integers. She partitions them into a "self" pile and a set of "nap helpers." Self pile: she repeatedly removes the minimum, taking p seconds when the pile has p elements (so total = p+(p-1)+...+1 = p(p+1)/2 for p kept items). Helpers: helper assigned value v naps for v seconds, then inserts v into the sorted output. All this happens concurrently, and the merged output must be sorted. Minimise total wall-clock time.

Constraints

1 ≤ N ≤ 2·10⁵; values up to 10¹¹; up to T = 10 tests
Sum of N over tests ≤ 2·10⁵
Time/memory: USACO Gold default (2 s / 256 MB)

Key idea

Sort values. Bessie's pile must output in sorted order interleaved with the naps; a helper holding value v finishes at time v. So if Bessie's pile has p elements, the j-th smallest element output by Bessie finishes at time j(j+1)/2 (cumulative removals 1+2+...+j). For the merged sequence to be sorted, each "Bessie" output time and each helper's finish time must respect the value ordering.

Binary search the answer T*: given T, Bessie can keep at most p where p(p+1)/2 ≤ T; she takes the p smallest values; remaining values must each have v ≤ T (helpers). Validate that the merged stream is sorted in time order — which reduces to comparing each kept value's "ready time" against neighbouring helpers.

Complexity

O(N log N log V) with binary search on the answer.

C++ reference

#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

int N;
vector<ll> a;

bool feasible(ll T) {
    // find max p with p(p+1)/2 <= T
    ll p = (ll)((sqrt(1.0 + 8.0 * (double)T) - 1) / 2);
    while ((p + 1) * (p + 2) / 2 <= T) p++;
    while (p > 0 && p * (p + 1) / 2 > T) p--;
    if (p > N) p = N;
    // helpers handle the remaining N - p largest; each must satisfy v <= T
    if (N - p > 0 && a[N - 1] > T) return false;
    // [verify] sortedness-of-output check against editorial cpid=1379
    // Bessie's j-th removed value (the j-th smallest, 1-indexed) finishes at j(j+1)/2.
    // For merged sortedness: a[j-1] must be <= every helper value with finish time >= j(j+1)/2.
    // Simplest sufficient check: each Bessie output time <= next helper finish, which holds iff
    // a[j-1] <= a[p] (the smallest helper value) whenever j(j+1)/2 <= a[p].  This is automatic
    // because a is sorted and a[j-1] <= a[p].
    return true;
}

int main() {
    int Tc; scanf("%d", &Tc);
    while (Tc--) {
        scanf("%d", &N);
        a.assign(N, 0);
        for (auto& x : a) scanf("%lld", &x);
        sort(a.begin(), a.end());
        ll lo = 0, hi = (ll)2e11;
        while (lo < hi) {
            ll mid = (lo + hi) / 2;
            if (feasible(mid)) hi = mid; else lo = mid + 1;
        }
        printf("%lld\n", lo);
    }
}

Pitfalls

Values up to 10¹¹ and p(p+1)/2 up to ~2·10¹⁰ — long long throughout, watch for overflow in p(p+1)/2 when p~2·10⁵.
Helper finish-time equals the helper's value, not its index — easy to confuse with self-pile timing.
The sortedness check is the real meat. My feasibility above is the easy sufficient case; the full editorial check requires per-position comparisons between Bessie's removal time and helper values.