My notebook for the USA AI Olympiad.

I'm preparing for USAAIO by writing down what I learn — the linear algebra and probability I lean on, the Python data stack, classical ML I want to be fluent in, and the PyTorch and transformer code I expect to need on contest day. Writing it down is how I check that I actually understand it.

Open the study plan →
What is USAAIO?
3 stages: Round 1 · Round 2 · USAAIO Camp
Under 20: middle/high school in US or Canada (citizen/PR/FT-student)
Python: NumPy · scikit-learn · PyTorch
IOAI: Team USA selection path
New — past-problem archive. 4 real past problems (IOAI 2024 NLP, IOAI 2025 CV, NOAI China 2024 tabular ML, USAAIO 2025 Round 1 theory) with full walkthroughs: official statement, baseline code, the techniques that historically lifted scores, and a follow-up drill. Start here after the stratum pages.

What this site covers

Each module is a self-contained page. The study plan stitches them into a weekly progression from math foundations through transformer fine-tuning.

Orient

About the contest

Format, eligibility, the three-stage structure (Round 1 → Round 2 → USAAIO Camp), Team USA selection for IOAI / IAIO.

Read briefing →
Foundations

Math you need

Linear algebra, probability and statistics, multivariable calculus, convex optimization — only the parts that actually show up in ML.

Review math →
Tooling

Python data stack

NumPy, pandas, matplotlib, seaborn, scikit-learn — environment setup, idioms, common pitfalls.

Open toolkit →
Learn

Classical ML

Regression, classification, ensembles, cross-validation, clustering, dimensionality reduction — the scikit-learn surface.

Browse models →
Deep Learning

PyTorch & neural nets

Tensors, autograd, MLPs, the standard layers, forward/backpropagation, training loops, regularization.

Open notebook →
Modern AI

Attention & transformers

Tokenization, embeddings, self-attention, transformer blocks, pre-training and fine-tuning, NLP and vision applications.

See architecture →
Drill

Round 2 theory drills

Thirty short-answer problems with full collapsible solutions — softmax Jacobians, Hoeffding bounds, receptive fields, KV-cache math, the DDPM forward process.

Open drill bank →
Vision

U-Net & segmentation

Encoder-decoder CNN with skip connections — biomedical and semantic segmentation, and the denoiser inside every diffusion model.

Open U-Net →
Generative

Variational autoencoders

Probabilistic autoencoder with a Gaussian latent. ELBO, reparameterisation, KL divergence — and the latent compressor inside Stable Diffusion.

Open VAE →
Generative

Diffusion models (DDPM)

Forward noising, reverse denoising, the simplified epsilon-prediction loss, and how Stable Diffusion extends DDPM with latent space + cross-attention.

Open DDPM →
Graphs

Graph neural networks

Message passing, GCN normalised adjacency, GAT attention, readout pooling — molecule property prediction and citation classification.

Open GNN →
Sequential decisions

Reinforcement learning

MDPs, Bellman, Q-learning, REINFORCE, actor-critic, PPO clipped objective. Tabular gridworld + DQN sketch + CartPole policy gradient in PyTorch.

Open RL →
Transformers

Attention variants

MQA, GQA, FlashAttention, linear & sparse attention, sliding-window, RoPE — cost-cutting toolbox for long-context and fast-inference transformers.

Open variants →
Adapt

Fine-tuning · LoRA & QLoRA

Adapt pretrained models on a Colab budget — linear probing, adapters, LoRA, QLoRA, SFT, and DPO without a reward model.

Open fine-tuning →
Practical

Data augmentation

Image, text, tabular & audio aug — RandAugment, MixUp/CutMix, EDA, back-translation, SMOTE, SpecAugment, and TTA at inference.

Open augmentation →
Validate

Model evaluation

Metrics (F1, AUC, MAE, BLEU, FID), validation strategies (stratified / group / rolling-origin / nested CV), bias-variance, leaderboard tactics — probing, ensembling, calibration, threshold tuning.

Open evaluation →
Explain

Interpretability

SHAP, LIME, Grad-CAM, Integrated Gradients, attention rollout. The Round-2 toolkit for explaining what your model learned.

Open interpretability →
MLOps

MLOps & submission packaging

Seeds, env pinning, atomic checkpoints, inference scripts, submission-CSV gotchas (BOM, line endings, sort order), Dockerfile basics, 10-min pre-submit checklist.

Ship clean submissions →
Notebooks

End-to-end Colab notebooks

Four runnable pipelines: Titanic tabular ML, CIFAR-10 PyTorch CNN, DistilBERT IMDB fine-tune, Bayesian A/B test.

Open notebooks →
Mocks

Mock contests

Two full USAAIO-style mocks: a 90-min tabular ML problem (Round-1 style) and a 180-min small-vision problem (Round-2 style). Each ships with a constraints panel, scoring rubric, and a separate reference-solution page.

Sit a mock →
Archive

Past contest walkthroughs

Four real IOAI / USAAIO past problems (NLP, CV, tabular, theory) with baseline code, improvement playbooks, and source links.

Open archive →
Set · USAAIO 2026

USAAIO 2026 Round 1

Just-released 9-problem set. Forum link-only for now; walkthroughs to be written as Barry sits each problem.

Open forum index →
Set · USAAIO 2025

USAAIO 2025 Round 2

3 problems: 12-part theory, 14-part pipeline, and the 20-part CLIP / flickr30k multimodal mega-task.

Open forum index →
Set · IOAI 2025

IOAI 2025 Individual Contest

Beijing 2025 individual tasks: chicken counting (CV), radar (ML), concepts (NLP). 3 of 6 featured.

Open set →
Set · USAAIO 2025

USAAIO 2025 Round 1

Online qualifier, 5 problems: recurrence, affine NN, EDA/ML pipeline, CNN, transformer attention.

Open set →
Set · IAIO 2024

IAIO 2024 (Russia-led)

6 questions, 22 sub-question forum threads. Distinct from UNESCO-backed IOAI; useful tabular/CV reps.

Open forum index →
Set · IOAI 2024

IOAI 2024 Scientific Round

At-home + on-site: ML matrix feature gen, NLP ciphered language, CV zebra/giraffe weight swap, on-site cow+hydrant. 4 task walkthroughs.

Open set →
Set · IOAI 2024

IOAI 2024 Practical Round

4-hour team challenge: album cover + music video from one song. Brief, cover, video walkthroughs.

Open set →
Set · USAAIO 2024

USAAIO 2024 Round 1 [reconstructed]

2024 PDF not yet archived; syllabus-driven reconstruction: probability, regression, trees, MLP, embeddings.

Open set →
Survival

Engineering survival

Fourteen common ML/DL pitfalls (data leakage, train/inference mismatches, seeding) plus a Colab/Kaggle playbook — runtimes, checkpoints, profiling, submission template.

Avoid the bugs →
Reference

Contest cheatsheets

One dense page: NumPy/pandas/matplotlib/sklearn idioms, PyTorch shape ops and training loop boilerplate, layer shape rules, Hugging Face Trainer, and a complexity table.

Open cheatsheets →
Plan

Six-month plan

Week-by-week schedule from math review through deep learning fluency, calibrated for a Grade 9 ramp.

Open calendar →
Library

Resources

Textbooks, courses, datasets, paper recommendations, and competitive practice grounds (Kaggle, AIcrowd).

Browse links →

Suggested reading order

  1. Get oriented. Read About the contest to understand the three-stage format (Round 1 → Round 2 → USAAIO Camp) and how Team USA is actually picked.
  2. Lock in the math. Work through the math review. You don't need all of multivariable calculus — just the slice that powers gradient descent and PCA.
  3. Get fluent in the Python data stack. Work through the Python toolkit (NumPy, pandas, matplotlib) until you can manipulate arrays without thinking.
  4. Sweep classical ML. The classical ML page covers the main scikit-learn families in one sitting: linear models, trees, ensembles, clustering.
  5. Build a neural net from scratch. The deep learning page walks through a manual MLP in NumPy, then the same thing in PyTorch.
  6. Understand attention. The transformers page goes from tokenizer to scaled dot-product attention to a working transformer block.

Why an AI olympiad is different

Big-picture reminders

Three stages. Round 1 (online qualifier, 300+ participants) → Round 2 (in-person, threshold-based; about 19% advance, 76 finalists in 2025) → USAAIO Camp (June, at MIT in 2026; Harvard hosted in prior years). All answers are submitted through Google Colab. Round 1 is CPU-only; Round 2 may use L4 GPUs.
Language. Python is the de facto language. The required libraries — NumPy, pandas, matplotlib, scikit-learn, PyTorch — are all standard, but you must be comfortable enough to debug them under contest pressure.
The international path. Top scorers in Round 2 are invited to the USAAIO Camp (at MIT in June 2026). Camp team-selection tests, not Round 2 itself, pick Team USA for IOAI and IAIO. The camp is the real gate.