End-to-end Colab notebooks

Each stratum page on this site walks through the theory with short, focused snippets. These four notebooks do the opposite: each is a complete pipeline from raw data to a working submission, the way USAAIO problems actually arrive. Open one in Colab, choose Run all, and you have a baseline you can iterate on.

How to run. Click Open in Colab. For notebooks 2 and 3, switch the runtime to GPU (Runtime → Change runtime type → T4 GPU) before running. For notebooks 1 and 4 the free CPU runtime is fine.
Download. The .ipynb files are committed to the repo, so you can also download them and run them locally with jupyter lab; no Colab login is required.

The four notebooks

Classical ML

01 · Tabular ML on Titanic

Full Kaggle-style pipeline: load → EDA (correlation heatmap, survival bars) → feature engineering (Title, FamilySize, FareBin) → ColumnTransformer preprocessing → logistic-regression baseline → 4-model comparison (LogReg, RandomForest, GradientBoosting, Stacking) → 5-fold cross-validation → feature importance → Kaggle submission.csv.

Runtime: ~2 min on Colab CPU. Pairs with Classical ML.
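
To give a feel for the preprocessing and baseline steps, here is a minimal sketch, not the notebook's actual code. It assumes a synthetic table with Titanic-style column names (the real notebook loads the Kaggle data), assumes FamilySize is defined as SibSp + Parch + 1, and wires a ColumnTransformer into a logistic-regression pipeline scored with 5-fold cross-validation.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the Titanic training table (NOT the real data).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Sex": rng.choice(["male", "female"], size=n),
    "Age": rng.normal(30, 12, size=n).clip(1, 80),
    "Fare": rng.exponential(30, size=n),
    "SibSp": rng.integers(0, 4, size=n),
    "Parch": rng.integers(0, 3, size=n),
})
# Knock out some ages so the imputer has work to do.
df.loc[rng.choice(n, size=20, replace=False), "Age"] = np.nan

# Toy label: mostly determined by Sex, with 20% of labels flipped as noise.
is_female = (df["Sex"] == "female").to_numpy()
flip = rng.random(n) < 0.2
df["Survived"] = np.logical_xor(is_female, flip).astype(int)

# Feature engineering (assumed definition of FamilySize).
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

numeric = ["Age", "Fare", "FamilySize"]
categorical = ["Sex"]

# Numeric columns: median-impute then standardize. Categorical: one-hot encode.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Preprocessing lives inside the pipeline, so each CV fold fits it on its own training split.
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(
    model, df[numeric + categorical], df["Survived"], cv=5, scoring="accuracy"
)
mean_acc = scores.mean()
print(f"5-fold accuracy: {mean_acc:.3f} +/- {scores.std():.3f}")
```

Keeping the imputer, scaler, and encoder inside the Pipeline is the design choice worth copying: it prevents statistics from the validation fold leaking into preprocessing.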

Deep Learning

02 · CIFAR-10 with PyTorch

A small CNN (~200K params) trained end to end: data loaders, model definition, training loop with OneCycleLR, checkpoint save/load, then an augmentation upgrade (RandomCrop + Flip + RandomErasing / Cutout) and a confusion matrix on the test set.

Runtime: ~3–5 min on a T4/L4 GPU. Pairs with Deep Learning.
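
The training-loop pattern with OneCycleLR can be sketched as follows. This is an illustration under stated assumptions rather than the notebook's code: random tensors stand in for CIFAR-10 batches, the network is a much smaller CNN than the ~200K-parameter one described above, and the learning rates and epoch counts are arbitrary. The point it shows is that the scheduler is stepped once per batch, not once per epoch.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Random tensors stand in for CIFAR-10 images (3x32x32) and 10-class labels.
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 10, (64,))

# A deliberately tiny CNN for illustration (far fewer parameters than the notebook's).
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),
)

epochs, steps_per_epoch, batch_size = 3, 4, 16
max_lr = 0.1  # assumed value, chosen only for the demo

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=max_lr, epochs=epochs, steps_per_epoch=steps_per_epoch
)
loss_fn = nn.CrossEntropyLoss()

lrs = []
for epoch in range(epochs):
    for step in range(steps_per_epoch):
        xb = images[step * batch_size:(step + 1) * batch_size]
        yb = labels[step * batch_size:(step + 1) * batch_size]

        lrs.append(optimizer.param_groups[0]["lr"])  # learning rate used for this batch

        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
        scheduler.step()  # OneCycleLR advances once per batch

print(f"first lr {lrs[0]:.4f}, peak lr {max(lrs):.4f}, last lr {lrs[-1]:.7f}")
```

With the default settings the schedule starts at max_lr / 25, warms up toward max_lr over roughly the first 30% of steps, then anneals to a much smaller value by the final step.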

Transformers

03 · Fine-tune DistilBERT on IMDB

Hugging Face transformers + datasets + evaluate. Load IMDB, tokenize with the DistilBERT tokenizer, fine-tune for 2 epochs with the Trainer API, report accuracy + F1, run inference on custom strings, and save the model to disk.

Runtime: ~8–12 min on a T4/L4 GPU. Pairs with Attention & transformers.
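
The Trainer API reports custom metrics through a compute_metrics callable that receives the model's logits and the true labels. The sketch below shows one way to compute accuracy and binary F1 in that shape. It is an assumption-laden illustration: it uses plain NumPy rather than the evaluate library the notebook uses, treats label 1 as the positive class, and the logits are made up for the demo.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Accuracy and binary F1 from (logits, labels), with label 1 as the positive class."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    labels = np.asarray(labels)

    accuracy = float((preds == labels).mean())

    tp = int(((preds == 1) & (labels == 1)).sum())
    fp = int(((preds == 1) & (labels == 0)).sum())
    fn = int(((preds == 0) & (labels == 1)).sum())

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

    return {"accuracy": accuracy, "f1": f1}

# Made-up logits for four reviews (columns: negative, positive).
logits = np.array([[2.0, 0.1], [0.2, 1.5], [0.3, 0.9], [1.2, 0.4]])
labels = np.array([0, 1, 0, 1])

metrics = compute_metrics((logits, labels))
print(metrics)
```

A function with this signature can be passed as the compute_metrics argument when constructing a Trainer, so the same numbers appear in each evaluation log.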

Math · Probability

04 · Bayesian A/B test

Beta-Binomial conjugacy derived in LaTeX, then simulated: stream daily traffic, watch the posteriors tighten, compute P(B>A | data) by Monte Carlo, apply a 95% decision rule, and compare to the frequentist two-proportion z-test. Ends with three drill problems and worked solutions.

Runtime: ~30 s on Colab CPU. Pairs with Math you need.
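
The core of the Beta-Binomial update and the Monte Carlo estimate of P(B>A | data) fits in a few lines. This is a minimal sketch using only the standard library, not the notebook's code. The function names, the uniform Beta(1, 1) prior, the conversion counts, and the reading of the 95% rule as "prefer B when the probability exceeds 0.95" are all assumptions made for illustration.

```python
import math
import random

def posterior_params(conversions, visitors, prior_a=1.0, prior_b=1.0):
    """Conjugate update: Beta(a, b) prior plus binomial data gives Beta(a + successes, b + failures)."""
    return prior_a + conversions, prior_b + (visitors - conversions)

def prob_b_beats_a(a_post, b_post, n_samples=50_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under independent Beta posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*b_post) > rng.betavariate(*a_post)
        for _ in range(n_samples)
    )
    return wins / n_samples

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z statistic and one-sided p-value for rate_B > rate_A."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    p_value = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_value

# Hypothetical totals: variant A converts 120 of 1000 visitors, variant B 150 of 1000.
a_post = posterior_params(conversions=120, visitors=1000)
b_post = posterior_params(conversions=150, visitors=1000)

p_b_better = prob_b_beats_a(a_post, b_post)
prefer_b = p_b_better > 0.95  # assumed form of the 95% decision rule

z, p_value = two_proportion_z(120, 1000, 150, 1000)

print(f"P(B>A | data) ~ {p_b_better:.3f}, prefer B: {prefer_b}")
print(f"z = {z:.2f}, one-sided p = {p_value:.3f}")
```

Because the posterior is available in closed form, streaming daily traffic only means adding each day's conversions and non-conversions to the two Beta parameters; the Monte Carlo step is needed only for the comparison between variants.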

Why end-to-end?