Mock B · Small vision — “Hand-drawn shape classification”
Format: Round-2 style · Time: 180 minutes · Compute: Colab L4 GPU allowed · Submission: predictions.csv + 3 short-answer responses · Metric: macro accuracy.
A self-contained small-scale vision mock modeled on a USAAIO Round-2 problem. You generate a 5 000-image synthetic dataset of hand-drawn shapes, train a CNN on a Colab L4 GPU, predict the class of a held-out test set, and answer three in-line theory questions. Three hours, end to end.
- Time budget: 180 minutes, hard stop, including data generation, training, and write-up.
- Compute: one Colab L4 GPU (or local equivalent). No pretrained ImageNet weights, no foundation models.
- Libraries: PyTorch, torchvision, numpy, pandas, matplotlib. torchvision.transforms is allowed; timm is not.
- Submission: predictions.csv with header id,label, plus a markdown cell answering the 3 theory questions at the end of the notebook.
Problem statement
You receive 5 000 grayscale 64×64 images of hand-drawn shapes in 5 categories: circle,
square, triangle, star, arrow. Each shape is rendered
by a stochastic procedure that varies size, stroke thickness, in-plane rotation, translation, and
additive Gaussian noise. Class balance is exactly uniform (1 000 / class).
You will split the 5 000 images into 4 000 train and 1 000 held-out test by index (the generator below yields a deterministic split). Train a CNN from scratch, and submit predicted class labels for the test images. The grader computes macro accuracy (mean per-class accuracy) on the held-out labels.
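As a rough illustration of the metric, the sketch below computes macro accuracy as the unweighted mean of per-class accuracies. The function name `macro_accuracy` and the toy labels are illustrative assumptions, not the grader's actual code. Because the index-based split leaves 200 test images per class, macro accuracy and plain accuracy should coincide on the test set; they can differ on an unbalanced validation split.

```python
import numpy as np

CLASSES = ["circle", "square", "triangle", "star", "arrow"]

def macro_accuracy(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class accuracies (hypothetical helper, not the grader)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = []
    for c in classes:
        mask = y_true == c
        if mask.any():  # classes absent from y_true are skipped
            per_class.append(float((y_pred[mask] == c).mean()))
    return float(np.mean(per_class))

# Toy check: both circles correct, one of two squares correct -> (1.0 + 0.5) / 2
demo = macro_accuracy(
    ["circle", "circle", "square", "square"],
    ["circle", "circle", "square", "star"],
)
```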
The dataset is fully synthetic and generated by the snippet in the next section; cite it as such if you publish your solution.
Synthetic data generator (run once at the top of the notebook)
import numpy as np
import pandas as pd
from PIL import Image, ImageDraw
from pathlib import Path

CLASSES = ["circle", "square", "triangle", "star", "arrow"]
ROOT = Path("shapes")
ROOT.mkdir(exist_ok=True)
rng = np.random.default_rng(7)

def draw_shape(cls, size=64):
    img = Image.new("L", (size, size), color=0)
    d = ImageDraw.Draw(img)
    cx, cy = size // 2 + rng.integers(-6, 7), size // 2 + rng.integers(-6, 7)
    r = rng.integers(14, 24)
    stroke = int(rng.integers(1, 4))
    if cls == "circle":
        d.ellipse([cx-r, cy-r, cx+r, cy+r], outline=255, width=stroke)
    elif cls == "square":
        d.rectangle([cx-r, cy-r, cx+r, cy+r], outline=255, width=stroke)
    elif cls == "triangle":
        d.polygon([(cx, cy-r), (cx-r, cy+r), (cx+r, cy+r)], outline=255)
    elif cls == "star":
        pts = []
        for k in range(10):
            ang = -np.pi/2 + k * np.pi / 5
            rad = r if k % 2 == 0 else r * 0.45
            pts.append((cx + rad*np.cos(ang), cy + rad*np.sin(ang)))
        d.polygon(pts, outline=255)
    elif cls == "arrow":
        d.line([(cx-r, cy), (cx+r, cy)], fill=255, width=stroke)
        d.polygon([(cx+r, cy-6), (cx+r, cy+6), (cx+r+8, cy)], outline=255)
    img = img.rotate(rng.uniform(-25, 25), resample=Image.BILINEAR)
    arr = np.array(img).astype(np.float32)
    arr += rng.normal(0, 8, size=arr.shape)
    return np.clip(arr, 0, 255).astype(np.uint8)

records = []
for i in range(5000):
    cls = CLASSES[i % 5]
    arr = draw_shape(cls)
    Image.fromarray(arr).save(ROOT / f"{i:05d}.png")
    records.append((i, cls))

# deterministic 4000/1000 split by index
df = pd.DataFrame(records, columns=["id", "label"])
train = df.iloc[:4000]
test = df.iloc[4000:]
train.to_csv("train.csv", index=False)
test[["id"]].to_csv("test.csv", index=False)  # no labels at test time
test[["id", "label"]].to_csv("test_labels.csv", index=False)  # for your own scoring
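The generator separates train from test, but the rubric also expects a validation split carved out of the 4 000 training rows only. One possible way to do that is a per-class (stratified) hold-out, sketched below. The helper name `stratified_val_split`, the 10% fraction, and the seed are assumptions chosen for illustration; `demo_train` is a stand-in with the same id/label layout as train.csv so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

CLASSES = ["circle", "square", "triangle", "star", "arrow"]

def stratified_val_split(train_df, val_frac=0.1, seed=0):
    """Hold out val_frac of each class from the training table (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    val_idx = []
    for _, group in train_df.groupby("label"):
        n_val = int(round(len(group) * val_frac))
        picked = rng.choice(group.index.to_numpy(), size=n_val, replace=False)
        val_idx.extend(picked.tolist())
    val_mask = train_df.index.isin(val_idx)
    return train_df[~val_mask], train_df[val_mask]

# Stand-in for pd.read_csv("train.csv"): ids 0..3999, labels cycling through CLASSES.
demo_train = pd.DataFrame(
    {"id": range(4000), "label": [CLASSES[i % 5] for i in range(4000)]}
)
fit_df, val_df = stratified_val_split(demo_train)
```

Since only training ids (below 4000) ever enter either part, the 1 000 test images stay untouched by model selection.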
Submission format
A single CSV named predictions.csv with 1 000 data rows and the header below.
id,label
4000,star
4001,circle
4002,arrow
...
4999,square
- id is the image index; the grader joins on id.
- Labels must be exactly one of circle, square, triangle, star, arrow (lowercase).
- Plus a markdown cell at the end of the notebook with answers to the 3 theory questions in Theory short-answer.
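A format mistake costs the whole submission, so it may be worth validating the file before saving it. The sketch below checks the constraints listed above; `write_predictions` is a hypothetical helper, and the all-"circle" labels and the `predictions_demo.csv` filename are placeholders used only to exercise the checks, not real model output.

```python
import pandas as pd

CLASSES = ["circle", "square", "triangle", "star", "arrow"]

def write_predictions(ids, labels, path="predictions.csv"):
    """Validate and save a submission table (hypothetical helper)."""
    sub = pd.DataFrame({"id": list(ids), "label": list(labels)})
    if len(sub) != 1000:
        raise ValueError(f"expected 1000 rows, got {len(sub)}")
    if set(sub["id"]) != set(range(4000, 5000)):
        raise ValueError("ids must be exactly 4000..4999, each once")
    bad = sorted(set(sub["label"]) - set(CLASSES))
    if bad:
        raise ValueError(f"unknown labels: {bad}")
    sub.to_csv(path, index=False)  # writes the header line id,label
    return sub

# Placeholder predictions; replace the label list with your model's output.
demo = write_predictions(range(4000, 5000), ["circle"] * 1000,
                         path="predictions_demo.csv")
```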
Scoring rubric (100 points)
| Section | Points | What earns credit |
|---|---|---|
| Data loader | 15 | Custom Dataset + DataLoader, correct image-tensor normalization, train/val split (held-out within the 4 000 train), no leakage from the test 1 000. |
| Baseline CNN | 30 | A from-scratch CNN (3–5 conv blocks, ≤500 k params) trained for a documented number of epochs, with reported train/val accuracy and a loss curve. |
| Augmentation | 20 | At least 2 augmentations (e.g. RandomAffine, ColorJitter, or RandomHorizontalFlip if a flip preserves every class label; otherwise justify omitting it), with a measured val-accuracy delta vs. no augmentation. |
| Training stability | 15 | Pinned seeds, fixed batch size, a sensible LR schedule (step / cosine), early stopping or explicit best-epoch selection on val. |
| Evaluation report | 10 | Per-class accuracy + confusion matrix on the val set; brief comment on the hardest class. |
| Theory short-answer | 10 | Three questions in Theory short-answer, ~3 sentences each. |
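To make the Baseline CNN row concrete, here is a minimal sketch of a four-block network that should fit comfortably under the 500 k parameter cap. The class name `SmallShapeCNN`, the channel widths, and the use of BatchNorm plus global average pooling are illustrative assumptions, not a required or reference architecture.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """3x3 conv -> BatchNorm -> ReLU -> 2x2 max-pool (halves the spatial size)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class SmallShapeCNN(nn.Module):
    """Hypothetical baseline for 1x64x64 grayscale inputs and 5 classes."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 16),    # 64 -> 32
            conv_block(16, 32),   # 32 -> 16
            conv_block(32, 64),   # 16 -> 8
            conv_block(64, 128),  # 8 -> 4
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, num_classes)
        )

    def forward(self, x):
        return self.head(self.features(x))

torch.manual_seed(0)  # pinned seed, as the stability row asks
model = SmallShapeCNN()
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

model.eval()  # BatchNorm uses running statistics in eval mode
with torch.no_grad():
    logits = model(torch.zeros(2, 1, 64, 64))  # dummy batch of two images
```

Printing `n_params` in the notebook documents that the model respects the cap; the dummy forward pass only confirms the output has one logit per class.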
Theory short-answer (10 pts, 3 questions)
- Receptive fields. Your final conv stack is Conv(3×3) → MaxPool(2) → Conv(3×3) → MaxPool(2) → Conv(3×3). What is the receptive field of one output neuron with respect to the 64×64 input, in pixels? Show the arithmetic.
- BatchNorm at inference. Why does a BatchNorm layer behave differently during training vs. inference, and what concretely happens if you forget to switch the model into inference mode before predicting on the test set?
- Augmentation impact. Your rotation augmentation lifts val accuracy by ~3 percentage points but slows training by ~30%. Give one principled reason rotation specifically helps on this dataset, and one realistic situation where you would remove rotation augmentation.
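For checking hand arithmetic on receptive-field questions, the standard recurrence can be sketched as below: each layer grows the field by (kernel - 1) times the current jump, and the jump is multiplied by the layer's stride. The helper `receptive_field` is hypothetical, and the example deliberately uses a different stack from the one in the question. It assumes stride-1 convolutions unless stated otherwise, with a MaxPool(2) entered as kernel 2, stride 2.

```python
def receptive_field(layers):
    """Receptive-field side length in input pixels.

    layers: list of (kernel_size, stride) pairs in forward order.
    """
    rf, jump = 1, 1  # one output pixel; jump = input-pixel distance between neighbours
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Two stacked 3x3 stride-1 convs see a 5x5 input patch.
two_convs = receptive_field([(3, 1), (3, 1)])
```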