IOAI 2025 · Radar · Classify radar signal returns

Contest: IOAI 2025 (Beijing) · Round: Individual Contest · Category: ML / signal processing.

Official sources: Individual-Contest/Radar · Radar_Solution.ipynb.

1. Problem restatement

Each example is a small radar return cube — a multi-channel tensor representing a range–Doppler–azimuth slice of a radar scan, with one of several object types as the target label (vehicles, pedestrians, clutter, etc.). The contestant must build a classifier from labelled training cubes and submit predictions on a held-out test set. The task rewards feature engineering on signal-domain inputs as much as model design — a small CNN trained from scratch on raw cubes underperforms a lightweight model fed hand-crafted spectral features.

Source. Summary paraphrased from the official notebook header in Individual-Contest/Radar. Exact tensor shapes, class counts, and the metric formula are not reproduced here and should be checked on contest day [verify against the notebook]; they vary year-to-year and are not redistributable.

2. What's being tested

Signal-domain reasoning. Range bins, Doppler bins, FFTs along the slow-time axis. The radar cube is not just an image — its axes have physical meaning.
Feature pipeline over modelling. Spectral statistics (peak-to-noise ratio, range of the strongest reflector, micro-Doppler signature) beat raw pixel CNNs at this scale.
Class imbalance. Clutter typically dominates; real-object classes are rare. Macro metrics force you to handle this.
Compute frugality. The Individual Contest gives a few hours per task — train a small model, don't reach for ResNet-50.

3. Data exploration / setup

import numpy as np, pandas as pd, pathlib
root = pathlib.Path("radar_2025")

labels = pd.read_csv(root / "train_labels.csv")
def load(i): return np.load(root / "train" / f"{i}.npy")   # shape (C, R, D) or similar

X0 = load(labels.id.iloc[0])
print(X0.shape, X0.dtype)
print(labels.label.value_counts())

EDA checklist:

Tensor axes. Confirm which axis is range vs Doppler vs antenna channel. The notebook header names them — read it.
Dynamic range. Radar magnitudes span 60+ dB. Log-transform before any model.
Per-class mean cube. Average all training cubes per class and look. Classes with distinct micro-Doppler bands are easy targets for hand-crafted features.
Imbalance. Build per-class weights right away (compute_class_weight).
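
The checklist items on dynamic range, per-class means, and class weights can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic complex cubes, assuming a (C, R, D) axis order; the shapes, the three integer classes, and the class proportions are made up for the example. The weight formula is the usual "balanced" heuristic, n_samples / (n_classes * count), which is also what compute_class_weight uses in its "balanced" mode.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the training set: 200 complex cubes, assumed layout (C, R, D).
# Real shapes and class names come from the notebook header.
n, C, R, D = 200, 2, 16, 8
cubes = rng.normal(size=(n, C, R, D)) + 1j * rng.normal(size=(n, C, R, D))
y = rng.choice([0, 1, 2], size=n, p=[0.8, 0.15, 0.05])   # class 0 plays "clutter"

# Dynamic range in dB: strongest bin versus weakest non-zero bin, in power terms.
power = np.abs(cubes) ** 2
dyn_range_db = 10 * np.log10(power.max() / power[power > 0].min())

# Per-class mean of the log-magnitude cube: one (C, R, D) template per class.
logmag = np.log1p(np.abs(cubes))
classes = np.unique(y)
class_means = {c: logmag[y == c].mean(axis=0) for c in classes}

# Balanced class weights: n_samples / (n_classes * count_c).
counts = np.array([(y == c).sum() for c in classes])
weights = len(y) / (len(classes) * counts)

print(f"dynamic range: {dyn_range_db:.1f} dB")
print("class counts:", dict(zip(classes.tolist(), counts.tolist())))
print("class weights:", np.round(weights, 2))
```

Plotting each entry of class_means as a range-Doppler image is the quickest way to see which classes separate by eye.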

4. Baseline approach

Log-magnitude, flatten, scale, gradient-boosted tree. ~30 lines, ships before you've finished reading the rest of the notebook.

import numpy as np, pandas as pd, pathlib
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score
from sklearn.ensemble import HistGradientBoostingClassifier

root = pathlib.Path("radar_2025")
labels = pd.read_csv(root / "train_labels.csv")
val    = pd.read_csv(root / "val_labels.csv")

def phi(i, split):
    X = np.load(root / split / f"{i}.npy").astype(np.float32)
    X = np.log1p(np.abs(X))             # dB-like compression
    return X.reshape(-1)

Xtr = np.stack([phi(i, "train") for i in labels.id])
Xva = np.stack([phi(i, "val")   for i in val.id])
sc = StandardScaler().fit(Xtr)
Xtr, Xva = sc.transform(Xtr), sc.transform(Xva)

# class_weight requires scikit-learn >= 1.2
clf = HistGradientBoostingClassifier(max_iter=300, learning_rate=0.05,
                                     class_weight="balanced")
clf.fit(Xtr, labels.label)
print("val macro-F1:", f1_score(val.label, clf.predict(Xva), average="macro"))
# ~0.55 [illustrative]

5. Improvements that move the needle

5.1 · Spectral feature pack instead of flatten

For each cube compute: total energy, peak magnitude, peak range-bin index, peak Doppler-bin index, spectral centroid along each axis, spectral spread, kurtosis, ratio of peak to surrounding mean (a CFAR-style detector statistic), and a Doppler histogram (10 bins). That's roughly 30 features that encode "what does this signal look like" far more compactly than raw pixels. The function below computes a starter subset: energy, peak, peak-to-mean ratio, peak indices, and the normalised range and Doppler marginals.

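import numpy as np

def spectral(X):
    # Assumes axis order (C, R, D) for 3-D cubes and (R, D) for 2-D ones;
    # confirm against the notebook header and adjust the axes if it differs.
    P = np.abs(X) ** 2                                             # power per bin
    e_total = P.sum()
    pk = P.max(); pi = np.unravel_index(P.argmax(), P.shape)       # peak power and its bin indices
    r_marg = P.sum(axis=(0, 2)) if P.ndim == 3 else P.sum(axis=1)  # energy per range bin
    d_marg = P.sum(axis=(0, 1)) if P.ndim == 3 else P.sum(axis=0)  # energy per Doppler bin
    return np.concatenate([
        [np.log1p(e_total), np.log1p(pk), pk / (P.mean() + 1e-9), *pi],
        np.log1p(r_marg / (r_marg.sum() + 1e-9)),                  # epsilon guards all-zero cubes
        np.log1p(d_marg / (d_marg.sum() + 1e-9)),
    ])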
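
The feature list above also names centroid, spread, kurtosis, and a Doppler histogram, which the spectral function does not compute. One possible way to add them is sketched below, assuming a (C, R, D) axis order. The helper names (axis_moments, doppler_histogram, extra_features) are hypothetical, and reading "Doppler histogram (10 bins)" as relative energy pooled into 10 coarse Doppler bands is an interpretation, not something the source specifies.

```python
import numpy as np

def axis_moments(marg):
    """Centroid, spread and kurtosis of a non-negative 1-D power marginal,
    treating the normalised marginal as a distribution over bin indices."""
    p = marg / (marg.sum() + 1e-12)
    k = np.arange(len(p))
    centroid = (k * p).sum()
    spread = np.sqrt((((k - centroid) ** 2) * p).sum())
    kurt = (((k - centroid) ** 4) * p).sum() / (spread ** 4 + 1e-12)
    return centroid, spread, kurt

def doppler_histogram(d_marg, n_bins=10):
    """Pool the Doppler marginal into n_bins coarse bands of relative energy."""
    bands = np.array_split(d_marg, n_bins)
    h = np.array([b.sum() for b in bands])
    return h / (h.sum() + 1e-12)

def extra_features(X):
    """3 range moments + 3 Doppler moments + 10 Doppler bands = 16 features."""
    P = np.abs(X) ** 2                      # assumed layout: (C, R, D)
    r_marg = P.sum(axis=(0, 2))             # energy per range bin
    d_marg = P.sum(axis=(0, 1))             # energy per Doppler bin
    return np.concatenate([axis_moments(r_marg),
                           axis_moments(d_marg),
                           doppler_histogram(d_marg)])
```

Concatenating extra_features(X) with the output of spectral(X) gives a combined vector in the region of the feature count quoted above, depending on the number of range and Doppler bins.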

5.2 · Add a small CNN as a parallel feature extractor

Train a 3-layer CNN on the log-magnitude cube and concatenate its penultimate activation with your hand-crafted features. Feed the combined vector to the GBT. The CNN catches non-obvious local patterns the hand features miss; the GBT handles nonlinear combinations.

5.3 · Class-weighted focal loss for the CNN head

If you skip the GBT and train end-to-end, use focal loss with γ=2 and class weights. With clutter dominating the training set, unweighted cross-entropy tends to collapse toward predicting clutter for everything.
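
To make the loss concrete, here is a framework-free NumPy sketch of the class-weighted focal loss (forward pass only). It is an illustration of the formula, -w_c (1 - p_t)^γ log p_t averaged over the batch, not the official solution's code; inside a training loop you would write the same expression with your framework's tensors so that gradients flow. The function name and argument layout are assumptions.

```python
import numpy as np

def focal_loss(logits, targets, class_weights, gamma=2.0):
    """Mean class-weighted focal loss.

    logits:        (N, K) raw scores
    targets:       (N,) integer class labels in [0, K)
    class_weights: (K,) per-class weights
    """
    z = logits - logits.max(axis=1, keepdims=True)            # numerically stable log-softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    log_pt = log_p[np.arange(len(targets)), targets]          # log-probability of the true class
    pt = np.exp(log_pt)
    # (1 - pt)^gamma down-weights examples the model already classifies confidently.
    return float(np.mean(-class_weights[targets] * (1.0 - pt) ** gamma * log_pt))
```

With gamma=0 and unit weights this reduces to ordinary cross-entropy, which is a handy sanity check before training.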

5.4 · Doppler-axis augmentation

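Random circular shifts along Doppler are physically reasonable: each one corresponds to a radial-velocity offset, with the wrap-around behaving like Doppler aliasing. Apply them during training as a near-free regulariser. Avoid range-axis shifts. Received power falls off steeply with range, so a return moved to a different range bin no longer has a plausible amplitude.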
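
A circular Doppler shift is a single np.roll call. The sketch below assumes Doppler is the last axis of the cube; the function name and the max_shift parameter are illustrative choices, not taken from the notebook.

```python
import numpy as np

def doppler_shift(X, max_shift, rng, doppler_axis=-1):
    """Circularly shift a cube by a random number of bins along the Doppler axis.

    Bins that fall off one end wrap around to the other, so the energy in
    every range bin is unchanged.
    """
    k = int(rng.integers(-max_shift, max_shift + 1))   # shift in [-max_shift, max_shift]
    return np.roll(X, k, axis=doppler_axis)

rng = np.random.default_rng(0)
cube = rng.normal(size=(2, 16, 8))          # synthetic (C, R, D) cube
augmented = doppler_shift(cube, max_shift=3, rng=rng)
```

Apply it per sample inside the training data loader, and leave validation and test cubes untouched.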

5.5 · 5-fold ensemble, average probabilities

Tiny model + tiny data ⇒ high variance. 5-fold CV with probability averaging is typically worth 1–2 macro-F1 points [illustrative].

6. Submission format & gotchas

Submit submission.csv with id,label matching test ids.
Fit scalers / PCAs only on the training split. Re-fitting on test silently leaks.
If the cube includes complex values, take magnitude — don't feed complex floats to scikit-learn.
Save the model and a quick reproducibility README; some IOAI tasks ask for the training notebook as well as predictions.
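
The gotchas above translate into a few small helpers. This is a sketch under assumptions: the helper names are hypothetical, the label strings are placeholders, and the id,label column layout is taken from the submission note above rather than from the official template, so check it against the notebook.

```python
import io
import numpy as np
import pandas as pd

def to_real(X):
    """Return a float32 array, taking the magnitude first if the cube is complex."""
    return np.abs(X).astype(np.float32) if np.iscomplexobj(X) else np.asarray(X, dtype=np.float32)

def fit_standardiser(X_train):
    """Fit mean/std on the training split only; reuse the returned function on val and test."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0) + 1e-9
    return lambda X: (X - mu) / sd

def make_submission(test_ids, pred_labels, expected_ids):
    """Build the id,label table and fail loudly on the usual format mistakes."""
    sub = pd.DataFrame({"id": list(test_ids), "label": list(pred_labels)})
    if not sub["id"].is_unique:
        raise ValueError("duplicate ids in submission")
    if set(sub["id"]) != set(expected_ids):
        raise ValueError("submission ids do not match the test ids")
    if sub["label"].isna().any():
        raise ValueError("missing predictions")
    return sub

# Usage with placeholder ids and labels; written to a buffer instead of disk here.
sub = make_submission([3, 1, 2], ["clutter", "vehicle", "clutter"], expected_ids=[1, 2, 3])
buf = io.StringIO()
sub.to_csv(buf, index=False)     # for the real run: sub.to_csv("submission.csv", index=False)
```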

7. What top solutions did

The official solution notebook follows the spectral-features + small-CNN-features hybrid pattern. Community write-ups (when they appear) typically add 5-fold ensembling and a focal-loss CNN trained from scratch on log-magnitude cubes. No top solution used a giant pretrained backbone — the distribution shift from natural images to radar magnitudes is too large for transfer. [verify against official notebook]

8. Drill

D1 · Why does log-magnitude help so much before scaling?

Radar power is the squared magnitude of a complex amplitude, so returns span many orders of magnitude. On a linear scale, each feature's mean and standard deviation are set by a few high-energy returns, and after StandardScaler everything else is squeezed into a narrow band that looks like noise. Log compresses the dynamic range so the scaler captures variation across the whole cube. This is the radar equivalent of "always log-transform pixel intensities in fluorescence microscopy".
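
The effect is easy to see numerically. The sketch below uses synthetic power values spread uniformly over 60 dB (an assumption chosen to mimic the dynamic range mentioned earlier, not real radar data) and compares the interquartile range of the standardised values on linear and log scales.

```python
import numpy as np

rng = np.random.default_rng(0)
power = 10.0 ** rng.uniform(0, 6, size=10_000)   # synthetic returns spanning 60 dB

def standardise(v):
    return (v - v.mean()) / v.std()

def iqr(v):
    q25, q75 = np.percentile(v, [25, 75])
    return q75 - q25

iqr_linear = iqr(standardise(power))             # middle half of the data after linear scaling
iqr_log = iqr(standardise(np.log1p(power)))      # same after log compression

print(f"IQR after linear scaling: {iqr_linear:.2f}")
print(f"IQR after log scaling:    {iqr_log:.2f}")
```

On the linear scale the middle half of the samples occupies a small fraction of a standard deviation, because a handful of very large values set the mean and variance. After the log transform the same samples are spread over a much wider standardised range.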

D2 · Your val macro-F1 is high but test is much lower. Where did you leak?

Common leak sources: (a) you used a cube-id-based split that doesn't honour scene boundaries, so two cubes from the same scan ended up on opposite sides of train/val and the model memorised scene-level idiosyncrasies; (b) you fit a scaler / PCA on the combined train+val set; (c) you tuned the focal loss γ on val until the val score peaked, which is val-fitting. Cure: rebuild the split by scene id rather than cube id, fit every scaler / PCA on the training split only, and tune hyperparameters on a separate fold so the val score stays an honest estimate.
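
A scene-level split can be sketched in NumPy as below. It assumes each cube comes with some scene or scan identifier; whether the dataset exposes one, and under what column name, is not confirmed by the source, so scene_ids here is a hypothetical array. scikit-learn's GroupKFold does the same job for cross-validation.

```python
import numpy as np

def split_by_scene(scene_ids, val_frac=0.2, seed=0):
    """Return (train_idx, val_idx) such that no scene appears on both sides."""
    scene_ids = np.asarray(scene_ids)
    rng = np.random.default_rng(seed)
    scenes = np.unique(scene_ids)
    rng.shuffle(scenes)
    n_val = max(1, int(round(val_frac * len(scenes))))   # hold out whole scenes, not cubes
    val_mask = np.isin(scene_ids, scenes[:n_val])
    return np.flatnonzero(~val_mask), np.flatnonzero(val_mask)

# Synthetic example: 10 scenes with 5 cubes each.
scene_ids = np.repeat(np.arange(10), 5)
train_idx, val_idx = split_by_scene(scene_ids)
```

If the validation score drops noticeably after switching from a cube-level to a scene-level split, the earlier score was inflated by scene memorisation.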
