IOAI 2025 · Radar · Classify radar signal returns
Contest: IOAI 2025 (Beijing) · Round: Individual Contest · Category: ML / signal processing.
Official sources: Individual-Contest/Radar · Radar_Solution.ipynb.
1. Problem restatement
Each example is a small radar return cube — a multi-channel tensor representing a range–Doppler–azimuth slice of a radar scan, with one of several object types as the target label (vehicles, pedestrians, clutter, etc.). The contestant must build a classifier from labelled training cubes and submit predictions on a held-out test set. The task rewards feature engineering on signal-domain inputs as much as model design — a small CNN trained from scratch on raw cubes underperforms a lightweight model fed hand-crafted spectral features.
2. What's being tested
- Signal-domain reasoning. Range bins, Doppler bins, FFTs along the slow-time axis. The radar cube is not just an image — its axes have physical meaning.
- Feature pipeline over modelling. Spectral statistics (peak-to-noise ratio, range of the strongest reflector, micro-Doppler signature) beat raw pixel CNNs at this scale.
- Class imbalance. Clutter typically dominates; real-object classes are rare. Macro metrics force you to handle this.
- Compute frugality. The Individual Contest gives a few hours per task — train a small model, don't reach for ResNet-50.
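To make "FFTs along the slow-time axis" concrete, here is a minimal sketch. It turns a complex range × slow-time matrix (already range-compressed, so axis 0 is range bins) into a range-Doppler magnitude map. The synthetic target, the Hann window and the array sizes are illustrative assumptions; the contest cubes may already be delivered in the range-Doppler domain.

```python
import numpy as np

def range_doppler(x):
    """x: complex array of shape (n_range, n_slow), slow time on axis 1.
    Returns the Doppler magnitude spectrum per range bin, zero velocity centred."""
    n_slow = x.shape[1]
    win = np.hanning(n_slow)                       # taper to reduce Doppler sidelobes
    spec = np.fft.fft(x * win[None, :], axis=1)    # FFT along slow time -> Doppler bins
    return np.abs(np.fft.fftshift(spec, axes=1))   # move zero radial velocity to the centre

# Synthetic check: one point target in range bin 5 whose phase advances
# 8 full cycles over 64 pulses (a constant radial velocity).
n_range, n_slow = 16, 64
x = np.zeros((n_range, n_slow), dtype=complex)
x[5] = np.exp(2j * np.pi * 8 * np.arange(n_slow) / n_slow)
rd = range_doppler(x)
r, d = np.unravel_index(rd.argmax(), rd.shape)
print(r, d - n_slow // 2)   # range bin, Doppler bin relative to zero velocity
```

The target shows up at range bin 5 and Doppler bin +8. Each axis of the resulting map is a physical quantity (distance, radial velocity), which is why the later features treat the axes differently.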
3. Data exploration / setup
import numpy as np, pandas as pd, pathlib
root = pathlib.Path("radar_2025")
labels = pd.read_csv(root / "train_labels.csv")
def load(i): return np.load(root / "train" / f"{i}.npy") # shape (C, R, D) or similar
X0 = load(labels.id.iloc[0])
print(X0.shape, X0.dtype)
print(labels.label.value_counts())
EDA checklist:
- Tensor axes. Confirm which axis is range vs Doppler vs antenna channel. The notebook header names them — read it.
- Dynamic range. Radar magnitudes span 60+ dB. Log-transform before any model.
- Per-class mean cube. Average all training cubes per class and look. Classes with distinct micro-Doppler bands are easy targets for hand-crafted features.
- Imbalance. Build per-class weights right away (scikit-learn's compute_class_weight).
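The last three checks can be sketched as below. This assumes every cube has the same (C, R, D) shape and the labels are a 1-D array. Random stand-in data replaces the contest files, and the helper power_db is hypothetical.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

def power_db(X, eps=1e-12):
    """Power in dB from complex or real amplitudes; eps avoids log10(0)."""
    return 10.0 * np.log10(np.abs(X) ** 2 + eps)

# Stand-in data: 300 complex cubes of shape (C, R, D) = (2, 16, 32), imbalanced labels.
rng = np.random.default_rng(0)
shape = (300, 2, 16, 32)
cubes = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
y = rng.choice(["clutter", "car", "pedestrian"], size=300, p=[0.8, 0.15, 0.05])

db = power_db(cubes)
print("dynamic range (dB):", db.max() - db.min())

# Per-class mean cube, averaged in dB; view e.g. class_means[c].mean(axis=0) as an R x D image.
classes = np.unique(y)
class_means = {c: db[y == c].mean(axis=0) for c in classes}

# 'balanced' weights are n_samples / (n_classes * count_of_class), so rare classes weigh more.
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
print(dict(zip(classes, np.round(weights, 2))))
```

Averaging in dB rather than linear power keeps a handful of very strong cubes from dominating each class mean.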
4. Baseline approach
Log-magnitude, flatten, scale, gradient-boosted tree. About 20 lines; it ships before you've finished reading the rest of the notebook.
import numpy as np, pandas as pd, pathlib
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score
from sklearn.ensemble import HistGradientBoostingClassifier  # class_weight needs scikit-learn >= 1.2
root = pathlib.Path("radar_2025")
labels = pd.read_csv(root / "train_labels.csv")
val = pd.read_csv(root / "val_labels.csv")
def phi(i, split):
    X = np.load(root / split / f"{i}.npy")
    # take the magnitude before casting: astype(np.float32) on a complex array drops the imaginary part
    X = np.log1p(np.abs(X)).astype(np.float32)  # dB-like compression
    return X.reshape(-1)
Xtr = np.stack([phi(i, "train") for i in labels.id])
Xva = np.stack([phi(i, "val") for i in val.id])
# fit the scaler on train only; trees are insensitive to per-feature affine rescaling,
# so this step only matters if you later swap in a linear or neural model
sc = StandardScaler().fit(Xtr)
Xtr, Xva = sc.transform(Xtr), sc.transform(Xva)
clf = HistGradientBoostingClassifier(max_iter=300, learning_rate=0.05,
                                     class_weight="balanced")
clf.fit(Xtr, labels.label)
print("val macro-F1:", f1_score(val.label, clf.predict(Xva), average="macro"))
# illustrative: ~0.55
5. Improvements that move the needle
5.1 · Spectral feature pack instead of flatten
For each cube compute: total energy, peak magnitude, peak range-bin index, peak Doppler-bin index, spectral centroid along each axis, spectral spread, kurtosis, ratio of peak to surrounding mean (a CFAR-style detector statistic), and a Doppler histogram (10 bins). That's ~30 features that encode "what does this signal look like" far more compactly than raw pixels.
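import numpy as np
def spectral(X):
    """Compact power features; assumes X is (C, R, D) or (R, D), complex or real."""
    P = np.abs(X) ** 2
    if P.ndim == 2:
        P = P[None]  # add a channel axis so both layouts share one code path
    e_total = P.sum()
    pk = P.max()
    pi = np.unravel_index(P.argmax(), P.shape)  # (channel, range bin, Doppler bin)
    r_marg = P.sum(axis=(0, 2))  # range profile: sum over channels and Doppler
    d_marg = P.sum(axis=(0, 1))  # Doppler profile: sum over channels and range
    return np.concatenate([
        [np.log1p(e_total), np.log1p(pk), pk / (P.mean() + 1e-9), *pi[1:]],
        np.log1p(r_marg / (r_marg.sum() + 1e-12)),
        np.log1p(d_marg / (d_marg.sum() + 1e-12)),
    ])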
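The spectral() function above covers energy, peak, peak location and the two marginal profiles. The remaining statistics from the 5.1 list are centroid, spread, kurtosis, a 10-bin Doppler histogram and a local CFAR-style ratio; one way to compute them is sketched below. Several choices here are assumptions:

- The helper names and the guard/train window sizes are hypothetical.
- "Doppler histogram" is read as the Doppler power profile pooled into 10 groups.
- The layout is taken to be (C, R, D), as in the loader comment.

```python
import numpy as np

def profile_stats(profile):
    """Centroid, spread and (non-excess) kurtosis of a 1-D power profile over its bins."""
    w = profile / (profile.sum() + 1e-12)
    bins = np.arange(len(profile))
    mu = (w * bins).sum()                                      # spectral centroid, in bins
    var = (w * (bins - mu) ** 2).sum()
    kurt = (w * (bins - mu) ** 4).sum() / (var ** 2 + 1e-12)
    return np.array([mu, np.sqrt(var), kurt])                  # centroid, spread, kurtosis

def cfar_ratio(P2d, guard=1, train=4):
    """Peak power over the mean power of the surrounding training cells (guard cells excluded)."""
    r, d = np.unravel_index(P2d.argmax(), P2d.shape)
    lo_r, lo_d = max(r - guard - train, 0), max(d - guard - train, 0)
    window = P2d[lo_r:r + guard + train + 1, lo_d:d + guard + train + 1].copy()
    # blank the peak and its guard cells so they do not inflate the noise estimate
    window[max(r - guard, 0) - lo_r:r + guard + 1 - lo_r,
           max(d - guard, 0) - lo_d:d + guard + 1 - lo_d] = np.nan
    return P2d[r, d] / (np.nanmean(window) + 1e-12)

def extra_features(X, n_doppler_bins=10):
    P = np.abs(X).astype(np.float64) ** 2
    if P.ndim == 2:
        P = P[None]                                            # (R, D) -> (1, R, D)
    rd = P.sum(axis=0)                                         # collapse channels -> range-Doppler map
    r_prof, d_prof = rd.sum(axis=1), rd.sum(axis=0)
    # Doppler profile pooled into n_doppler_bins contiguous groups, normalised to sum to 1
    d_hist = np.array([g.sum() for g in np.array_split(d_prof, n_doppler_bins)])
    d_hist = d_hist / (d_hist.sum() + 1e-12)
    return np.concatenate([profile_stats(r_prof), profile_stats(d_prof),
                           d_hist, [np.log1p(cfar_ratio(rd))]])
```

Concatenating extra_features(X) with spectral(X) gives a per-cube vector in the ~30-feature range that 5.1 describes.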
5.2 · Add a small CNN as a parallel feature extractor
Train a 3-layer CNN on the log-magnitude cube and concatenate its penultimate activation with your hand-crafted features. Feed the combined vector to the GBT. The CNN catches non-obvious local patterns the hand features miss; the GBT handles nonlinear combinations.
5.3 · Class-weighted focal loss for the CNN head
If you skip the GBT and train end-to-end, use focal loss with γ=2 and class weights. Unweighted cross-entropy tends to collapse toward predicting clutter.
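A sketch of multi-class focal loss with per-class weights in PyTorch: loss = -w[y] · (1 - p_y)^γ · log p_y, averaged over the batch. The function name is hypothetical. Note that this version takes a plain mean, whereas F.cross_entropy with weight= normalises by the sum of weights.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0, class_weight=None):
    """Mean over the batch of -w[y] * (1 - p_y)**gamma * log p_y."""
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, target.unsqueeze(1)).squeeze(1)   # log-probability of the true class
    pt = log_pt.exp()
    loss = -((1.0 - pt) ** gamma) * log_pt                      # easy examples (pt near 1) are down-weighted
    if class_weight is not None:
        loss = loss * class_weight[target]                      # e.g. 'balanced' weights as a tensor
    return loss.mean()

torch.manual_seed(0)
logits = torch.randn(8, 4, requires_grad=True)
target = torch.randint(0, 4, (8,))
w = torch.tensor([0.3, 1.0, 2.0, 4.0])    # illustrative weights, clutter (class 0) lowest
loss = focal_loss(logits, target, gamma=2.0, class_weight=w)
loss.backward()
```

With γ=0 and no weights this reduces to ordinary cross-entropy. That makes a quick sanity check before tuning γ.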
5.4 · Doppler-axis augmentation
Random circular shifts along Doppler are physically reasonable (a radial-velocity offset). Apply them during training as a near-free regulariser. Avoid range-axis shifts — those break the physics.
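A minimal sketch of the Doppler shift, assuming the Doppler axis is last, as in a (C, R, D) layout. np.roll wraps bins around the edge, which matches the periodicity of the Doppler FFT. The max_shift cap is a hypothetical knob: large shifts can move a slow target into the band where static clutter usually sits.

```python
import numpy as np

def random_doppler_shift(cube, rng, doppler_axis=-1, max_shift=None):
    """Circularly shift a cube along the Doppler axis by a random number of bins."""
    n = cube.shape[doppler_axis]
    max_shift = n // 2 if max_shift is None else max_shift
    k = int(rng.integers(-max_shift, max_shift + 1))   # inclusive range [-max_shift, max_shift]
    return np.roll(cube, k, axis=doppler_axis)         # range and channel axes untouched

rng = np.random.default_rng(0)
cube = rng.standard_normal((2, 16, 32))
aug = random_doppler_shift(cube, rng)
```

Because only the Doppler axis moves, each range bin's total power is unchanged. That is a cheap invariant to assert in a data loader.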
5.5 · 5-fold ensemble, average probabilities
Tiny model + tiny data ⇒ high variance. 5-fold CV with probability averaging typically adds 1 to 2 points of macro-F1.
6. Submission format & gotchas
- Submit submission.csv with columns id,label matching the test ids.
- Fit scalers / PCAs only on the training split. Re-fitting on test silently leaks.
- If the cube includes complex values, take magnitude — don't feed complex floats to scikit-learn.
- Save the model and a quick reproducibility README; some IOAI tasks ask for the training notebook as well as predictions.
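A small helper for the submission file, sketched under two assumptions: predictions are already decoded to label strings, and one row per test id is required. The id,label columns come from the checklist above, but the exact row order the grader expects is an assumption, so check the task statement. write_submission and expected_ids are hypothetical names.

```python
import os
import tempfile
import numpy as np
import pandas as pd

def write_submission(test_ids, pred_labels, path, expected_ids=None):
    """Write one id,label row per test id, in the given order, after basic sanity checks."""
    sub = pd.DataFrame({"id": test_ids, "label": pred_labels})
    if not sub["id"].is_unique:
        raise ValueError("duplicate test ids")
    if expected_ids is not None and set(sub["id"]) != set(expected_ids):
        raise ValueError("ids do not match the test set")
    sub.to_csv(path, index=False)   # index=False: no extra unnamed column
    return sub

# Usage with stand-in ids and labels, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "submission.csv")
write_submission(np.array([101, 102, 103]), ["clutter", "car", "clutter"], path,
                 expected_ids=[101, 102, 103])
check = pd.read_csv(path)
```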
7. What top solutions did
The official solution notebook follows the spectral-features + small-CNN-features hybrid pattern. Community write-ups (when they appear) typically add 5-fold ensembling and a focal-loss CNN trained from scratch on log-magnitude cubes. No top solution used a giant pretrained backbone — the distribution shift from natural images to radar magnitudes is too large for transfer. [verify against official notebook]
8. Drill
D1 · Why does log-magnitude help so much before scaling?
Radar return power is the squared magnitude of a complex amplitude, and it spans many orders of magnitude. On a linear scale, each feature's mean and standard deviation are dominated by a few very strong returns, so after StandardScaler most values are squeezed into a narrow band and their variation is effectively lost. Log compresses the dynamic range so the scaler captures variation across the whole cube. This is the radar equivalent of "always log-transform pixel intensities in fluorescence microscopy".
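A quick numeric check of that argument follows. As a stand-in for a real bin, synthetic powers are drawn log-uniformly over 60 dB. The script measures what fraction of samples end up within 0.1 standard deviations of the smallest value after scaling.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: one bin's power across 1000 cubes, log-uniform over 60 dB.
power = 10.0 ** rng.uniform(0, 6, size=(1000, 1))

lin_z = StandardScaler().fit_transform(power)                  # scale raw power
db_z = StandardScaler().fit_transform(10 * np.log10(power))    # scale power in dB

def frac_near_floor(z, tol=0.1):
    """Fraction of samples within tol standard deviations of the smallest value."""
    return float(np.mean(z - z.min() < tol))

print("linear:", frac_near_floor(lin_z), "dB:", frac_near_floor(db_z))
```

On the linear scale roughly 70% of samples collapse next to the minimum, because a few huge values set the standard deviation. In dB only a few percent do, so the scaled feature keeps its ordering information.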
D2 · Your val macro-F1 is high but test is much lower. Where did you leak?
Common leak sources: (a) you used a cube-id-based split that doesn't honour scene boundaries — two cubes from the same scan ended up across train/val, so the model memorised scene-level idiosyncrasies; (b) you fit a scaler / PCA on the combined train+val set; (c) you tuned the focal loss γ on val until val score peaked, which is val-fitting. Cure: rebuild the split by scene id not cube id, and use a separate calibration fold for hyperparameter tuning.