
IOAI 2024 Practical · Creative brief & storyboard

Contest: IOAI 2024 (Bulgaria) · Round: Practical, on-site (4 h) · Category: Creative AI / planning & brief.

Official source: ioai-official.org/2024-tasks. The on-site brief was based on a track by Maria Ilieva; the exact song and lyrics were distributed to teams during the round. [verify against on-site materials]

1. Problem restatement

Teams receive an audio file (a song) and are tasked with producing both an album cover and a short music video using existing generative AI tools, in 4 hours. The single biggest determinant of jury score is not generation quality — it's the consistency between the cover and the video, and the alignment of both with the song's mood. This task page is about the planning phase: turning a song you've never heard into a one-page brief tight enough that downstream prompt engineering becomes mechanical.

Source. Paraphrased from the IOAI 2024 Practical Round description on ioai-official.org. The actual rubric used by the jury is not fully published — the weighting below is [illustrative].

2. What's being tested

3. Data exploration / setup

You have an audio.mp3 file and (typically) a lyric sheet. First-pass tooling:

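import librosa
import numpy as np

# chroma bin 0 is C, so the lookup must follow the chromatic scale from C
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

y, sr = librosa.load("audio.mp3", sr=22050)
tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
tempo = float(np.atleast_1d(tempo)[0])  # some librosa versions return an array
chroma = librosa.feature.chroma_stft(y=y, sr=sr).mean(axis=1)  # 12-dim pitch-class profile
duration = librosa.get_duration(y=y, sr=sr)
# The strongest pitch class is only a rough hint at the tonic, not a full key estimate.
print(f"tempo: {tempo:.0f} bpm  strongest pitch class: {PITCH_CLASSES[int(chroma.argmax())]}  duration: {duration:.1f} s")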

Use the tempo to set the video's cut pace (one cut per ~2 beats is a safe rule for a pop video) and the duration to budget how many shots you need. Aspect ratio is a separate decision that these numbers do not settle: the assigned track was a Maria Ilieva pop number, for which either 16:9 or vertical 9:16 works; commit to one early.
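The beats-per-cut rule converts to a shot length with one line of arithmetic. A minimal sketch, assuming the tempo estimate is in beats per minute; seconds_per_cut and cut_count are hypothetical helper names, not part of librosa.

```python
def seconds_per_cut(tempo_bpm: float, beats_per_cut: float = 2.0) -> float:
    """Seconds between cuts when cutting every `beats_per_cut` beats."""
    if tempo_bpm <= 0:
        raise ValueError("tempo_bpm must be positive")
    return beats_per_cut * 60.0 / tempo_bpm


def cut_count(duration_s: float, tempo_bpm: float, beats_per_cut: float = 2.0) -> int:
    """Approximate number of shots needed to fill `duration_s` seconds."""
    return max(1, round(duration_s / seconds_per_cut(tempo_bpm, beats_per_cut)))
```

For example, seconds_per_cut(120) gives one cut per second, so a 30-second video at 120 bpm needs about 30 shots. Raising beats_per_cut is the lever to pull when that shot count is more than the team can generate in the time available.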

4. Baseline approach

Spend exactly 30 minutes on the brief. Write the following sections, in this order, in a shared doc:

  1. Song summary (3 sentences): genre, mood, lyrical theme.
  2. Visual concept (1 sentence): "neon-soaked nighttime city seen through rain on a car window".
  3. Three colours (hex): one primary, one accent, one neutral. These constrain every image you generate.
  4. Reference: name a known director or visual artist whose style you'll echo. This word goes in every prompt.
  5. Six storyboard frames for the video: a tiny doodle is enough. One frame per ~5 seconds for a 30-second video.
  6. Cover composition: subject, framing, where the title type will sit.
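The six sections above can also be held as a small structure that checks itself before sign-off. This is only a sketch: the Brief class, its field names, and the problems method are hypothetical, and the checks encode just the counts stated in the list (three hex colours, six frames).

```python
from __future__ import annotations

import re
from dataclasses import dataclass

HEX_COLOUR = re.compile(r"^#[0-9A-Fa-f]{6}$")


@dataclass
class Brief:
    song_summary: str
    visual_concept: str
    colours: list[str]      # [primary, accent, neutral], each as #RRGGBB
    reference: str
    storyboard: list[str]   # one short description per frame
    cover_composition: str

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means ready to sign off."""
        issues = []
        if len(self.colours) != 3:
            issues.append("need exactly three colours")
        issues += [f"bad hex colour: {c}" for c in self.colours if not HEX_COLOUR.match(c)]
        if len(self.storyboard) != 6:
            issues.append("need exactly six storyboard frames")
        for name in ("song_summary", "visual_concept", "reference", "cover_composition"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        return issues
```

Calling problems() just before the sign-off gives the team a concrete checklist to read out loud.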

Sign off — out loud, to your team — before any generation tool is opened. Drift from the brief is the single biggest source of lost points.

5. Improvements that move the needle

5.1 · Build a "prompt fragment library" before the round

Pre-write fragments for camera ("shot on Arri Alexa, 35 mm, shallow depth"), light ("soft volumetric rim light"), grade ("teal & orange grade"), and style ("Wong Kar-wai influence"). On the day, combine the fragments with the brief's three colour codes to form each prompt.
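A fragment library can be as simple as a dictionary plus a join. The sketch below reuses the example fragments from this section; build_prompt and the FRAGMENTS layout are hypothetical, and whether a given image model actually honours hex codes in a prompt is an assumption to test before the round.

```python
FRAGMENTS = {
    "camera": "shot on Arri Alexa, 35 mm, shallow depth",
    "light": "soft volumetric rim light",
    "grade": "teal & orange grade",
    "style": "Wong Kar-wai influence",
}


def build_prompt(subject, colours, fragments=FRAGMENTS,
                 order=("camera", "light", "grade", "style")):
    """Join subject, the chosen fragments, and the brief's palette into one prompt."""
    palette = "colour palette " + ", ".join(colours)
    parts = [subject] + [fragments[key] for key in order] + [palette]
    return ", ".join(parts)


prompt = build_prompt(
    "empty train platform at dusk",
    ["#1B2540", "#D88A3A", "#E6E1D4"],
)
```

Keeping the order fixed means every prompt differs only in its subject, which makes drift from the brief easy to spot.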

5.2 · Decide cover vs video order based on your strongest tool

If your team is faster with image generation, build the cover first and use its hero frame as the image-to-video seed. If you're faster with video, generate a 5-second loop and screenshot the most striking frame for the cover. Reversing this costs an hour.

5.3 · Lock seed and aspect ratio across deliverables

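A fixed SDXL seed makes generation reproducible: the same seed, prompt, and settings regenerate the same image, and small prompt edits at that seed tend to keep the composition and character close. Consistency is not guaranteed once the prompt changes substantially, so generate your "hero" character at one seed, then derive the cover and video keyframes from that anchor, keeping the aspect ratio fixed as well.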
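One way to keep the lock honest is to store the seed and frame size in a single immutable object and build every generation request from it. This is a tool-agnostic sketch: LockedSettings, the dictionary keys, and the numeric values are placeholders to be mapped onto whatever parameters your generation tool actually exposes.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LockedSettings:
    seed: int
    width: int
    height: int

    def job(self, prompt: str) -> dict:
        """Merge the locked values with a per-shot prompt."""
        return {"prompt": prompt, **asdict(self)}


# Placeholder values: a 16:9 frame and an arbitrary seed chosen during the brief.
LOCKED = LockedSettings(seed=1234, width=1024, height=576)

cover_job = LOCKED.job("hero character, cover framing")
keyframe_job = LOCKED.job("hero character, wide shot")
```

Because the dataclass is frozen, changing the seed mid-round requires creating a new object, which is a visible decision rather than an accident.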

5.4 · Build a 15-second highlight cut first, then expand

A polished 15-second cut beats a rough 45-second cut. Hit save at 90 minutes of video work; if you have time, add more frames. If not, your 15-second cut is the submission.

5.5 · Plan the title typography manually

Diffusion models still mangle text in 2024. Generate the image without title text, then composite the title in any vector tool (Figma, Inkscape). Treat the typeface as a brief decision: a single serif weight per concept.

6. Submission format & gotchas

7. What top solutions did

Public write-ups for IOAI 2024 Practical are sparse — the round is judged subjectively and teams don't publish briefs. The pattern visible in the screened final-presentation videos: teams that committed to a single visual era (e.g. "1980s neon" or "watercolour storybook") for the full 4 hours scored higher than teams that explored multiple directions. The brief is the lock on that commitment. [illustrative]

8. Drill

D1 · The song is melancholic, in A minor, 72 bpm, mentions rain and goodbye. Sketch a brief in 60 seconds.

Visual concept: a single empty train platform at dusk, low-key blue + amber neon, slow camera drift. Colours: #1B2540 (deep blue), #D88A3A (amber), #E6E1D4 (warm grey). Reference: Wong Kar-wai. Six frames: (1) close-up rain on glass, (2) wide shot platform with one figure walking away, (3) clock close-up, (4) bench detail, (5) train arriving in motion blur, (6) figure boarding. Cover: figure silhouette centred, title at bottom in serif type. Cut pace: at 72 bpm a beat lasts about 0.83 s, so holding each frame for 6 beats gives one cut every 5 s. That is slower than the two-beat pop rule, which suits the melancholic mood, and six frames fill a 30-second video exactly.

D2 · Your team disagrees on the colour palette. How do you resolve in < 5 minutes?

Force a vote on a single reference image: each team member proposes one URL of a real-world photo whose palette fits the song. Pick the first image to reach two votes. Run a colour-picker (any palette tool) and lock the top 3 colours. The vote takes 90 seconds; the picker takes 30 seconds. The remaining 3 minutes go to documenting the choice so no one re-litigates at hour 3.
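If no palette tool is at hand, the "top 3 colours" step can be approximated by counting coarsely quantised pixels. A minimal sketch in pure Python: top_colours is a hypothetical helper, it takes pixels as (r, g, b) tuples (reading them from the image file needs an image library, not shown here), and the bucket width of 32 is an arbitrary assumption.

```python
from collections import Counter


def top_colours(pixels, k=3, step=32):
    """Return the k most common colours as #RRGGBB after coarse quantisation.

    pixels: iterable of (r, g, b) tuples with 0-255 channels.
    step:   bucket width per channel; larger values merge more similar shades.
    """
    def bucket(value):
        # Snap to the centre of the bucket, clamped to 255.
        return min(255, (value // step) * step + step // 2)

    counts = Counter((bucket(r), bucket(g), bucket(b)) for r, g, b in pixels)
    return ["#{:02X}{:02X}{:02X}".format(*rgb) for rgb, _ in counts.most_common(k)]
```

The result is a bucketed approximation of the dominant shades, so treat it as a starting point and nudge the hex values by eye before locking them into the brief.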
