
Building an Open-Source Text-to-30s-Cinematic-Reel Pipeline on a Single AMD MI300X

DEV Community·BladeDev·22 days ago
#ai #programming #python #wan2 #director #flux

Built this for the AMD x lablab hackathon. One English sentence becomes a 30-second cinematic reel with characters, story, music, and per-shot voice-over, in about 45 minutes end-to-end on a single AMD Instinct MI300X. Every model is Apache 2.0 or MIT.

Code: github.com/bladedevoff/studiomi300

Architecture

The Director also doubles as the Vision Critic: the same Qwen3.5-35B checkpoint is reloaded with a different system prompt, so one model plays two roles. This saves 70 GB of VRAM.

Why a single MI300X

192 GB of HBM3 lets four very different architectures share one card sequentially: 35B MoE director, 4B diffusion, 14B I2V MoE, 3.5B music, 82M TTS. On 24 GB consumer hardware this stack needs 4-5 separate machines wired together.

Models unload between phases via gc.collect() + torch.cuda.empty_cache(). The Director runs in a subprocess so that all of its memory is freed when the process exits, before Wan2.2 loads; otherwise the Wan2.2 load hits an out-of-memory (OOM) error.

What the Vision Critic does

Most generative video pipelines render once and pray.…
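The unloading strategy described under "Why a single MI300X" can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the helper names release_gpu_memory and run_phase_in_subprocess, and the DIRECTOR_STUB snippet standing in for the Director phase, are all hypothetical. The only real APIs used are gc.collect(), torch.cuda.empty_cache(), torch.cuda.is_available(), and subprocess.run(). On ROCm builds of PyTorch the torch.cuda namespace targets AMD GPUs, so the same calls apply on an MI300X.

```python
import gc
import json
import subprocess
import sys


def release_gpu_memory() -> bool:
    """Collect Python garbage, then ask PyTorch to release cached GPU blocks.

    Returns True only if a GPU was visible and the cache was cleared.
    The caller must drop its own references to the model first (del model).
    """
    gc.collect()
    try:
        import torch  # optional here so the sketch also runs without PyTorch
    except ImportError:
        return False
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        return True
    return False


def run_phase_in_subprocess(code: str) -> str:
    """Run one phase in a fresh interpreter and return its stdout.

    When the child process exits, the OS and driver reclaim everything it
    allocated, which is the property the post relies on for the Director.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the phase fails
    )
    return result.stdout.strip()


# Hypothetical stand-in for the Director phase: it just prints a shot plan as JSON.
DIRECTOR_STUB = "import json; print(json.dumps({'shots': ['wide', 'close-up']}))"


if __name__ == "__main__":
    # Phase 1: Director in its own process; memory is gone once it returns.
    plan = json.loads(run_phase_in_subprocess(DIRECTOR_STUB))
    print("shot plan:", plan["shots"])

    # Later phases (in-process): load a model, use it, drop it, then release.
    model = object()  # placeholder for a real model object
    del model
    print("GPU cache cleared:", release_gpu_memory())
```

The split reflects the two mechanisms the post mentions: in-process cleanup only returns memory that PyTorch's caching allocator is holding with no live references, while process exit releases everything unconditionally, which is presumably why the largest model is isolated in a subprocess.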
