A 96GB GPU couldn't run 1024x768 I2V (83.5 GiB peak). The 54 GiB wasn't the model — it was an autograd graph from consuming a lazy VAE decode iterator outside no_grad. One-line fix, peak drops to 29.5 GiB, flat up to 2048x1536.
An acoustic modem and mesh prototype with measured encode/decode timings, round-trip proof, and an honest boundary between verified core modem work and untested multi-node mesh.
In vision AI systems, model throughput continues to improve. The surrounding pipeline stages must keep pace, including decode, preprocessing, and GPU scheduling.
As large language model (LLM) inference workloads grow in complexity, a single monolithic serving process starts to hit its limits. Prefill and decode stages…