TL;DR

I run LTX-2.3 image-to-video (I2V) locally on a 96 GB GPU. At 1024×768 / 97 frames it peaked at 83.5 GiB, so close to the ceiling that it OOM'd whenever my image-generation server was co-resident, and 1280×768 OOM'd outright. I assumed I'd hit a hardware wall. I hadn't.

54 of those gigabytes were an autograd graph. The pipeline returns a lazy decode iterator; the real VAE decode runs when you encode the output, and in my harness that happened outside the `with torch.no_grad():` block, so every conv activation in the decoder was retained for a backward pass that never comes.

Moving one call inside the `no_grad` block:

| I2V 1024×768/97f | before | after |
| --- | --- | --- |
| peak | 83.5 GiB | 29.5 GiB (−65%) |
| time | 151.6 s | 135.2 s (slightly faster) |

And the peak goes nearly flat across resolution: 2048×1536 (3.1 MP) tops out at 33.6 GiB. The "I need a bigger GPU" conclusion was a measurement artifact.

The lever I tried first, finer VAE decode tiling, barely moved the number. That dead end is part of the story.…