Why Your Diffusion Model Is Slow at Inference (And It's Not the UNet)

DEV Community·Elise Moreau·about 1 month ago

TL;DR: Most inference bottlenecks in diffusion pipelines are not in the UNet denoising loop. They are in the VAE decoder, the text encoder on its first call, and CPU-GPU synchronization between steps. Profile before you optimize. In practice, a 30% speedup often comes from fixing the 5% of the code nobody looks at.

I spent three weeks last month trying to make a Stable Diffusion XL variant run faster on an A10G. The model was trained in-house for product photography. Inference was around 4.2 seconds per image at 1024x1024 with 30 steps. The target was under 2 seconds.

My first instinct was wrong. I went straight to the UNet. Compiled it with torch.compile, tried different attention implementations, looked at FlashAttention-3. I got it from 3.1s to 2.7s on the UNet alone. Nice. But total pipeline time barely moved.

Then I actually profiled.

What the profile showed

    import torch
    from torch.profiler import profile, ProfilerActivity

    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.…
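A quick budget using the numbers from the story shows why the UNet was the wrong first target. This assumes (my assumption; the excerpt does not say) that the 3.1s UNet figure was measured inside the same 4.2s run and that the stages simply add up:

```latex
\begin{aligned}
T_{\text{rest}} &= T_{\text{total}} - T_{\text{UNet}} = 4.2\,\text{s} - 3.1\,\text{s} = 1.1\,\text{s} \\
T_{\text{UNet}}^{\max} &= T_{\text{target}} - T_{\text{rest}} = 2.0\,\text{s} - 1.1\,\text{s} = 0.9\,\text{s} \\
\text{required UNet speedup} &= \frac{3.1\,\text{s}}{0.9\,\text{s}} \approx 3.4\times
  \quad (\approx 103\,\text{ms} \to 30\,\text{ms per step at 30 steps})
\end{aligned}
```

Under that assumption, everything outside the UNet eats more than half of the 2-second budget on its own, so it deserves a profile before the denoising loop does.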
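Before opening the per-kernel view from torch.profiler, a coarse per-stage timer is often enough to show where the seconds go. The sketch below is not the author's code: `stage_timer`, `profile_pipeline`, `report`, and the stand-in stages are hypothetical helpers. It assumes each stage is a plain function that takes the previous stage's output. It synchronizes CUDA (when torch and a GPU are present) around each stage so asynchronous kernel launches are not misattributed, and it discards warm-up passes so one-time first-call costs, like the text-encoder case in the TL;DR, stay out of the steady-state numbers.

```python
import time
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, List, Tuple

# torch is optional so the sketch also runs on a machine without it.
_sync = None
try:
    import torch

    if torch.cuda.is_available():
        _sync = torch.cuda.synchronize
except ImportError:
    pass


@contextmanager
def stage_timer(name: str, timings: Dict[str, float]) -> Iterator[None]:
    """Add the wall-clock time of the enclosed block to timings[name].

    CUDA kernel launches are asynchronous: without a synchronize, the clock
    stops before the GPU finishes, and the stage's real cost is charged to
    whatever later call happens to block (e.g. a .cpu() copy or .item()).
    """
    if _sync is not None:
        _sync()
    start = time.perf_counter()
    try:
        yield
    finally:
        if _sync is not None:
            _sync()
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


def profile_pipeline(
    stages: List[Tuple[str, Callable[[Any], Any]]],
    inputs: Any,
    warmup: int = 1,
    runs: int = 3,
) -> Dict[str, float]:
    """Run the stages in order and return mean seconds per stage.

    Warm-up passes are timed into a throwaway dict, because the first call
    pays one-time costs (CUDA context setup, compilation or autotuning,
    lazy initialization) that would otherwise skew the averages.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    timings: Dict[str, float] = {}
    for i in range(warmup + runs):
        record = timings if i >= warmup else {}
        out = inputs
        for name, fn in stages:
            with stage_timer(name, record):
                out = fn(out)
    return {name: total / runs for name, total in timings.items()}


def report(mean_times: Dict[str, float]) -> None:
    """Print each stage's mean time and its share of the total."""
    total = sum(mean_times.values()) or 1.0
    for name, secs in mean_times.items():
        print(f"{name:>14}: {secs * 1000:8.1f} ms  ({100 * secs / total:5.1f}%)")


if __name__ == "__main__":
    # Hypothetical stand-ins; replace with your pipeline's prompt encoding,
    # denoising loop, and VAE decode.
    first_call = [True]

    def text_encoder(prompt):
        # Simulates a cost paid only on the very first call.
        time.sleep(0.3 if first_call[0] else 0.005)
        first_call[0] = False
        return prompt

    def unet_loop(latents):
        time.sleep(0.05)
        return latents

    def vae_decode(latents):
        time.sleep(0.03)
        return latents

    report(profile_pipeline(
        [("text_encoder", text_encoder), ("unet_loop", unet_loop),
         ("vae_decode", vae_decode)],
        "a product photo",
    ))
```

Running it once with `warmup=0` and once with the default makes the first-call penalty visible as the difference between the two. A timer like this only tells you which stage to open in the profiler, not why that stage is slow.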
