Elise Moreau
Author Profile
Source: dev.to
TL;DR: Our SDXL LoRA fine-tune for a Photoroom product photography model trained for six days while...
TL;DR: torch.compile gave us a 2.3x speedup on our SDXL pipeline in benchmarks, then quietly...
From Dev.to - pytorch: Why Your Diffusion Model Is Slow at Inference (And It's Not the UNet)
BIFROST COMMENT The routing overhead caught us off guard. We were running caption generation through a larger model for every input when 70% of them only needed a fast small model. Adding a gateway with cost-aware routing (we landed on Bifrost for this, though LiteLLM and Portk
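The routing idea in this comment can be sketched in a few lines. This is only an illustration under stated assumptions: the tier names, the relative costs, and the `needs_large_model` heuristic are all hypothetical, and the code does not use the Bifrost or LiteLLM APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str             # hypothetical model identifier
    cost_per_call: float  # made-up relative cost units


SMALL = ModelTier("small-captioner", 1.0)
LARGE = ModelTier("large-captioner", 10.0)


def needs_large_model(prompt: str, max_words: int = 40) -> bool:
    """Hypothetical heuristic: long or multi-question requests need the large tier."""
    return len(prompt.split()) > max_words or prompt.count("?") > 1


def route(prompt: str) -> ModelTier:
    """Pick the cheapest tier expected to handle the request."""
    return LARGE if needs_large_model(prompt) else SMALL


requests = [
    "Caption this product photo.",
    "Describe the lighting? And the materials? Then suggest alt text.",
]
chosen = [route(p) for p in requests]
baseline = LARGE.cost_per_call * len(requests)  # every request sent to the large tier
routed = sum(t.cost_per_call for t in chosen)   # cost after routing
print([t.name for t in chosen], f"cost {routed} vs {baseline}")
```

A real gateway would likely replace the word-count heuristic with a classifier or a fallback-on-failure policy; the point here is only that the routing decision happens before the expensive call.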
TL;DR: Most inference bottlenecks in diffusion pipelines are not in the UNet denoising loop. They are in the VAE decoder, the text encoder on first call, and CPU-GPU synchronization between steps. Profile before you optimize. To be precise, a 30% speedup often comes from fixing t
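One way to act on "profile before you optimize" is to time each pipeline stage separately. The sketch below is a minimal illustration, not the article's own method: `StageTimer` is a hypothetical helper, the sleeps are stand-ins for real model calls, and the optional `sync` argument is assumed to be a blocking call such as `torch.cuda.synchronize`, so that asynchronous GPU work is charged to the stage that launched it.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self, sync=None):
        # `sync` is an optional callable that blocks until queued GPU work
        # finishes (for example torch.cuda.synchronize); None means CPU-only.
        self.sync = sync
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        if self.sync:
            self.sync()  # drain earlier work so it is not billed to this stage
        start = time.perf_counter()
        try:
            yield
        finally:
            if self.sync:
                self.sync()  # wait for this stage's own work to finish
            self.totals[name] += time.perf_counter() - start

    def report(self):
        """Return (stage, seconds, share of total) rows, slowest first."""
        total = sum(self.totals.values()) or 1.0
        rows = sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, secs, secs / total) for name, secs in rows]


# Stand-in stages: sleeps replace real model calls, durations are arbitrary.
timer = StageTimer()
with timer.stage("text_encoder"):
    time.sleep(0.02)
for _ in range(3):
    with timer.stage("unet_step"):
        time.sleep(0.005)
with timer.stage("vae_decode"):
    time.sleep(0.03)

for name, secs, share in timer.report():
    print(f"{name:<14}{secs * 1000:8.1f} ms {share:6.1%}")
```

Synchronizing inside the timer adds its own overhead, so a helper like this is better suited to finding the dominant stage than to measuring end-to-end latency.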