torch.compile recompiled our SDXL UNet 38 times in production

DEV Community: pytorch·Elise Moreau·4 days ago

TL;DR: torch.compile gave us a 2.3x speedup on our SDXL pipeline in benchmarks, then quietly recompiled 38 times across the first 100 production requests, because every customer uploads a product photo at a different resolution. The fix wasn't turning compile off. It was understanding what counts as a guard, bucketing inputs to fixed shapes, and reading the recompilation logs PyTorch 2.3 gives you for free.

The benchmark that lied to me

At Photoroom we run diffusion models for product photography. Someone uploads a sneaker on a kitchen table, and the model gives it a clean studio background. The UNet is the heavy part, so when PyTorch 2.3 promised free speedups through torch.compile, I spent a week wiring it in.

The benchmark looked great: fixed 1024x1024 input, batch size 4, an A10G. 2.3x faster than eager mode after warmup. I shipped it to a 5% canary.

p99 latency went up. Not by a little. Some requests took 70 seconds longer than before the change.…
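
The bucketing fix from the TL;DR can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not Photoroom's implementation: the bucket list, the closest-aspect-ratio rule, and the function names pick_bucket and fit_into_bucket are all hypothetical. The idea is that every upload is resized and padded into one of a small, fixed set of shapes, so the compiled UNet sees at most len(BUCKETS) distinct spatial sizes instead of one per customer photo.

```python
from typing import List, Tuple

# Hypothetical bucket set (height, width): a few ~1-megapixel resolutions,
# all multiples of 64. Illustrative only, not taken from the article.
BUCKETS: List[Tuple[int, int]] = [
    (1024, 1024),
    (1152, 896), (896, 1152),
    (1216, 832), (832, 1216),
    (1344, 768), (768, 1344),
]


def pick_bucket(height: int, width: int,
                buckets: List[Tuple[int, int]] = BUCKETS) -> Tuple[int, int]:
    """Return the bucket whose aspect ratio (width / height) is closest to the input's."""
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")
    aspect = width / height
    return min(buckets, key=lambda b: abs(b[1] / b[0] - aspect))


def fit_into_bucket(height: int, width: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Pick a bucket, then compute an aspect-preserving resize that fits inside it.

    Returns ((bucket_h, bucket_w), (resized_h, resized_w)). A caller would resize
    the photo to (resized_h, resized_w) and pad it to (bucket_h, bucket_w), so the
    compiled UNet only ever sees len(BUCKETS) distinct spatial shapes.
    """
    bucket_h, bucket_w = pick_bucket(height, width)
    scale = min(bucket_h / height, bucket_w / width)
    # Clamp to the bucket so float rounding can never overshoot the padding target.
    resized_h = min(bucket_h, max(1, round(height * scale)))
    resized_w = min(bucket_w, max(1, round(width * scale)))
    return (bucket_h, bucket_w), (resized_h, resized_w)


if __name__ == "__main__":
    # Arbitrary upload sizes: each one maps onto one of the fixed buckets.
    for h, w in [(1000, 1000), (3024, 4032), (4032, 3024), (600, 2000)]:
        print((h, w), "->", fit_into_bucket(h, w))
```

The trade-off is a little wasted compute on padding (the output would be cropped back to the resized region afterwards) in exchange for a bounded number of compiled graphs that can all be warmed up before traffic arrives. To check that recompiles actually stop, the service can be run with the environment variable TORCH_LOGS=recompiles, which makes PyTorch 2.x log each recompilation together with the guard that failed.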