Elise Moreau

Author Profile

0 karma · 0 posts · joined about 1 month ago

The bf16 grad accumulator that killed our SDXL LoRA training

Source: dev.to

TL;DR: Our SDXL LoRA fine-tune for a Photoroom product photography model trained for six days while...

#dev #training #finite #gradient #grad #article #englishlanguage

4 days ago

torch.compile recompiled our SDXL UNet 38 times in production

Source: dev.to

TL;DR: torch.compile gave us a 2.3x speedup on our SDXL pipeline in benchmarks, then quietly...

#dev #compile #torch #recompiles #guard #article #englishlanguage

4 days ago

Why Your Diffusion Model Is Slow at Inference (And It's Not the UNet)

Source: dev.to

From Dev.to - pytorch: Why Your Diffusion Model Is Slow at Inference (And It's Not the UNet)

#machinelearning #pytorch #computervision #mlops #torch #first #unet #cuda

about 1 month ago

Kimi K2.6 Is a Legit Opus 4.7 Replacement

Source: dev.to

From Dev Community: Kimi K2.6 Is a Legit Opus 4.7 Replacement

#ai #programming #productivity #automation #opus #kimi #tasks #model

about 1 month ago

Diffusion Model Inference in Production: What the Benchmarks Leave Out

Source: dev.to

The routing overhead caught us off guard. We were running caption generation through a larger model for every input when 70% of them only needed a fast small model. Adding a gateway with cost-aware routing (we landed on Bifrost for this, though LiteLLM and Portk

#dev #bifrost #model #routing #github #photo #article #englishlanguage

about 1 month ago

Why Your Diffusion Model Is Slow at Inference (And It's Not the UNet)

Source: dev.to

TL;DR: Most inference bottlenecks in diffusion pipelines are not in the UNet denoising loop. They are in the VAE decoder, the text encoder on first call, and CPU-GPU synchronization between steps. Profile before you optimize. To be precise, a 30% speedup often comes from fixing t

#dev #class #code #torch #article #englishlanguage

about 1 month ago
