In Q3 2024, our 12-person ML engineering team at a Series C fintech hit a wall: our TensorFlow 2.16 training pipelines for fraud-detection transformers were taking 14.2 hours per epoch on 8x A100 nodes and burning $42k/month in cloud GPU costs. After a 6-week migration to PyTorch 2.5 with torch.compile and FSDP, we cut epoch time to 9.23 hours (a 35% reduction) with identical F1 scores and 22% lower memory overhead. This isn't a hype post: we're sharing every benchmark, every migration gotcha, and the production-ready code we used to make the switch.

Key Insights
PyTorch 2.5's torch.compile…