In Q3 2024 benchmarks, PyTorch 2.5’s compiled mode delivered 3.2x higher inference throughput than eager mode for BERT-Large workloads on AWS Inferentia 3, cutting p99 latency from 210 ms to 65 ms and reducing per-inference cost by 42%.

Key Insights

PyTorch 2.5 compiled mode reduces Inferentia 3 kernel launch overhead by 78% through ahead-of-time graph lowering with Neuron SDK 2.19.

AWS Neuron SDK 2.19 adds first-class support for PyTorch 2.5's torch.compile(), with custom backend registration for Inferentia 3's NeuronCore v3.

Teams migrating from Inferentia 2 to Inferentia 3 with PyTorch 2.5 compiled mode see 62% lower per-inference costs than equivalent GPU-based deployments.…
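The headline figures rest on three quantities: tail latency (p99), a before/after ratio, and cost per inference. The sketch below shows one common way to compute each. The function names, the nearest-rank percentile definition, and the flat hourly-billing cost model are assumptions for illustration, not the benchmark's published methodology, and the instance price and throughput in the usage lines are hypothetical. As a quick sanity check, 210 ms / 65 ms is about 3.23x, in line with the reported 3.2x throughput gain.

```python
from typing import Sequence


def p99_nearest_rank(latencies_ms: Sequence[float]) -> float:
    """Return the 99th-percentile latency using the nearest-rank method (an assumed definition)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    rank = (99 * n + 99) // 100  # ceil(0.99 * n) in integer arithmetic; rank is 1-based
    return ordered[rank - 1]


def ratio(before: float, after: float) -> float:
    """How many times larger `before` is than `after` (e.g. eager vs compiled p99 latency)."""
    return before / after


def cost_per_inference(hourly_price_usd: float, throughput_per_s: float) -> float:
    """Per-inference cost for an instance billed by the hour at a sustained throughput (assumed model)."""
    return hourly_price_usd / (throughput_per_s * 3600.0)


if __name__ == "__main__":
    # p99 figures reported in the article: 210 ms (eager) -> 65 ms (compiled).
    print(f"p99 latency ratio: {ratio(210, 65):.2f}x")  # prints "p99 latency ratio: 3.23x"
    # Hypothetical price and throughput, NOT taken from the benchmark.
    print(f"cost per inference: ${cost_per_inference(1.00, 100):.2e}")  # prints "cost per inference: $2.78e-06"
```

Because cost per inference scales with hourly price divided by throughput, any reported cost reduction also depends on the instance price used, which the benchmark summary does not state.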