Fine-tuning Llama 3.2 70B on NVIDIA A10G clusters used to take 18 hours and cost $420 per run. With PyTorch 2.4, vLLM 0.4, and AWS Trainium 2 instances, we’ve cut that to 4.1 hours and $159 per run: a 62% cost reduction and a 4.4x throughput gain, with zero model accuracy loss.

Key Insights

- PyTorch 2.4’s new torch.neuronx integration reduces Trainium 2 kernel launch overhead by 37% vs. PyTorch 2.3.
- vLLM 0.4 adds experimental Trainium 2 support via the vllm-aws extension, enabling 2.8x higher inference throughput during fine-tuning validation.
- AWS trn2.48xlarge instances deliver 4.1x higher tokens/sec per dollar than NVIDIA A10G GPU instances for Llama 3.2 70B fine-tuning.
- By Q3 2025, 70% of production Llama fine-tuning…
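
The headline percentages follow directly from the per-run figures quoted in the opening paragraph. The short sketch below simply redoes that arithmetic. The constant and function names are illustrative, not taken from any benchmark harness, and reading the wall-clock ratio as a throughput gain assumes both runs process the same number of tokens.

```python
# Per-run figures quoted in the opening paragraph.
BASELINE_HOURS = 18.0       # NVIDIA A10G cluster
BASELINE_COST_USD = 420.0
NEW_HOURS = 4.1             # AWS Trainium 2
NEW_COST_USD = 159.0


def cost_reduction(old_cost: float, new_cost: float) -> float:
    """Fraction of the per-run cost that was saved."""
    return (old_cost - new_cost) / old_cost


def speedup(old_hours: float, new_hours: float) -> float:
    """Wall-clock ratio; equals the throughput gain only if both
    runs process the same number of tokens (an assumption here)."""
    return old_hours / new_hours


if __name__ == "__main__":
    saved = cost_reduction(BASELINE_COST_USD, NEW_COST_USD)
    gain = speedup(BASELINE_HOURS, NEW_HOURS)
    print(f"cost reduction: {saved:.0%}")   # 62%
    print(f"speedup: {gain:.1f}x")          # 4.4x
```

Keeping the raw hours and dollars as named constants makes it easy to recompute both ratios if instance pricing or run length changes.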