TL;DR: We ran post-training quantisation (PTQ) and quantisation-aware training (QAT) side by side on the same defect-classification model, deployed on a Jetson Orin Nano. After six months in production, QAT recovered 3.1 mAP points over PTQ on rare defect classes, but it cost us roughly two engineer-weeks of pipeline work and a 4x slower training cycle.

Every time someone shows me a quantisation benchmark on ImageNet, I want to ask what their actual deployment looks like. ImageNet validation accuracy at INT8 tells you almost nothing about whether your model will still detect the 0.4% of defect samples that pay for the whole project. We learned this the hard way at the end of last year, when the first quarter of production data came back from one of our partner sites and our PTQ model was missing scratches that the FP16 baseline caught fine. This post is the writeup.…
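
To make the point about aggregate accuracy concrete, here is a minimal arithmetic sketch. It assumes the 0.4% figure is the share of all samples that are defects, and every other number (the sample count, the recall values, the accuracy on normal parts) is made up for illustration and is not a measurement from our deployment. The helper `summarize` is likewise hypothetical.

```python
def summarize(n_total, defect_rate, defect_recall, normal_accuracy):
    """Return (overall_accuracy, defect_recall) for a two-class problem."""
    n_defect = round(n_total * defect_rate)
    n_normal = n_total - n_defect
    # Correct predictions = caught defects + correctly passed normal parts.
    correct = defect_recall * n_defect + normal_accuracy * n_normal
    return correct / n_total, defect_recall


# Hypothetical numbers for illustration only, not results from this post.
N = 100_000
DEFECT_RATE = 0.004  # assumed reading of the 0.4% figure in the text

baseline_acc, baseline_recall = summarize(
    N, DEFECT_RATE, defect_recall=0.95, normal_accuracy=0.99
)
quant_acc, quant_recall = summarize(
    N, DEFECT_RATE, defect_recall=0.75, normal_accuracy=0.99
)

acc_drop = baseline_acc - quant_acc
recall_drop = baseline_recall - quant_recall

print(f"overall accuracy drop: {acc_drop * 100:.2f} points")
print(f"defect recall drop:    {recall_drop * 100:.2f} points")
```

Under these assumed numbers, a 20-point collapse in defect recall moves overall accuracy by less than a tenth of a point, which is why a headline accuracy figure can look healthy while the rare class quietly degrades.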