QAT vs PTQ on our edge vision model: 6 months of A/B data


DEV Community: pytorch·Marco Rinaldi·4 days ago


#dev #training #quantisation #model #int8 #torch


TL;DR: We ran post-training quantisation (PTQ) and quantisation-aware training (QAT) side by side on the same defect-classification model deployed on a Jetson Orin Nano. After six months in production, QAT recovered 3.1 mAP points over PTQ on rare defect classes, but cost us roughly two engineer-weeks of pipeline work and a 4x slower training cycle.

Every time someone shows me a quantisation benchmark on ImageNet, I want to ask what their actual deployment looks like, because ImageNet validation accuracy at INT8 tells you almost nothing about whether your model will still detect the 0.4% of defect samples that pay for the whole project. We learned this the hard way at the end of last year, when the first quarter of production data came back from one of our partner sites and our PTQ model was missing scratches that the FP16 baseline caught fine. This post is the writeup.…
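To make the point about aggregate accuracy concrete, here is a minimal arithmetic sketch. Only the 0.4% defect rate comes from the post; the sample count, the number of defects each model catches, and the function name are hypothetical values chosen for illustration, not our production numbers.

```python
def accuracy_and_recall(n_total, defect_rate, defects_caught, false_alarms=0):
    """Return (overall accuracy, defect recall) for a binary defect classifier."""
    n_defect = round(n_total * defect_rate)   # number of true defect samples
    n_ok = n_total - n_defect                 # number of true non-defect samples
    correct = defects_caught + (n_ok - false_alarms)
    return correct / n_total, defects_caught / n_defect

# Hypothetical counts: 100,000 samples at the 0.4% defect rate from the post.
# "fp16" catches 380 of 400 defects, "int8" catches 300 of 400 (made-up figures).
acc_fp16, recall_fp16 = accuracy_and_recall(100_000, 0.004, defects_caught=380)
acc_int8, recall_int8 = accuracy_and_recall(100_000, 0.004, defects_caught=300)

print(f"accuracy drop: {acc_fp16 - acc_int8:.4f}")      # under 0.1 percentage points
print(f"recall drop:   {recall_fp16 - recall_int8:.2f}")  # 20 percentage points
```

Under these assumed counts, overall accuracy moves by less than a tenth of a percentage point while defect recall falls by twenty points, which is why a headline accuracy figure can hide exactly the regression that matters.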
