TL;DR: We took Meta's SAM 2 small (around 224M params) and distilled it into a 6.3MB student that runs at 31 FPS on a Jetson Orin Nano for an automotive surface-defect pipeline. Mask IoU drops from 0.91 to 0.84, which is acceptable for the defect shapes we care about. The single biggest lever was a feature-alignment loss on the image embedding, not a loss on the mask logits.

Most of my year goes into event-camera work at Prophesee, but a side contract this spring with an automotive supplier outside Brescia ate two months of my evenings. They make aluminium body panels, and they wanted real-time masks for surface defects: scratches, dents, paint pinholes. The cameras are boring 4MP CMOS sensors running at 25 FPS. The target hardware is a Jetson Orin Nano, because the PLCs on the line already talk to one over Ethernet.

The first thing we tried was fine-tuning SAM 2 small directly and shipping it with TensorRT at FP16. That came out at about 1.2 seconds per image on the Orin. At 25 FPS each frame has a 40 ms budget, so that's roughly 30x too slow for a moving line.…