TL;DR: Our frame-based defect detector ran at 31ms per frame on a Jetson Orin Nano, and the production line needed 12ms. Structured channel pruning at 45% plus a three-epoch fine-tune got us to 11.4ms for a 0.6-point mAP drop. Unstructured pruning looked beautiful on paper and gave exactly zero real-world speedup, so we deleted it.

Accuracy was never the problem. The detector hit 0.91 mAP in the lab and everyone was happy. Then we put it on the actual Orin Nano sitting next to the conveyor, and it ran at 31ms per frame. The line moves a stamped part every 12ms. You can imagine how that meeting went.

Some context. This was a side project away from my usual event-camera work at Prophesee: a plain RGB detector for surface defects on stamped metal panels. Small team, three of us, a hard latency budget, and no room for a bigger GPU on the line. Cutting model size was the only lever we had.

The trap I fell into first

I reached for unstructured pruning because the literature loves it.…
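
As a rough illustration of the numbers in the TL;DR, the sketch below counts multiply-accumulates (MACs) for a single convolution layer under the two pruning styles. It is back-of-the-envelope arithmetic, not a profiler and not our actual model: the layer dimensions are made up, the function names are hypothetical, and it assumes the runtime executes the layer with ordinary dense kernels and that a mid-network layer loses 45% of both its input and output channels. Only the 31ms, 12ms, 11.4ms, and 45% figures come from this post.

```python
# Back-of-the-envelope MAC count for one conv layer.
# Layer sizes are illustrative assumptions, not taken from the real detector.

def conv_macs(c_in, c_out, k, h_out, w_out):
    """MACs a dense kernel performs for a k x k convolution."""
    return c_in * c_out * k * k * h_out * w_out

def channels_kept(channels, prune_ratio):
    """Channels remaining after structured pruning at the given ratio."""
    return round(channels * (1 - prune_ratio))

# Hypothetical layer: 128 -> 128 channels, 3x3 kernel, 40x40 output map.
C_IN, C_OUT, K, H, W = 128, 128, 3, 40, 40
PRUNE = 0.45  # pruning ratio quoted in the post

dense = conv_macs(C_IN, C_OUT, K, H, W)

# Unstructured pruning zeroes individual weights but leaves the tensor
# shape unchanged, so a dense kernel still performs every multiply.
unstructured = conv_macs(C_IN, C_OUT, K, H, W)

# Structured pruning removes whole channels, so the tensors really shrink
# (assumed here on both the input and the output side of the layer).
structured = conv_macs(channels_kept(C_IN, PRUNE), channels_kept(C_OUT, PRUNE), K, H, W)

# Latency figures from the post, in milliseconds.
required_speedup = 31 / 12
achieved_speedup = 31 / 11.4

print(f"unstructured / dense MACs: {unstructured / dense:.2f}")
print(f"structured   / dense MACs: {structured / dense:.2f}")
print(f"speedup required: {required_speedup:.2f}x, achieved: {achieved_speedup:.2f}x")
```

MAC counts are only a proxy. Measured latency also depends on memory traffic, kernel selection, and layers that cannot be pruned, which is why the real speedup does not have to match the MAC ratio.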