You delivered a model that performs well in training but misses the field requirements: intermittent high latency, inference jitter that breaks real-time control loops, a model that fits in flash but not in SRAM, and battery life that collapses after a few minutes. Unsupported ops fall back to the CPU and blow the latency and power budget. Those are the symptoms of a mismatch between algorithm decisions and hardware primitives, and they are exactly why you must treat model-hardware mapping as an engineering discipline.

Contents

Why algorithm-hardware co-design wins on milliwatts and milliseconds
Model-level levers that actually buy you latency and power
Hardware primitives and practical model-hardware mapping patterns
Cross-layer profiling and iterative optimization to find the real bottlenecks
Deployment checklist: validation, safety and maintainability

Why algorithm-hardware co-design wins on milliwatts and milliseconds

The dominant cost in many ML workloads is data movement, not arithmetic.…