Breaking the MoE Speculative Trap: 460 t/s on AMD Strix Halo

Mixture-of-Experts (MoE) architectures like Qwen 3.6 35B-A3B have redefined the performance-per-watt ratio for consumer hardware. However, as LLM inference engines mature, we are discovering that traditional optimizations like Speculative Decoding (using a draft model) can sometimes become a "Performance Trap."

In this technical deep-dive, we benchmark the AMD Strix Halo (Radeon 8060S) using the latest llama.cpp stack to identify the "Gold Configuration" for sovereign agents.

The Theory: Speculative Decoding

Speculative decoding uses a tiny "Junior" model to guess the next few tokens, which a large "Senior" model verifies in parallel. On paper, this skips the memory-bandwidth bottleneck of the large model for several tokens at a time.…
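
The draft-then-verify loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not llama.cpp's implementation: the "models" are plain functions that return one greedy next token, the names `speculative_step`, `senior`, and `junior` are hypothetical, and verification is written as a loop where a real engine would score all drafted positions in one batched forward pass. It also uses exact-match (greedy) acceptance; production engines may use probabilistic acceptance rules instead.

```python
from typing import Callable, List

# Assumption: a "model" is any function mapping a token context to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(draft: Model, target: Model, context: List[int], k: int) -> List[int]:
    """Run one draft-then-verify round and return the tokens that get emitted."""
    # 1. The Junior (draft) model proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(context)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The Senior (target) model checks each proposed position.
    #    (Looped here for clarity; a real engine batches these checks.)
    accepted: List[int] = []
    ctx = list(context)
    for tok in proposed:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: keep the Senior's token and discard the rest of the draft.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every guess matched, so the Senior's pass also yields one extra token.
    accepted.append(target(ctx))
    return accepted


def senior(ctx: List[int]) -> int:
    # Toy target model: always counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def junior(ctx: List[int]) -> int:
    # Toy draft model: agrees with the target except right after token 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_step(junior, senior, [1], k=4))  # three guesses accepted, fourth corrected
    print(speculative_step(junior, senior, [4], k=4))  # first guess rejected, whole draft wasted
    print(speculative_step(junior, senior, [5], k=3))  # all guesses accepted plus one bonus token
```

The second call shows the failure mode in miniature: when the first guess is rejected, the round still pays for the full draft plus the verification pass, yet emits only a single token.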