✦ The successful benchmark run on TPU v6e-4 used the following "Balanced Production" flags. These were specifically tuned to stabilize the 26B MoE model on the 4-chip topology while maintaining peak performance.

🚀 vLLM Startup Command (Verified)

    vllm serve google/gemma-4-26B-A4B-it \
      --tensor-parallel-size 4 \
      --dtype bfloat16 \
      --kv-cache-dtype fp8 \
      --max-model-len 16384 \
      --speculative-config '{"method": "ngram", "num_speculative_tokens": 3}' \
      --max-num-batched-tokens 4096 \
      --max-num-seqs 256 \
      --enable-prefix-caching \
      --disable_chunked_mm_input \
      --limit-mm-per-prompt '{"image":4,"audio":1}' \
      --enable-auto-tool-choice \
      --tool-call-parser gemma4 \
      --reasoning-parser gemma4 \
      --trust-remote-code

⚙️ Critical Parameters Explained

┌─────────────────────────┬───────┬────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Flag │ Value │ Rationale │…