At the end of Round 1, we promised a rematch. More models. Fixed settings. Harder questions about what "local inference" really means when you push past what fits in VRAM. This is that rematch.

We added two models that the Coder dev team specifically requested: Gemma 4 from Google (27B parameters, fits comfortably on the RTX 5090) and Kimi K2 from Moonshot AI (1 trillion parameters, does not fit in anything reasonable). We also reran every model from Round 1 with fixes for the configuration issues that tripped up three of them. The results changed the leaderboard significantly.

What We Fixed from Round 1

Round 1 had three avoidable failures:

- Qwen hit the token limit. It scored 28/100 because output was capped at 4,096 tokens and the code got truncated mid-f-string. The model was generating at 1,510 tok/s. It wasn't slow. We just cut it off.
- Codestral and DeepSeek built interactive menus. Both interpreted "commands: add, list, complete, delete" as while True: input() loops instead of CLI argument parsers.…
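For contrast with a while True: input() menu, here is a minimal sketch of what a CLI argument parser for those four commands could look like. It uses Python's standard argparse module with one subcommand per command. The task storage is deliberately left out, and the names build_parser, main, title, and task_id are hypothetical illustrations, not taken from the benchmark prompt or from any model's output.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per task command (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="todo", description="Minimal task CLI sketch")
    # Each command becomes a subcommand; one must be given on every invocation.
    sub = parser.add_subparsers(dest="command", required=True)

    add = sub.add_parser("add", help="Add a new task")
    add.add_argument("title", help="Task description")

    sub.add_parser("list", help="List all tasks")

    complete = sub.add_parser("complete", help="Mark a task as done")
    complete.add_argument("task_id", type=int, help="ID of the task to complete")

    delete = sub.add_parser("delete", help="Delete a task")
    delete.add_argument("task_id", type=int, help="ID of the task to delete")

    return parser


def main(argv=None) -> None:
    # Parse the arguments once and return: no interactive prompt loop.
    # argv=None makes argparse read sys.argv[1:].
    args = build_parser().parse_args(argv)
    if args.command == "add":
        print(f"Added: {args.title}")
    elif args.command == "list":
        print("(task listing would go here)")
    else:
        # "complete" and "delete" both carry a task_id.
        print(f"{args.command}: task {args.task_id}")
```

Called as main(["add", "Write tests"]) or main(["complete", "3"]), each invocation handles exactly one command and exits, which makes the program scriptable and testable from a shell. An input() loop instead waits for keyboard input and raises EOFError when stdin is empty.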