62.8% on Aider Polyglot from a MacBook Pro. Then the other model we tried scored 4%. Here's what actually happened, with a working cost loop attached.

DEV Community · Christopher Maher · about 1 month ago

#infercostv030 #llmkubev072 #kubernetes #ai #aider #infercost


Originally published at llmkube.com/blog/m5-max-aider-polyglot-and-finops. Cross-posted here for the dev.to audience.

A 24-hour Aider Polyglot run, a follow-up bench that blew up in interesting ways, and a working $/MTok number from a Kubernetes operator that scrapes Apple Silicon power live. Two open-source PRs landed today to make all of this reproducible on any M-series Mac.

This is a coding-model benchmark on locally served weights, plus a FinOps story. Every benchmark number traces to results files we can show you. Every cost number traces to a CSV captured by InferCost during the run. The point is the methodology and the tooling; the model rankings are along for the ride.

TL;DR

Qwen3.6-35B-A3B Q8 (Tongyi Lab, Apache 2.0) hit 62.8% on Aider Polyglot (pass_rate_2, n=223/225) running locally on a MacBook Pro M5 Max via LLMKube's Metal Agent.…
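
To make the idea of a power-derived $/MTok number concrete, here is a minimal sketch of the arithmetic, assuming the cost is electricity only: integrate sampled power draw over the run to get kWh, multiply by an electricity price, and divide by millions of tokens generated. This is not InferCost's implementation; the names `PowerSample`, `energy_kwh` and `cost_per_mtok`, the trapezoidal integration, and the example price are all hypothetical, and the real tool may include cost components not modeled here.

```python
"""Hypothetical sketch: electricity-only cost per million tokens (MTok)
computed from sampled power readings. Not InferCost's actual code."""
from dataclasses import dataclass


@dataclass
class PowerSample:
    t_seconds: float  # timestamp of the reading, in seconds
    watts: float      # power draw at that instant, in watts


def energy_kwh(samples: list[PowerSample]) -> float:
    """Integrate power over time with the trapezoidal rule; return kWh."""
    if len(samples) < 2:
        return 0.0  # a single reading spans no time, so no energy
    joules = 0.0
    for a, b in zip(samples, samples[1:]):
        dt = b.t_seconds - a.t_seconds
        joules += 0.5 * (a.watts + b.watts) * dt  # W * s = J
    return joules / 3.6e6  # 1 kWh = 3.6e6 J


def cost_per_mtok(samples: list[PowerSample], tokens: int, usd_per_kwh: float) -> float:
    """Electricity cost in USD per million tokens for the sampled window."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return energy_kwh(samples) * usd_per_kwh / (tokens / 1e6)


if __name__ == "__main__":
    # Hypothetical numbers: a steady 100 W for 2 minutes, 100k tokens, $0.15/kWh.
    run = [PowerSample(0, 100), PowerSample(60, 100), PowerSample(120, 100)]
    print(f"{cost_per_mtok(run, 100_000, 0.15):.4f} USD/MTok")
```

With those made-up inputs the run uses 12,000 J (about 0.0033 kWh), which works out to $0.005 per million tokens. Trapezoidal integration is used because power is sampled at discrete, possibly uneven intervals; a real scraper would also need to decide how to handle gaps and which power rails to count.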
