HumanEval on a MacBook — 81.7% pass@1, Wi-Fi off

DEV Community·Matt Macosko·about 1 month ago

The M5 Max MacBook Pro with 128 GB of unified memory is the first laptop that can hold a frontier-class coding agent entirely in RAM. No GPU rack. No cloud. No subscription.

I just ran HumanEval on it. Wi-Fi off the entire run.

- 81.7% pass@1 on the full 164-problem benchmark
- Qwen 3 Coder 30B-A3B-Instruct (8-bit MLX)
- 14 minutes wall-clock, $0/month after the model download

YouTube walkthrough (three real problems, code streaming live, tests going green): https://www.youtube.com/watch?v=muq7VdgxqRk

Why this number matters

The Qwen team didn't publish HumanEval scores for any Qwen3-Coder variant. They consider the benchmark saturated and went straight to agentic ones (SWE-bench Verified, BFCL, Aider-Polyglot). For the 30B variant, the one that actually fits on a laptop, there were no published HumanEval/MBPP numbers. Until this run.

I also ran MBPP (sanitized): 83.3% pass@1 on a 168-problem sample.…
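For readers who want to sanity-check the percentages, here is a minimal sketch of the arithmetic behind pass@1, assuming one generated solution per problem (the post does not state the sampling setup). The function names `pass_at_1` and `counts_matching` are hypothetical helpers for illustration, not part of any benchmark harness, and the solved-problem counts it prints are inferred from the reported percentages, not taken from the post.

```python
def pass_at_1(passed: int, total: int) -> float:
    """Fraction of problems whose single generated solution passed all tests."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return passed / total


def counts_matching(reported_pct: float, total: int) -> list[int]:
    """Solved-problem counts whose pass@1 rounds to reported_pct at one decimal.

    Assumption: one sample per problem, so pass@1 is simply passed / total.
    """
    return [
        c for c in range(total + 1)
        if abs(100 * pass_at_1(c, total) - reported_pct) < 0.05
    ]


# Reported figures from the post: 81.7% on 164 problems, 83.3% on 168.
print(counts_matching(81.7, 164))  # [134]
print(counts_matching(83.3, 168))  # [140]
```

Under that assumption, 81.7% on 164 problems corresponds to 134 solved, and 83.3% on 168 corresponds to 140 solved. If the run drew multiple samples per problem, the mapping from percentage to counts would differ.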
