The Agentic Gap: Claude Oneshots, Gemma Fails

DEV Community·Rob·25 days ago

#ai #llm #search #gemma #code #opus

Two days ago, Gemma 4 topped our local model benchmark: 167 tokens per second, perfect code quality score, smallest download. Faster than Sonnet. Faster than Opus. The blog post said "Gemma 4 is the new default." Today we tested whether that's actually true.

The Experiment

Instead of another toy benchmark, we pulled a real item off the vibescoder.dev backlog: public-facing search across all blog posts. Multi-file feature, architectural decisions required, design system integration, no specification beyond "make search work."

Two models. Same prompt. Same codebase. Same workspace template. One shot: no follow-up instructions, no hand-holding. Walk away and see what happens.

                   Gemma 4 27B                 Opus 4.6
Provider           Ollama (local, RTX 5090)    Anthropic API (cloud)
Benchmark speed    167.1 tok/s                 74.3 tok/s
Benchmark score    100/100                     100/100
Cost               $0                          Per-token pricing

The prompt was deliberately vague on implementation details:

Add public-facing search to vibescoder.dev.…
