Native Anthropic endpoints, tool-call compatibility, and context-window sizing for local Claude Code.

Last tested: April 2026. See Changelog at the bottom.

TL;DR cheat sheet

| Goal | Use |
| --- | --- |
| MacBook Air | Gemma 4 26B-A4B Q4, 32K context, LM Studio or Ollama |
| MacBook Pro | Gemma 4 26B-A4B Q4 / UD-Q4, 64K context, llama.cpp or LM Studio |
| Claude Code minimum | 32K context (anything below is a chat demo) |
| Best local backend | LM Studio or Ollama first; llama.cpp for advanced; vLLM for servers |
| Avoid | 8K / 16K context, dense 31B Gemma 4 on 32 GB machines, old llama.cpp builds |

The local-Claude-Code rule of thumb

Three things decide whether a local Claude Code session works:

1. Model quality decides whether the answer is smart.
2. Tool-call formatting decides whether Claude Code can act on the answer.
3. Context length decides whether the session survives past the first few edits.

For local coding agents: 32K is the floor. 64K is the sweet spot. Anything below 32K is a chat demo, not Claude Code.

Recommended setup

Use this first.…