
Claude Code with Local LLMs and ANTHROPIC_BASE_URL: Ollama, LM Studio, llama.cpp, vLLM

DEV Community·René Zander·about 1 month ago
#ai #llm #programming #tutorial #claude #code

Native Anthropic endpoints, tool-call compatibility, and context-window sizing for local Claude Code. Last tested: April 2026. See Changelog at the bottom.

TL;DR cheat sheet

| Goal | Use |
|---|---|
| MacBook Air | Gemma 4 26B-A4B Q4, 32K context, LM Studio or Ollama |
| MacBook Pro | Gemma 4 26B-A4B Q4 / UD-Q4, 64K context, llama.cpp or LM Studio |
| Claude Code minimum | 32K context (anything below is a chat demo) |
| Best local backend | LM Studio or Ollama first; llama.cpp for advanced; vLLM for servers |
| Avoid | 8K / 16K context, dense 31B Gemma 4 on 32 GB machines, old llama.cpp builds |

The local-Claude-Code rule of thumb

Three things decide whether a local Claude Code session works:

- Model quality decides whether the answer is smart.
- Tool-call formatting decides whether Claude Code can act on the answer.
- Context length decides whether the session survives past the first few edits.

For local coding agents: 32K is the floor. 64K is the sweet spot. Anything below 32K is a chat demo, not Claude Code.

Recommended setup

Use this first.…
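The context-length part of the rule of thumb reduces to a simple threshold check. The sketch below is a minimal illustration, assuming "K" means 1,024 tokens (so 32K = 32768, the kind of value you would typically pass as a context size to a local backend). The function name `context_verdict` and its labels are hypothetical; they are not part of Claude Code or any of the backends named above.

```python
# Hypothetical helper: classify a local model's context window against the
# rule of thumb (32K floor, 64K sweet spot). Assumes K = 1,024 tokens.

FLOOR_TOKENS = 32 * 1024        # "32K is the floor"
SWEET_SPOT_TOKENS = 64 * 1024   # "64K is the sweet spot"


def context_verdict(context_tokens: int) -> str:
    """Return a rough verdict for a local Claude Code session."""
    if context_tokens <= 0:
        raise ValueError("context_tokens must be positive")
    if context_tokens < FLOOR_TOKENS:
        return "chat demo"    # below 32K: a chat demo, not Claude Code
    if context_tokens < SWEET_SPOT_TOKENS:
        return "usable"       # at least 32K but below 64K: the floor
    return "sweet spot"       # 64K and above


if __name__ == "__main__":
    # The 8K / 16K sizes from the "Avoid" row, plus the two recommended sizes.
    for ctx in (8192, 16384, 32768, 65536):
        print(f"{ctx:>6} tokens -> {context_verdict(ctx)}")
```

The check only covers context length; model quality and tool-call formatting still have to be verified by actually running a Claude Code session against the backend.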