Claude Code with Local LLMs and ANTHROPIC_BASE_URL: Ollama, LM Studio, llama.cpp, vLLM

DEV Community·René Zander·about 1 month ago

#ai #llm #programming #tutorial #claude #code

Native Anthropic endpoints, tool-call compatibility, and context-window sizing for local Claude Code.

Last tested: April 2026. See Changelog at the bottom.

TL;DR cheat sheet

Goal                 Use
MacBook Air          Gemma 4 26B-A4B Q4, 32K context, LM Studio or Ollama
MacBook Pro          Gemma 4 26B-A4B Q4 / UD-Q4, 64K context, llama.cpp or LM Studio
Claude Code minimum  32K context (anything below is a chat demo)
Best local backend   LM Studio or Ollama first; llama.cpp for advanced; vLLM for servers
Avoid                8K / 16K context, dense 31B Gemma 4 on 32 GB machines, old llama.cpp builds

The local-Claude-Code rule of thumb

Three things decide whether a local Claude Code session works:

1. Model quality decides whether the answer is smart.
2. Tool-call formatting decides whether Claude Code can act on the answer.
3. Context length decides whether the session survives past the first few edits.

For local coding agents: 32K is the floor. 64K is the sweet spot. Anything below 32K is a chat demo, not Claude Code.

Recommended setup

Use this first.…
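As a side note on the context-length rule of thumb stated earlier (32K floor, 64K sweet spot), the check can be sketched in a few lines of Python. This is only an illustration: the function name `classify_context`, the labels it returns, and the reading of "32K" and "64K" as 32768 and 65536 tokens are assumptions, not something the article specifies.

```python
# Hypothetical helper illustrating the 32K / 64K rule of thumb.
# Assumption: "32K" means 32 * 1024 tokens and "64K" means 64 * 1024 tokens.
FLOOR_TOKENS = 32 * 1024
SWEET_SPOT_TOKENS = 64 * 1024


def classify_context(context_tokens: int) -> str:
    """Classify a configured context window against the rule of thumb."""
    if context_tokens < FLOOR_TOKENS:
        # Below the floor: fine for chat, too small for an agent session.
        return "chat demo"
    if context_tokens < SWEET_SPOT_TOKENS:
        # At or above the floor, but short of the sweet spot.
        return "floor"
    # Assumption: anything at or above 64K is treated as the sweet spot.
    return "sweet spot"


if __name__ == "__main__":
    for size in (8 * 1024, 16 * 1024, 32 * 1024, 64 * 1024):
        print(size, classify_context(size))
```

A check like this could run before launching a session, so a backend configured with an 8K or 16K window is flagged early instead of failing after the first few edits.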
