I used to have a ridiculous local AI setup. Ollama running as a service. A separate Python venv for LangChain experiments. Another terminal with llama.cpp because I wanted to test quantized models. Three different API formats, three different port numbers, three things that broke independently every time I updated macOS.

Then Docker shipped Model Runner and I deleted all of it.

## What Model Runner Actually Is

It's built into Docker Desktop. No separate install. You pull models the same way you pull images:

```bash
docker model pull ai/llama3.1
docker model pull ai/phi3-mini
docker model pull ai/mistral
```

Run inference:

```bash
docker model run ai/llama3.1 "Explain NUMA topology in two sentences"
```

Or hit the API endpoint, which is OpenAI-compatible:

```bash
curl http://localhost:12434/engines/llama3.1/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What is OKE?"}],
    "max_tokens": 100
  }'
```
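Because the endpoint speaks the OpenAI chat format, the same request can be sent from a script with nothing but the standard library. Here is a minimal sketch in Python, not an official client: the URL is copied from the curl call, the helper names (`build_request`, `extract_reply`, `chat`) are made up for illustration, and the optional `model` field is an assumption based on how OpenAI-style servers usually work, so adjust both if your setup expects something different.

```python
import json
import urllib.request

# URL copied from the curl example; change it if your Model Runner
# exposes a different host, port, or path.
DEFAULT_URL = "http://localhost:12434/engines/llama3.1/v1/chat/completions"


def build_request(prompt, url=DEFAULT_URL, max_tokens=100, model=None):
    """Build the POST request for a single user message without sending it."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    if model is not None:
        # Assumption: OpenAI-style servers usually accept a "model" field,
        # e.g. "ai/llama3.1". It is left out unless you pass one.
        payload["model"] = model
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style chat completion dict."""
    return response_body["choices"][0]["message"]["content"]


def chat(prompt, **kwargs):
    """Send the prompt to the local endpoint and return the reply text."""
    request = build_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return extract_reply(json.load(resp))
```

With Model Runner up, `print(chat("What is OKE?"))` should print the model's answer. Keeping `build_request` separate from `chat` means you can inspect the exact payload before anything goes over the wire, which helps when the server rejects a request.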