Putting the GPU to Work: Running Local LLMs on a Home Lab

DEV Community·Rob·25 days ago

#phase #why #ollama #model #models #coder

Yesterday we went from a gaming PC on a shelf to a fully configured Coder server with GitHub integration, workspace templates, and AI agents. The dev environment is running. But the RTX 5090's 32 GB of VRAM has been sitting idle, and all the AI work is still going through cloud APIs. Today, we change that.

This session was about installing Ollama, choosing the right models for different coding tasks, getting local inference running on the workstation, and then wiring it all into Coder Agents so local models show up right alongside Anthropic in the model selector. Everything here was done conversationally through Coder Agents, same as always.

Why VRAM Is the Only Spec That Matters

Before pulling any models, it helps to understand the constraint you're optimizing around. For local LLMs, that constraint is VRAM. Not CPU cores, not system RAM, not disk speed. VRAM determines what models you can run, and model size determines how useful they are.…
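
To make the VRAM constraint concrete, here is a rough back-of-the-envelope sketch. It assumes weights dominate memory use (parameter count times bits per weight) and adds a flat 20% allowance for the KV cache and runtime buffers. That overhead figure, the function names, and the example model sizes are illustrative assumptions, not numbers from this post; only the 32 GB card size comes from the setup above.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    # One billion parameters at 8 bits each is roughly 1 GB.
    return params_billion * bits_per_weight / 8


def fits_in_vram(
    params_billion: float,
    bits_per_weight: float,
    vram_gb: float = 32.0,            # RTX 5090, per the setup described above
    overhead_fraction: float = 0.2,   # assumed allowance for KV cache and buffers
) -> bool:
    """Rough check: do the weights plus the assumed overhead fit on the card?"""
    needed_gb = weight_size_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed_gb <= vram_gb


if __name__ == "__main__":
    # Hypothetical model sizes and precisions, chosen only to show the arithmetic.
    for params, bits in [(7, 16), (32, 4), (70, 4), (70, 16)]:
        size = weight_size_gb(params, bits)
        verdict = "fits" if fits_in_vram(params, bits) else "does not fit"
        print(f"{params}B at {bits}-bit: ~{size:.1f} GB of weights, {verdict} in 32 GB")
```

Real usage depends on context length, the quantization format, and the runtime, so treat this as an estimate of what is worth trying rather than a guarantee that a model will load.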
