# Introduction

Language models continue to shape how machine learning practitioners and developers build applications. The advent of capable, compact small language models adds an intriguing layer to the mix. Running models locally bypasses third-party APIs, which keeps your data on your own hardware, eliminates per-token API costs, and enables offline operation. Among the tools powering this shift, Ollama has emerged as one of the standard choices for local inference thanks to its lightweight Go-based engine, simple CLI, and robust Docker-like model management system.

However, simply pulling a model and running it with the default settings is rarely optimal. Default configurations are tuned for a broad, general-purpose audience, and they often prioritize safe, conversational chat over performance, deterministic reasoning, or specialized system needs.…