Been running Llama 3.3 70B via Groq for coding tasks and kept losing architectural decisions across sessions. "We use PostgreSQL" — forgotten. "Auth is JWT" — re-debated. Every new chat starts from zero. So I built steerhead — it sits between you and any OpenAI-compatible API and manages context via SQLite instead of chat history. The trick: every message is a single-shot API call. Steerhead assembles the system prompt from stored constraints + file history, fires one clean call, then auto-extracts any decisions the model made (via a second LLM pass) and stores them for next time. Result: 146 tokens of surgical context instead of 80K tokens of degrading conversation history. New session? The model still knows your entire project's decisions.…