Built an open-source memory layer for local LLMs — single-shot calls, auto-extracted constraints,…

1 / 2

Built an open-source memory layer for local LLMs — single-shot calls, auto-extracted constraints, no context degradation

DEV Community·Justin Joseph·about 1 month ago

#7mS4iCaU

#ai #llm #opensource #showdev #decisions #steerhead

Reading 0:00

15s threshold

Been running Llama 3.3 70B via Groq for coding tasks and kept losing architectural decisions across sessions. "We use PostgreSQL" — forgotten. "Auth is JWT" — re-debated. Every new chat starts from zero. So I built steerhead — it sits between you and any OpenAI-compatible API and manages context via SQLite instead of chat history. The trick: every message is a single-shot API call. Steerhead assembles the system prompt from stored constraints + file history, fires one clean call, then auto-extracts any decisions the model made (via a second LLM pass) and stores them for next time. Result: 146 tokens of surgical context instead of 80K tokens of degrading conversation history. New session? The model still knows your entire project's decisions.…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Create free account Log in

Menu

Built an open-source memory layer for local LLMs — single-shot calls, auto-extracted constraints, no context degradation