Menu

Post image 1
Post image 2
1 / 2
0

Built an open-source memory layer for local LLMs — single-shot calls, auto-extracted constraints, no context degradation

DEV Community·Justin Joseph·about 1 month ago
#7mS4iCaU
Reading 0:00
15s threshold

Been running Llama 3.3 70B via Groq for coding tasks and kept losing architectural decisions across sessions. "We use PostgreSQL" — forgotten. "Auth is JWT" — re-debated. Every new chat starts from zero. So I built steerhead — it sits between you and any OpenAI-compatible API and manages context via SQLite instead of chat history. The trick: every message is a single-shot API call. Steerhead assembles the system prompt from stored constraints + file history, fires one clean call, then auto-extracts any decisions the model made (via a second LLM pass) and stores them for next time. Result: 146 tokens of surgical context instead of 80K tokens of degrading conversation history. New session? The model still knows your entire project's decisions.…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Read More