Subtitle: In context is not the same as in control.

By Beamlak Adane

A colleague asked a sharp question that many of us hit once we move from demos to production evaluators and agents:

In multi-turn evaluation or agent loops, models often begin to ignore the initial rules in the system prompt even though those tokens are still inside the context window. What is happening at the token level during decoding that causes early “anchor” tokens to lose influence over time in a streaming context, and how do attention sinks, KV-cache reuse, and prefix caching affect this? Beyond increasing context length, what can engineers do to preserve instruction fidelity and judgment consistency across long sessions?

This post is my answer after digging into the mechanism and mapping it to something concrete I ship: a sales-email evaluator loop.

The core idea: visible is not the same as influential

Keeping the system prompt inside the window is a storage guarantee, not an influence guarantee.…