The expensive moments in a coding-agent session are not the model's tokens. They are the seconds you spend skimming a markdown plan and missing a subtle misalignment. You approve, then watch the implementer solve a slightly different problem than the one in your head. We have started treating that gap as a UI problem, not a model problem. And the UI we have, for coding agents specifically, is bad. Thariq Shihipar at Claude Code has been making this case publicly for a while: agents should be emitting HTML, not markdown, for most non-trivial output. His thread is the right primer on why, and we're not going to try to re-derive it here. What we want to add is the piece that has been missing for us. We needed a way to use HTML at every plan stage without the token cost stacking up across the session. That way is a screenshot, borrowed from how DeepSeek-OCR handles context compression. Thariq's article. The case Thariq makes, in three parts. We will not reproduce Thariq's thread in full. We suggest reading it.…