Live chain-of-thought in a chatbot: how to actually stream the tool calls (not just the text)

DEV Community: fastapi·Bernard Uriza·3 days ago

Most "streaming" LLM chatbots stream just the text. The model says "I'll search for that…" and then you wait 6 seconds while the tokens dribble in. The actual search? Hidden. The 3 scrapes it did to fact-check? Hidden. You're staring at a typing indicator that doesn't tell you anything about what's actually taking time.

I just built a chatbot where every tool call surfaces as a step in real time (🔍 search_engine, 📄 scrape_as_markdown, 📄 scrape_as_markdown), while the response streams token by token afterwards. The user sees the agent's chain-of-thought as it happens, not as a postmortem.

The trick is that you have to stream three different things, and each layer needs to know what to do with each kind of event. Here's the architecture.

The shape of the stream

The agent runner (in my case, fi-runner wrapping the Claude Agent SDK) emits events of three types as they happen:

async for event in runner .…
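The idea of one stream carrying both tool-call steps and text tokens can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's code: the event names `tool_start`, `tool_end`, and `text`, the `AgentEvent` class, and the `fake_runner` stand-in are all hypothetical, since the three event types are not named in the text above. Each event is serialized as a Server-Sent Events frame, so a client can tell a tool step from a text delta by the `event:` field.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class AgentEvent:
    kind: str      # hypothetical names: "tool_start", "tool_end", "text"
    payload: dict


async def fake_runner() -> AsyncIterator[AgentEvent]:
    """Stand-in for a real agent runner; yields events in arrival order."""
    for tool in ["search_engine", "scrape_as_markdown"]:
        yield AgentEvent("tool_start", {"name": tool})
        await asyncio.sleep(0)  # a real tool call would take time here
        yield AgentEvent("tool_end", {"name": tool})
    for token in ["Here", " is", " the", " answer."]:
        yield AgentEvent("text", {"delta": token})


def to_sse(event: AgentEvent) -> str:
    """Format one event as an SSE frame: event name, JSON data, blank line."""
    return f"event: {event.kind}\ndata: {json.dumps(event.payload)}\n\n"


async def stream_frames(events: AsyncIterator[AgentEvent]) -> AsyncIterator[str]:
    """Forward every event as soon as it arrives, tool steps included."""
    async for event in events:
        yield to_sse(event)


async def collect() -> list[str]:
    """Drain the stream into a list (for demonstration only)."""
    return [frame async for frame in stream_frames(fake_runner())]


if __name__ == "__main__":
    for frame in asyncio.run(collect()):
        print(frame, end="")
```

In a FastAPI app, an async generator like `stream_frames(...)` could be passed to `StreamingResponse` with `media_type="text/event-stream"`; the front end would then render `tool_start` frames as steps and append `text` deltas to the reply.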
