Streaming Tokens and Tools: Multi-Turn Agentic Harness Support in NVIDIA Dynamo

NVIDIA Technical Blog·Matej Kosec·25 days ago

An agentic exchange must preserve a structured interaction: assistant turns interleave reasoning with one or more tool calls, and subsequent user turns return the corresponding tool results to the model context. Whether reasoning is replayed in later turns depends on the model and the turn: some reasoning should be retained, while some should be dropped. The inference engine is responsible for supporting this more expressive interaction model and for producing correctly segmented API results. Both tool-call parsing and reasoning parsing must happen before the attached harness consumes the response. High-value agentic workflows such as coding also depend on a responsive harness: reasoning segments, tool-call events, and request metadata need to stream back as the turn unfolds, rather than arriving only after the final text response.…
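The segmentation requirement described above can be illustrated with a minimal Python sketch. It is not Dynamo's parser or API. The `<think>` and `<tool_call>` markers, the JSON tool-call payload, and the names `StreamSegmenter`, `Event`, and `held_back` are all hypothetical assumptions chosen for illustration, since each model family uses its own reasoning and tool-call format. The sketch splits a streamed completion into reasoning, content, and tool-call events as chunks arrive. Text is forwarded immediately, except for a trailing fragment that might be the start of a marker split across chunks, and a tool call is emitted only once its arguments are complete and parseable.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical marker conventions for illustration only; real models each use
# their own reasoning and tool-call formats, which is why parsers are per-model.
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"
TOOL_OPEN, TOOL_CLOSE = "<tool_call>", "</tool_call>"

# For each parser state: which markers end it, and the state each marker leads to.
TRANSITIONS = {
    "content": {THINK_OPEN: "reasoning", TOOL_OPEN: "tool"},
    "reasoning": {THINK_CLOSE: "content"},
    "tool": {TOOL_CLOSE: "content"},
}


@dataclass
class Event:
    kind: str                      # "content", "reasoning", or "tool_call"
    text: str = ""
    call: Optional[dict] = None    # parsed payload for "tool_call" events


def held_back(buf: str, markers: Iterable[str]) -> int:
    """Length of the longest suffix of buf that could be the start of a marker."""
    return max(
        (n for m in markers for n in range(1, len(m)) if buf.endswith(m[:n])),
        default=0,
    )


class StreamSegmenter:
    """Hypothetical incremental splitter for reasoning, content, and tool calls."""

    def __init__(self) -> None:
        self.state = "content"
        self.buf = ""

    def feed(self, chunk: str) -> List[Event]:
        self.buf += chunk
        events: List[Event] = []
        while True:
            markers = TRANSITIONS[self.state]
            hits = [(self.buf.find(m), m) for m in markers if m in self.buf]
            if hits:
                idx, marker = min(hits)  # earliest complete marker wins
                before, self.buf = self.buf[:idx], self.buf[idx + len(marker):]
                if self.state == "tool":
                    # Tool-call arguments are only parseable once the call is complete.
                    events.append(Event("tool_call", call=json.loads(before)))
                elif before:
                    events.append(Event(self.state, text=before))
                self.state = markers[marker]
                continue
            if self.state != "tool":
                # Stream text now, holding back only a possible partial marker.
                keep = held_back(self.buf, markers)
                safe = self.buf[: len(self.buf) - keep]
                if safe:
                    events.append(Event(self.state, text=safe))
                self.buf = self.buf[len(self.buf) - keep:]
            return events

    def finish(self) -> List[Event]:
        """Flush whatever remains when the stream ends."""
        if self.state == "tool":
            raise ValueError("stream ended inside an unterminated tool call")
        rest, self.buf = self.buf, ""
        return [Event(self.state, text=rest)] if rest else []
```

For example, feeding the chunks `"<thi"`, `"nk>Need the file.</th"`, `"ink>Reading it.<tool_"`, and a final chunk carrying a complete `<tool_call>` JSON payload yields one reasoning event, one content event, and one parsed tool-call event, even though every marker is split across chunk boundaries. Holding back only the longest suffix that could begin a marker keeps added latency small: text is delayed only while it might still turn out to be part of a tag.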
