Menu

Post image 1
Post image 2
Post image 3
Post image 4
Post image 5
Post image 6
1 / 6
0

Streaming Tokens and Tools: Multi-Turn Agentic Harness Support in NVIDIA Dynamo

NVIDIA Technical Blog·Matej Kosec·25 days ago
#wYVeq2y5
Reading 0:00
15s threshold

An agentic exchange must preserve a structured interaction: assistant turns interleave reasoning with one or more tool calls, and subsequent user turns return the corresponding tool results to the model context. Reasoning replay is model- and turn-dependent: some reasoning should be retained, while some should be dropped. The inference engine is responsible for supporting this more expressive interaction model and for producing correctly segmented API results. Tool-call parsing and reasoning parsing need to happen before the attached harness consumes the response. High-value agentic workflows such as coding also depend on a responsive harness experience: reasoning segments, tool-call events, and request metadata need to stream back as the turn unfolds instead of arriving only after a final text response.…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Read More