This post originally appeared on tokenjam.dev/blog. It's part of a 14-post series on the agentic AI ecosystem.

TL;DR

- Agent observability captures what an agent did (tool calls, token costs, latency, reasoning chains) in enough detail to debug and audit behavior in production.
- Traditional logs and metrics aren't enough; you need traces that record the LLM's step-by-step decisions, tool invocations, and outcomes.
- Agents are harder to observe than services because of nondeterminism, deeply nested calls, prompts and completions as data, and vocabulary that didn't exist three years ago.
- OpenTelemetry GenAI semantic conventions are emerging as the standard for agent telemetry.

Agent observability is the practice of capturing what an AI agent did (its tool calls, token costs, behavioral patterns, and outcomes) at a level of detail sufficient to debug, optimize, and audit agent behavior in production.…
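
To make the definition concrete, here is a minimal sketch of what a trace for one agent run might record: nested steps, tool calls, token counts, and latency. It uses only the Python standard library. The `Span` and `AgentTrace` classes, the tool name `search_docs`, and the model name `example-model` are hypothetical, invented for illustration. The attribute keys are modeled on the OpenTelemetry GenAI semantic conventions as I understand them, so check the current spec before relying on them.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Span:
    """One recorded step in an agent run (hypothetical, not an OpenTelemetry class)."""
    name: str
    parent: Optional[str] = None
    attributes: Dict[str, Any] = field(default_factory=dict)
    duration_ms: float = 0.0


class AgentTrace:
    """Collects spans for a single agent run and tracks nesting with a stack."""

    def __init__(self) -> None:
        self.spans: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str, attributes: Optional[Dict[str, Any]] = None) -> Iterator[Span]:
        # The enclosing span, if any, becomes this span's parent.
        parent = self._stack[-1].name if self._stack else None
        s = Span(name=name, parent=parent, attributes=dict(attributes or {}))
        self.spans.append(s)
        self._stack.append(s)
        start = time.perf_counter()
        try:
            yield s
        finally:
            # Record latency even if the step raised an exception.
            s.duration_ms = (time.perf_counter() - start) * 1000.0
            self._stack.pop()

    def total_tokens(self) -> int:
        # Sum input and output tokens over every span that reported usage.
        return sum(
            s.attributes.get("gen_ai.usage.input_tokens", 0)
            + s.attributes.get("gen_ai.usage.output_tokens", 0)
            for s in self.spans
        )


# Simulated run: one model call and one tool call nested under the agent span.
trace = AgentTrace()
with trace.span("invoke_agent", {"gen_ai.operation.name": "invoke_agent"}):
    with trace.span("chat", {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": "example-model",  # hypothetical model name
        "gen_ai.usage.input_tokens": 120,
        "gen_ai.usage.output_tokens": 30,
    }):
        pass  # the LLM call would happen here
    with trace.span("execute_tool", {
        "gen_ai.operation.name": "execute_tool",
        "gen_ai.tool.name": "search_docs",  # hypothetical tool name
    }):
        pass  # the tool invocation would happen here

for s in trace.spans:
    print(s.name, s.parent, s.attributes)
print("total tokens:", trace.total_tokens())
```

In a real system these records would typically be emitted through a tracing SDK rather than kept in a list; the point of the sketch is only the shape of the data: each step knows its parent, its attributes, and how long it took.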