* > Part 2 of the Agent Reliability Series — Part 1 covered state persistence and conditional branching * I got an email from Google last week. Not unusual. But this one landed differently. Cloud Observability was telling me that starting June 1, 2026, a new endpoint — telemetry.googleapis.com — would automatically activate on any Google Cloud project that already had Cloud Logging, Cloud Trace, or Cloud Monitoring running. No action required. No migration. Just suddenly, a native OpenTelemetry ingestion pipeline sitting live in my project, waiting. I've been writing about agent reliability. State persistence. Crash recovery. Conditional branching that doesn't silently loop forever. And the whole time I was doing that, I had been running LangGraph pipelines with zero observability. No traces. No spans. No way to answer the single most important question when something goes wrong at 3am: which node failed, what state did it have when it failed, and how many times had it already tried?…