` Your Spring Boot service returns 200 OK. Latency looks fine in Datadog. Users are complaining the answers are wrong and slow. You open the logs. Nothing useful. You check your APM traces. HTTP span: 1.2 seconds. Business logic: 40ms. That leaves 1.16 seconds completely unaccounted for — inside the LLM call, where your standard tooling sees nothing. This is the observability gap in every LLM-enabled Java service. Standard APM tools were not built to capture what actually matters: which prompt triggered which model, how many tokens it consumed, what it cost, whether the tool call chain stalled on the third retry, or which span in a multi-step RAG pipeline blew the latency budget. This post walks through a runnable setup that closes that gap: Spring AI + OpenTelemetry + self-hosted Langfuse , fully containerized, no data leaving your infrastructure. The full solution with source code, Docker Compose, and 11 execution screenshots is at exesolution.com .…