TL;DR: We can speed up timestamps on x86 Linux by 30% and maintain the same precision as the standard system clock by implementing our own timers without relying on vDSO. Almost nobody should do this.

Table of contents

Timing the timers
The TSC
When syscalls aren’t
Faster monotonic clocks
Making our own vDSO
Measuring tails
Stable timers
Conclusion
Appendix: Methodology

Timing the timers

One of my pet projects at my last job was to introduce distributed tracing to a low-latency pipeline (think 1–10 microseconds per stage) using OpenTelemetry. As part of this effort I spent a considerable amount of time designing, implementing, and optimising our own C++ tracing client library, as the official one has too much overhead. My goal was for the latency impact per component to stay under 5% so both developers and users would feel comfortable leaving traces always on in production; this translated to a budget of about 50–100 ns (a few hundred clock cycles) per span.…