Why Your Binary Protocol Should Care About CPU Cache Lines


DEV Community·speed engineer·about 1 month ago

#why #performance #networking #cache #line #bytes


If you've ever designed a custom binary protocol for a hot path (a game server, a market-data feed, an internal RPC), you've probably obsessed over byte layout, alignment, and zero-copy parsing. There's one detail most tutorials skip that can quietly cost you 2-5x throughput: cache line alignment.

The 64-byte secret

Modern CPUs don't read memory one byte at a time. They move it in fixed-size chunks called cache lines, typically 64 bytes on x86_64 and on most ARM cores. Every load that misses L1 pulls in a full cache line. Every store that has to be visible to other cores invalidates the copies of that line held by those cores. If your protocol's "hot fields" (the bits the receiver reads first and most often) straddle the boundary between two cache lines, reading them touches two lines instead of one, and you have just doubled the memory traffic for that access.

A worked example

Picture a naive market-data tick struct: a uint8_t type tag, a uint64_t timestamp, a uint32_t symbol id, an 8-byte price, an 8-byte sequence number, and an 8-bit flags field.…
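To make the layout concrete, here is a minimal C sketch of the struct as described, plus a small helper that counts how many cache lines a byte range touches. The field names, the choice of `int64_t` for the price, the `CACHE_LINE` constant, and the `lines_touched` helper are all assumptions made for illustration; they are not taken from the original post. The offsets in the comments are what common LP64 ABIs (x86_64 and AArch64 Linux) produce and may differ elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size; some ARM cores use 128 bytes instead. */
#define CACHE_LINE 64u

/* Hypothetical field names; the types follow the description in the text. */
struct naive_tick {
    uint8_t  type;         /* offset 0, followed by 7 bytes of padding   */
    uint64_t timestamp_ns; /* offset 8                                   */
    uint32_t symbol_id;    /* offset 16, followed by 4 bytes of padding  */
    int64_t  price;        /* offset 24 (fixed-point price is assumed)   */
    uint64_t seq;          /* offset 32                                  */
    uint8_t  flags;        /* offset 40, followed by 7 bytes of padding  */
};                         /* sizeof is 48 on x86_64 / AArch64 Linux     */

/* The six fields carry 30 bytes of payload; padding can only add to that. */
static_assert(sizeof(struct naive_tick) >= 30, "struct smaller than payload");

/*
 * Number of cache lines touched by `len` bytes starting at byte `start`,
 * where `start` is measured from a cache-line-aligned base address.
 */
static inline size_t lines_touched(size_t start, size_t len)
{
    if (len == 0) {
        return 0;
    }
    return (start + len - 1) / CACHE_LINE - start / CACHE_LINE + 1;
}
```

With a 48-byte record stored back to back in a line-aligned buffer, `lines_touched(0, 48)` is 1 but `lines_touched(48, 48)` is 2: the second record spans the first and second cache lines. If the same fields were instead packed on the wire with no padding (30 bytes per record), the third record would start at byte 60 and its timestamp would occupy bytes 61 to 68, so `lines_touched(61, 8)` is also 2.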
