I built a Vamana-based vector search engine in C++ called sembed-engine. Recently I made a pull request that sped up queries by 16x and builds by 9x. The algorithm stayed exactly the same. The recall stayed at 1.0. The number of visited nodes did not change. The speedup came from data layout.

## The old design

The original code stored vectors as separate objects pointed to by `shared_ptr`:

```cpp
struct Record {
    int64_t id;
    std::shared_ptr<Vector> vector;
};
```

This is clean C++. Every record has an id and a vector. The vector knows how to calculate distance. In the hot path, though, the CPU had to load the record, read the `shared_ptr`, follow the pointer, call virtual methods, and read each float through an abstraction layer. Millions of times per query.

## The new layout

I replaced the object graph with a flat array.…
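To make the idea of a flat array concrete, here is a minimal sketch of what such a layout could look like. It is not the actual sembed-engine code: the names `FlatStore`, `add`, `row`, and `squared_l2` are hypothetical, and squared Euclidean distance is an assumption, since the text above does not say which metric the engine uses. The point is only that every vector lives in one contiguous `float` buffer, so reaching vector `i` is pointer arithmetic rather than a `shared_ptr` dereference followed by a virtual call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical flat store (illustrative only, not the sembed-engine API).
// All vectors share one contiguous buffer; row i starts at data[i * dim].
struct FlatStore {
    std::size_t dim;                // number of floats per vector
    std::vector<std::int64_t> ids;  // ids[i] is the id of row i
    std::vector<float> data;        // ids.size() * dim floats, row-major

    explicit FlatStore(std::size_t d) : dim(d) {}

    // Copy one vector into the buffer and return its row index.
    std::size_t add(std::int64_t id, const std::vector<float>& v) {
        if (v.size() != dim) {
            throw std::invalid_argument("vector has the wrong dimension");
        }
        ids.push_back(id);
        data.insert(data.end(), v.begin(), v.end());
        return ids.size() - 1;
    }

    // Raw pointer to the first float of row i (bounds-checked in debug builds).
    const float* row(std::size_t i) const {
        assert(i < ids.size());
        return data.data() + i * dim;
    }
};

// Assumed metric: squared Euclidean distance over two raw float arrays.
// A plain loop with no virtual dispatch or pointer chasing.
inline float squared_l2(const float* a, const float* b, std::size_t dim) {
    float sum = 0.0f;
    for (std::size_t k = 0; k < dim; ++k) {
        const float d = a[k] - b[k];
        sum += d * d;
    }
    return sum;
}
```

With this sketch, comparing two stored vectors is `squared_l2(store.row(i), store.row(j), store.dim)`. Both rows come from the same buffer, so neighbouring vectors sit next to each other in memory instead of being scattered across separate heap allocations.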