Why You Should Never Use std::unordered_set in Hot C++ Loops


DEV Community·kartikay dubey·30 days ago


#cpp #performance #algorithms #benchmarking #unordered_set #bitset


Hash tables feel like the default choice for membership tests. std::unordered_set promises average O(1) lookup, so we reach for it automatically. In performance-sensitive C++ code, that habit can cost you an order of magnitude.

I ran into this while building a Vamana graph index for approximate nearest neighbor search. The algorithm needs to track visited nodes. Node ids are dense integers, and the visited check runs inside the hottest loop in the entire search path. My first implementation used std::unordered_set<uint32_t>. It was correct, and it was slow.

What the benchmark says

I generated 1000 vectors of random uint32_t ids and deduplicated them using three approaches: std::unordered_set, sort + unique, and boost::dynamic_bitset<>. For dense ids sampled from [0, 2n), the numbers were brutal:

n          unordered_set (ms)   sort + unique (ms)   boost bitset (ms)
128                         5                    3                   1
32,768                  1,649                1,455                 177
500,000                50,302               26,759               3,423

At n = 500,000, the bitset was 14.7x faster than std::unordered_set (50,302 ms vs. 3,423 ms).…
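The three deduplication strategies named in the benchmark can be sketched roughly as follows. This is a minimal illustration, not the article's benchmark code: the function names (dedup_hash, dedup_sort, dedup_bitmap) and the universe parameter are hypothetical, and std::vector<bool> stands in for boost::dynamic_bitset<> so that the sketch builds with only the standard library.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical helper: dedup via hash set, keeping first-seen order.
std::vector<uint32_t> dedup_hash(const std::vector<uint32_t>& ids) {
    std::unordered_set<uint32_t> seen;
    seen.reserve(ids.size());  // avoid rehashing while inserting
    std::vector<uint32_t> out;
    for (uint32_t id : ids) {
        // insert() returns {iterator, bool}; the bool is true for new keys.
        if (seen.insert(id).second) out.push_back(id);
    }
    return out;
}

// Hypothetical helper: dedup via sort + unique. Output is sorted,
// so the original order is not preserved.
std::vector<uint32_t> dedup_sort(std::vector<uint32_t> ids) {
    std::sort(ids.begin(), ids.end());
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
    return ids;
}

// Hypothetical helper: dedup via a bitmap indexed directly by id.
// Assumption: every id is strictly less than `universe` (dense ids).
// std::vector<bool> is a stand-in for boost::dynamic_bitset<>.
std::vector<uint32_t> dedup_bitmap(const std::vector<uint32_t>& ids,
                                   uint32_t universe) {
    std::vector<bool> seen(universe, false);
    std::vector<uint32_t> out;
    for (uint32_t id : ids) {
        if (!seen[id]) {
            seen[id] = true;
            out.push_back(id);
        }
    }
    return out;
}
```

The bitmap version only makes sense when the id range is known and reasonably dense, as with the [0, 2n) range described above: it spends one bit per possible id regardless of how many ids are actually present, in exchange for a membership test that is a single indexed read with no hashing.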
