Handling Hash Collisions in std::unordered_map
How does std::unordered_map handle hash collisions?
std::unordered_map handles hash collisions using a technique called chaining. When multiple keys hash to the same bucket, the container stores them in a linked list or a similar data structure within that bucket. This allows multiple elements to coexist in the same bucket.
When a collision occurs, the container uses the equality comparison function (operator== or a custom comparator) to find the desired element within the bucket.
Here's an example that demonstrates hash collisions:
#include <iostream>
#include <string>
#include <unordered_map>
struct Person {
std::string name;
int age;
bool operator==(const Person& other) const {
return name == other.name && age == other.age;
}
};
namespace std {
template <>
struct hash<Person> {
size_t operator()(const Person& person) const {
return std::hash<std::string>{}(person.name);
}
};
}
int main() {
std::unordered_map<Person, int> peopleMap;
peopleMap[Person{"Alice", 25}] = 1;
peopleMap[Person{"Bob", 30}] = 2;
// Collision with "Alice" key
peopleMap[Person{"Alice", 30}] = 3;
for (const auto& [person, id] : peopleMap) {
std::cout
<< "Name: " << person.name
<< ", Age: " << person.age
<< ", ID: " << id << '\n';
}
std::cout << "Bucket count: "
<< peopleMap.bucket_count() << '\n';
std::cout << "Load factor: "
<< peopleMap.load_factor() << '\n';
}Name: Alice, Age: 30, ID: 3
Name: Alice, Age: 25, ID: 1
Name: Bob, Age: 30, ID: 2
Bucket count: 8
Load factor: 0.375In this example, the Person type uses only the name member for hashing. As a result, Person{"Alice", 25} and Person{"Alice", 30} collide because they have the same name.
The container resolves the collision by storing both elements in the same bucket. When accessing elements, it uses the equality comparison function to distinguish between the colliding elements.
The bucket_count() function returns the number of buckets in the container, and load_factor() returns the average number of elements per bucket. A higher load factor indicates more collisions.
To minimize collisions and improve performance, you can:
- Use a good hash function that distributes keys evenly across buckets.
- Increase the number of buckets by calling
rehash()with a higher bucket count. - Adjust the maximum load factor using
max_load_factor()to trigger automatic rehashing when the load factor exceeds a certain threshold.
// Increase bucket count to 10
peopleMap.rehash(10);
// Set maximum load factor to 0.75
peopleMap.max_load_factor(0.75);By understanding how std::unordered_map handles collisions using chaining and taking steps to minimize collisions, you can optimize the performance of your unordered map and ensure efficient key-value lookups.
Hash Maps using std::unordered_map
Creating hash maps using the standard library's std::unordered_map container