std::unordered_set
std::unordered_set
, from basic initialization to handling custom types and collisionsIn this lesson, we’ll cover the C++ Standard Library’s implementation of a hash set, which is the std::unordered_set
.
In the first part of the lesson, we'll explore the basics of std::unordered_set
, including its creation and initialization. You'll learn how this container leverages hashing to store elements efficiently, ensuring quick access and manipulation.
As we progress, we'll delve into more advanced topics such as customizing hash functions, customizing the load factor, and more.
This lesson builds on the theory we established in the previous section, which may be helpful to review for those less familiar with hashing, and how it can be used to create data structures:
The std::unordered_set
class is available once the <unordered_set>
header is included. They can be constructed in the same way as other containers - we specify the type of data they will contain as a template parameter:
#include <unordered_set>
int main() {
std::unordered_set<int> MySet;
}
The container can be populated with initial values:
#include <unordered_set>
int main() {
std::unordered_set<int> MySet { 1, 2, 3 };
}
When providing initial values, the type of those values can inferred using class template argument deduction (CTAD), so we do not need to specify it:
#include <unordered_set>
int main() {
std::unordered_set MySet { 1, 2, 3 };
}
std::unordered_set
using iteratorsThe std::unordered_set
constructor has an overload that allows us to pass an iterator pair. This will initialize our container with the values contained within that range:
#include <vector>
#include <unordered_set>
int main() {
std::vector MyVec{1, 2, 2, 2, 3};
std::unordered_set MySet{
MyVec.begin(), MyVec.end()};
}
Remember, a std::unordered_set
cannot contain duplicates, so it may be initialized with fewer values than the original range.
In this case, MySet
will have only three objects - 1
, 2
, and 3
. The additional duplicate 2
s are omitted.
Unordered sets support iteration but, as the name suggests, the containers are not ordered. This means that when we iterate over the container, we cannot easily predict our objects'Â order.
If we want to iterate over them anyway, they implement begin()
and end()
methods which return iterators. As such, we can use them with range and iterator-based functions and algorithms.
For example, they’re compatible with range-based loops:
#include <unordered_set>
#include <iostream>
int main() {
std::unordered_set MySet { 1, 2, 3 };
for (int x : MySet) {
std::cout << x << ", ";
}
}
1, 2, 3,
Below, we call the iterator-based std::vector
constructor to create an array from the contents of an unordered set:
#include <unordered_set>
#include <vector>
#include <iostream>
int main() {
std::unordered_set MySet { 1, 2, 3 };
std::vector<int> MyVec{
MySet.begin(), MySet.end()};
for (int x : MyVec) {
std::cout << x << ", ";
}
}
1, 2, 3,
Here, we use the iterator-based std::for_each()
algorithm with an unordered set:
#include <unordered_set>
#include <algorithm>
#include <iostream>
void Log(int x) { std::cout << x << ", "; }
int main() {
std::unordered_set Set { 1, 2, 3 };
std::for_each(Set.begin(), Set.end(), Log);
}
1, 2, 3,
We can add new data to our container in a few different ways:
emplace()
The preferred way to add a single object to our set is to construct it in place, if possible. We do this using the emplace()
. method. Any arguments we provide to emplace()
will be forwarded to the constructor of the underlying type. In this case, that is int
:
#include <unordered_set>
#include <iostream>
int main() {
std::unordered_set<std::string> Set;
Set.emplace("Hello");
for (const std::string& S : Set) {
std::cout << S;
}
}
Hello
insert()
Emplacing an object directly into the container is more performant than creating it outside and then moving it in. But, if our object already exists, we can copy (or move) it into the set using insert()
:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set<std::string> Set;
std::string Greeting{"Hello"};
Set.insert(Greeting);
std::string Target{"Everyone"};
Set.insert(std::move(Target));
for (const std::string& S : Set) {
std::cout << S << ' ';
}
}
Hello Everyone
We can insert()
a list of objects in a single call:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set<std::string> Set;
Set.insert({"Hello", "Everyone"});
for (const std::string& S : Set) {
std::cout << S << ' ';
}
}
Hello Everyone
We can also pass an iterator pair to insert()
to add multiple elements from some other container:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Set{1, 2, 3};
std::vector Vec{4, 5, 6};
Set.insert(Vec.begin(), Vec.end());
for (int x : Set) {
std::cout << x << ", ";
}
}
1, 2, 3, 4, 5, 6,
insert_range()
(C++23)Note: the insert_range()
method was added in C++23, and may not yet be supported by all compilers.
C++20 introduced the concept of a range. We’ll cover ranges in more detail later in the course, but for now, we can think of them as a way of representing an iterator pair as a single object.
Most of the sequential containers we’ve seen so far, such as std::vector
and std::list
are valid ranges. As such, we can insert the contents of such a container into our set using std::insert_range()
:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Set{1, 2, 3};
std::vector Vec{4, 5, 6};
Set.insert_range(Vec);
for (int x : Set) {
std::cout << x << ", ";
}
}
1, 2, 3, 4, 5, 6,
The size()
method returns the number of items in a set:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Set{1, 2, 3};
std::cout << "Size: " << Set.size();
}
Size: 3
If we want to check if a set is empty - that is, the size is 0
, we can do that using the more descriptive empty()
 method:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Set{1, 2, 3};
std::cout << "The set "
<< (Set.empty() ? "is" : "is not")
<< " empty";
}
The set is not empty
We can remove items from a set using the erase()
 method:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Set{1, 2, 3};
Set.erase(1);
for (int x : Set) {
std::cout << x << ", ";
}
}
2, 3,
We don’t need to check if the container contains an object before trying to erase it - we safely call erase()
and it will simply have no effect if the object wasn’t in our set:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Set{1, 2, 3};
Set.erase(4);
for (int x : Set) {
std::cout << x << ", ";
}
}
1, 2, 3,
clear()
We can remove all the items from a set using clear()
:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Set{1, 2, 3};
std::cout << "Size: " << Set.size();
Set.clear();
std::cout << "\nSize: " << Set.size();
}
Size: 3
Size: 0
There are several ways we search for objects within our std::unordered_set
. The most common are:
contains()
methodThe contains()
method is the simplest and most common option. It will return true
if the object we pass as an argument is in the set, and false
 otherwise:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Set{1, 2, 3};
if (Set.contains(2)) {
std::cout << "2 is in the set";
}
}
2 is in the set
count()
The contains()
method was only added in C++20. When targeting older specifications, we would use the count()
method, which returns the number of matching objects that were found.
Given a std::unordered_set
cannot contain duplicates, that return value would be either 0
or 1
. Typically we’d just coerce that return value to a boolean, meaning count()
is used in the same way as contains()
:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Set{1, 2, 3};
if (Set.count(2)) {
std::cout << "2 is in the set";
}
}
2 is in the set
insert()
return valueOften, we will want to check if our set contains an object, and insert it if it doesn’t.
We can do this using the basic variation of the insert()
method, which accepts a single value we want to insert. This function returns a struct with the result of the insertion.
It specifically returns a std::pair
- a simple container that we cover in detail in the next lesson. The pair contains two member variables:
first
: An iterator to where the object we tried to insert is within the setsecond
: A bool
representing whether the insertion happenedIn other words:
second
is false
, that means the insertion didn’t happen, because the object was already in the setsecond
is true
, that means the object wasn’t previously in the set, but our insert()
call has now added it:#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Set{1, 2, 3};
auto [it, wasInserted]{Set.insert(4)};
if (wasInserted) {
std::cout << "4 wasn't in the set";
}
if (Set.contains(4)) {
std::cout << ", but now it is";
}
}
4 wasn't in the set, but now it is
erase()
return valueAnother common requirement we’ll have is the desire to check if our set contains an object, and erase that object if it does.
The erase()
method can let us implement this, as it returns an integer representing the number of items that it removed. Given an unordered set cannot contain duplicates, erasing an object by value will therefore return 1
if our set contained the value, or 0
 otherwise:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Set{1, 2, 3};
if (Set.erase(3)) {
std::cout << "3 was in the set";
}
if (!Set.contains(3)) {
std::cout << ", but not any more";
}
}
3 was in the set, but not any more
find()
to generate an iteratorWe can also search our set using the find()
method. This will return an iterator to the object if it was found, or an iterator equal to the end()
return value otherwise:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Set{1, 2, 3};
auto A{Set.find(3)};
if (A != Set.end()) {
std::cout << "3 was found: " << *A;
}
auto B{Set.find(4)};
if (B == Set.end()) {
std::cout << "\n4 was not found";
}
}
3 was found: 3
4 was not found
Modifying an object within a hash set is a little more complex than we might expect. This is because an object’s value determines where that object exists within the container, via the hash function.
Modifying an object’s value in place risks leaving the object in the wrong position, thereby compromising the integrity of our container. The container implements some protections using const
to prevent us from doing this accidentally:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Set{1, 2, 3};
auto It{Set.find(3)};
(*It)++;
}
'It': you cannot assign to a variable that is const
An obvious workaround would be to copy the value, erase the original, modify the copy, and then insert it.
However, as of C++17, we have a more efficient approach to this, using the extract()
method. The extract()
method transfers ownership of the node containing the object to the caller. The node is effectively removed from the set at this point:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Set{1, 2, 3};
auto Node{Set.extract(3)};
for (int x : Set) {
std::cout << x << ", ";
}
}
1, 2,
We can then access and update the value contained within that node, using the value()
method. Once we’re done with our changes, we transfer the node back to the container.
We do this using an overload of the insert()
method that accepts an rvalue reference to the node, and ensures it is rehashed and positioned correctly:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Set{1, 2, 3};
auto Node{Set.extract(3)};
Node.value() = 4;
Set.insert(std::move(Node));
for (int x : Set) {
std::cout << x << ", ";
}
}
1, 2, 4,
Unlike with arrays, we can’t just add any type of data to an unordered set. To meet the performance characteristics, our data type needs to have some capabilities:
==
operatorMany of the C++ built-in types already have this ability.
However, many types will not, and this includes custom types we create. As a result, the following will not work, because the type we are using as the key - Item
doesn’t meet either of the two requirements:
#include <unordered_set>
#include <string>
struct Item {
std::string Name;
};
int main() {
// In their current form, Item objects
// cannot be in a hash set
std::unordered_set<Item> SetA;
// References to this type will have the
// same issue
std::unordered_set<Item&> SetB;
// Pointers to any type can be hashed and
// compared, so this would work
std::unordered_set<Item*> SetC;
}
There are two ways we can enable this. The first option is to implement std::hash
and operator==()
for our type. We walked through the implementation of this in our previous lesson:
// User.h
#pragma once
#include <string>
struct User {
std::string Email;
bool operator==(const User& Other) const {
return Email == Other.Email;
}
};
namespace std {
template <>
struct hash<User> {
size_t operator()(const User& U) const {
return std::hash<std::string>{}(U.Email);
}
};
}
Alternatively, we can provide a hash function and an equality function that our std::unordered_set
can use We cover these in the next two sections.
As we saw above, std::unordered_set
uses std::hash
as its default hash function. However, when creating our set, we have the option of providing a custom hash function.
This allows us to create sets for storing a type that does not implement std::hash
.
We provide the hash function as the second template parameter. Specifically, we need to provide a class that overloads the ()
 operator:
#include <unordered_set>
#include <string>
#include <iostream>
struct User {/*...*/}
struct Hasher {
size_t operator()(const User& U) const {
return std::hash<std::string>{}(U.Name);
}
};
int main() {
std::unordered_set<User, Hasher> Set;
Set.emplace("Bob");
if (Set.contains(User{"Bob"})) {
std::cout << "Found Bob";
}
}
Found Bob
The Hasher
syntax above may seem slightly unusual. We’re defining a type that can be used to create function objects - that is, objects that can be "called" using the ()
operator. These are sometimes referred to as functors, and we cover them in more detail in a dedicated lesson later in the course.
By default, when the std::unordered_set
container needs to determine if two objects are equal, it does that using the ==
operator. We can provide a different function for this by providing a functor type as the third template parameter.
This allows us to store types that do not overload the ==
 operator:
#include <unordered_set>
#include <string>
#include <iostream>
struct User {
std::string Name;
};
struct Hasher {/*...*/}
struct Comparer {
bool operator()(
const User& A, const User& B) const {
return A.Name == B.Name;
}
};
int main() {
std::unordered_set<User, Hasher, Comparer> Set;
Set.emplace("Bob");
if (Set.contains(User{"Bob"})) {
std::cout << "Found Bob";
}
}
Found Bob
By definition, a hash function should return the same hash for the same object.
In other words, for our hash container to work correctly, if our comparison function claims two objects are equal, our hash function should also return the same hash for those two objects.
The following set is unlikely to work correctly, as our equality function is comparing objects by their absolute value, such that -1
and 1
are considered equal.
However, the hash function we’re using (the default std::hash
in this case) is likely to return different hashes for -1
and 1
:
#include <unordered_set>
#include <iostream>
struct Comparer {
bool operator()(int a, int b) const {
return std::abs(a) == std::abs(b);
}
};
int main() {
std::unordered_set<
int, std::hash<int>, Comparer
> Set{1, 2, 3};
if (Set.contains(1)) {
std::cout << "The set contains 1";
}
if (!Set.contains(-1)) {
std::cout << " but not std::abs(-1)?";
}
}
The set contains 1 but not std::abs(-1)?
We could fix this previous example by providing a hash function that is based on the absolute value, too:
#include <unordered_set>
#include <string>
#include <iostream>
struct Hasher {
size_t operator()(int x) const {
return std::hash<int>{}(std::abs(x));
}
};
struct Comparer {/*...*/}
int main() {
std::unordered_set<int, Hasher, Comparer>
Set{1, 2, 3};
if (Set.contains(-1)) {
std::cout << "Set contains std::abs(-1)";
}
}
Set contains std::abs(-1)
The C++ specification doesn’t state what collision resolution strategy should be used for std::unordered_set
.
However, it does prescribe an API that is much easier to implement using secondary chaining, so this is the approach that compilers tend to take.
The specification refers to these secondary chains as buckets.
We can determine how many buckets our set is using with the bucket_count()
 method:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Nums{1, 2, 3};
std::cout << "Bucket Count: "
<< Nums.bucket_count();
}
Bucket Count: 8
Behind the scenes, each bucket is just an index of the underlying array. At each index, we have the secondary chain - often a linked list - of objects that hash to that index.
If we insert more items into the unordered set, it will eventually decide to rehash. That is, resizing the underlying array to provide more buckets:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Nums{1, 2, 3, 4, 5};
std::cout << "Bucket Count: "
<< Nums.bucket_count();
Nums.emplace(6);
Nums.emplace(7);
Nums.emplace(8);
Nums.emplace(9);
std::cout << "\nBucket Count: "
<< Nums.bucket_count();
}
Bucket Count: 8
Bucket Count: 64
For a std::unordered_set
specifically, rehashing will happen once the container exceeds its maximum load factor.
The load factor of a hash-based container is equivalent to the number of elements it contains, divided by the number of buckets it uses.
This is made available through the load_factor()
 method.
In the following example, our unordered set is using 8
buckets to store a combined 4
objects, meaning its load factor is 0.5
:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Nums{1, 2, 3, 4};
std::cout << "Buckets: "
<< Nums.bucket_count();
std::cout << "\nSize: "
<< Nums.size();
std::cout << "\nLoad Factor: "
<< Nums.load_factor();
}
Buckets: 8
Size: 4
Load Factor: 0.5
The load factor of a hash container is the main heuristic we use to balance memory usage vs collisions:
The std::unordered_set
container lets us specify the maximum load factor using the max_load_factor()
 method.
Any insertion that would cause our set’s load factor to go over this value will cause a rehash:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Nums{1, 2, 3, 4};
std::cout << "Bucket Count: "
<< Nums.bucket_count()
<< "\nLoad Factor: "
<< Nums.load_factor();
Nums.max_load_factor(0.25);
Nums.insert(5);
std::cout << "\n\nBucket Count: "
<< Nums.bucket_count()
<< "\nLoad Factor: "
<< Nums.load_factor();
}
Bucket Count: 8
Load Factor: 0.5
Bucket Count: 64
Load Factor: 0.078125
A variation of max_load_factor()
that takes no arguments is available. This overload currently configured maximum load factor.
By default, this is set to 1.0
for a std::unordered_set
:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set Nums{1, 2, 3, 4};
std::cout << "Max Load Factor: "
<< Nums.max_load_factor();
}
Max Load Factor: 1
If we have an approximate idea of how many objects our set is likely going to need to store, it can be helpful to allocate enough space up-front.
This can reduce the amount of rehashing that needs to occur during the lifecycle of our container.
rehash()
FunctionThe rehash()
method lets us manually rehash the container. We pass the number of buckets we want as an argument, and our container will then be rehashed to use at least that many buckets:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set<int> Nums;
std::cout << "Bucket Count: "
<< Nums.bucket_count();
Nums.rehash(500);
std::cout << "\nBucket Count: "
<< Nums.bucket_count();
}
Bucket Count: 8
Bucket Count: 512
reserve()
MethodThe reserve()
method is very similar to rehash()
, except the argument we provide is the number of items we expect our container to have.
The key difference is that the number of buckets reserve()
will rehash to consider our max_load_factor()
.
Below, we reserve space for 500 items. Given our max load factor is 0.25, we therefore allocate space for at least 2,000Â buckets:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set<int> Nums;
Nums.max_load_factor(0.25);
Nums.reserve(500);
std::cout << "Bucket Count: "
<< Nums.bucket_count();
}
Bucket Count: 2048
In this lesson, we have explored std::unordered_set
- the C++ standard library’s implementation of a hash set. We covered everything from initialization and iteration to customization for complex types.
std::unordered_set
with various methods, including class template argument deduction (CTAD).std::unordered_set
, emphasizing its unordered nature.emplace()
, insert()
, and C++23's insert_range()
methods.size()
of a set, and whether the set is empty()
.erase()
and clearing the entire set with clear()
.contains()
, count()
, insert()
return value, erase()
return value, and find()
.extract()
method.std::unordered_set
, including the implementation of hash functions and equality operators.std::unordered_set
.std::unordered_set
.rehash()
and reserve()
for optimizing space and reducing collisions.Â
std::unordered_set
This lesson provides a thorough understanding of std::unordered_set
, from basic initialization to handling custom types and collisions
Comprehensive course covering advanced concepts, and how to use them on large-scale projects.