So far, we’ve been converting the data we want to serialize into a string-based form. For example, when we’ve serialized a number like 42, we’ve converted it into a std::string. Then, when we deserialize that data back into memory, we convert it back to its original int form.
This is a useful technique and is applicable to many scenarios, but these conversions incur a performance cost. In situations where performance matters, we’d prefer to avoid this cost and use binary serialization instead.
Binary serialization creates data that represents our objects in exactly the same way they are represented in memory. To fully understand this, we first need to familiarise ourselves with some important concepts that affect how data is stored in memory.
When we write our code, it’s often a good idea to ensure our approach is as easy as possible to adapt to other contexts with minimal changes, or ideally no changes at all. This is sometimes referred to as porting the program, and programs that are easy to port are called portable.
Portability is particularly important to consider when we’re serializing data. For example:
Throughout the rest of this chapter, we’ll cover the main things we need to consider when creating portable serialization systems. These topics are also useful in low-level systems programming more generally, so they will help us build a solid foundation.
One technique that’s immediately worth being aware of is that our serialized data can also include additional metadata. This metadata can describe the nature of the data, such as when it was created, which version of our software created it, and more.
It can help us, or other systems, understand what needs to be done to deserialize the data accurately. For example, a large share of the data on the internet is transferred using HTTP (HyperText Transfer Protocol).
When your browser downloads a .png image for display on a website, the serialized HTTP payload might look something like this:
HTTP/1.1 200 OK
Date: Tue, 21 Jan 2025 14:15:16 GMT
Last-Modified: Tue, 01 Dec 2009 20:18:22 GMT
Content-Length: 2171
Content-Type: image/png

base64,iVBORw0KGgoRO0KwgGogWiVw0BgRO (...continues)
Only the last line of this serialized data is the .png image. This primary data is called the body of the payload, but HTTP allows us to provide additional metadata above the body. This metadata helps the browser understand the data and provides some ancillary information that may be useful.
Each piece of metadata is called a header, and each header is separated by a line break. The headers are then collectively separated from the body by two line breaks.
When the serialization needs of our application get complicated, it can be helpful to adopt techniques like this, tailored to the requirements and challenges of our use case.
To understand how computers represent numeric data in memory, we need to understand the concept of a numeric base. The three most important numeric bases in computing are decimal, binary, and hexadecimal.
Decimal is the numeric system we’re most familiar with. It uses a numeric base of 10 because, in decimal, we represent a number using a sequence of digits, and each digit has one of ten possible values, from 0 to 9.
For numbers greater than 9, we add additional digits to the left. With a decimal number like 125, we can determine the impact of each digit based on its position. For example, the 1 is in the "hundreds" position, the 2 is in the "tens" position, and the 5 is in the "ones" position.
As such, we can reconstruct this combination of 1, 2, and 5 to give the 125 value it represents like this:
$$(1 \times 100) + (2 \times 10) + (5 \times 1) = 125$$
We can generalize our approach to calculating the value associated with each digit: we multiply the digit’s value by the numeric base raised to an exponent based on the digit’s position. Our exponent starts at $0$ for the least significant (rightmost) digit, and increases by $1$ as we take each step left. Note that $10^0 = 1$ and, in general, $x^0 = 1$ for any nonzero value of $x$.
Using this approach, our equation for $125$ using the numeric base $10$ would look like this:
$$(1 \times 10^2) + (2 \times 10^1) + (5 \times 10^0) = 100 + 20 + 5 = 125$$
In mathematical notation, where the base of a number is relevant to what we’re trying to communicate, we typically provide it after the number using a subscript. We can also add brackets if preferred for clarity. For example:
$$125_{10} \quad \text{or} \quad (125)_{10}$$
By default, the numeric literals we use in C++ are assumed to be base 10 and, when we print those values, they are also displayed in base 10:
#include <iostream>
int main(){
int Value{125};
std::cout << Value;
}
125
Binary systems use a numeric base of 2. The two digits that it uses are 0 and 1.
Computers use binary because it is a simple, reliable system that represents data and instructions using two states, which align perfectly with the on (1) and off (0) states of electronic circuits.
The base 10 value 125 corresponds to the binary value 1111101. This value has 7 binary digits (bits), whilst a byte of computer memory comprises 8 bits. If we want to represent this value in a byte, we can add a leading zero, as in 01111101.
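We can convert this binary sequence back to decimal as follows:
$$(0 \times 2^7) + (1 \times 2^6) + (1 \times 2^5) + (1 \times 2^4) + (1 \times 2^3) + (1 \times 2^2) + (0 \times 2^1) + (1 \times 2^0)$$
$$= 0 + 64 + 32 + 16 + 8 + 4 + 0 + 1 = 125$$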
In C++ (since C++14), we can provide a value in binary by prefixing it with 0b:
#include <iostream>
int main(){
int Value{0b1111101};
std::cout << "Value: " << Value;
}
Value: 125
To view the binary representation of a value, we can use std::format() (C++20) or std::print() (C++23) with the :b format specifier:
#include <iostream>
#include <format>
int main(){
int Value{125};
std::cout << std::format("{:b}", Value);
}
1111101
We cover std::format() and std::print() in more detail elsewhere in this course.
Finally, let’s take a look at hexadecimal, which is base 16. The 16 digits include the 10 decimal digits 0-9 in addition to the first 6 alphabetic characters a-f, or A-F if we prefer uppercase.
The a, b, c, d, e, and f characters correspond to the decimal values 10, 11, 12, 13, 14, and 15 respectively.
The decimal value 125 is 7d in hexadecimal:
$$(7 \times 16^1) + (13 \times 16^0) = 112 + 13 = 125$$
In C++, we can provide a value in hexadecimal by prefixing it with 0x:
#include <iostream>
int main(){
int Value{0x7d};
std::cout << "Value: " << Value;
}
Value: 125
To view the hexadecimal representation of a value, we can use std::format() (C++20) or std::print() (C++23) with the :x format specifier, or :X if we want the a-f digits output in uppercase:
#include <iostream>
#include <format>
int main(){
int Value{125};
std::cout << std::format("{:x}", Value);
}
7d
Hexadecimal is frequently used in programming because it provides a compact way to represent a byte (8 bits) of data. A byte can store one of 256 possible values:
- Decimal (0 to 255) requires up to three digits, but not all combinations of three digits are valid. For example, 525 is too large to be stored in a single byte.
- Binary (0 to 11111111) requires up to 8 digits and all combinations are valid, but a sequence of 8 binary digits is quite long and can be difficult to read.
- Hexadecimal (0 to ff) requires only 2 digits, and every combination is valid.
The first thing we need to be mindful of when serializing values from memory, or deserializing objects to memory, is how much memory each type uses. The number of binary digits (bits) that a type uses is often referred to as its width.
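Surprisingly, the C++ standard doesn’t specify the exact width of some important types, including the basic int and unsigned int. It only guarantees that these types are at least 16 bits wide. On most modern desktop platforms they are 4 bytes (32 bits) but, on some embedded platforms, they are only 2 bytes (16 bits). Other types vary too: long is 4 bytes on 64-bit Windows but 8 bytes on 64-bit Linux.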
If we want to serialize data in a way that is portable despite these inconsistencies, we need a strategy to deal with these differences.
For now, we’re focused on types that have a static width. That is, their size is known at compile time and does not depend on the value they’re storing.
Types like std::string and std::vector have dynamic width: the amount of memory they use depends on how many characters or other objects they’re storing. These types bring their own serialization challenges, which we’ll cover later.
We saw that a number like 125 can be represented by a single byte of 8 binary digits:
01111101
However, the width of a statically-sized type like an unsigned int does not change based on the value it is storing. If our value doesn’t require the full width offered by the type, the excess digits are simply set to 0.
So, just like the integer 125 can be represented in decimal as 00125, that same value stored in a type with a width of 32 bits (4 bytes) can be represented like this:
00000000 00000000 00000000 01111101
For simplicity, we’re assuming the integer type we’re using is for unsigned values, that is, values that cannot be negative. The same principles around width apply to any integer type, including signed integers like int, but signed types use their bits differently.
Computers typically represent signed integers using the two’s complement method and, since C++20, the C++ standard requires it. We don’t cover it in this course, but there are plenty of resources documenting this representation.
We can find the width of a type, in bytes, using the sizeof operator:
#include <iostream>
int main() {
std::cout << "Size of int: " << sizeof(int)
<< " bytes, " << sizeof(int) * 8 << " bits\n";
std::cout << "Size of float: " << sizeof(float)
<< " bytes, " << sizeof(float) * 8 << " bits\n";
std::cout << "Size of double: " << sizeof(double)
<< " bytes, " << sizeof(double) * 8 << " bits\n";
}
Size of int: 4 bytes, 32 bits
Size of float: 4 bytes, 32 bits
Size of double: 8 bytes, 64 bits
When we need to be explicit about how wide our integer should be, we can use an integral type that has an explicit size. The C++ standard library provides such types in the <cstdint> header. These exact-width types are technically optional in the standard, but all mainstream platforms provide them.
For example, to create an integer that is guaranteed to have a width of 32 bits, we can use the int32_t type:
#include <cstdint>
int32_t Value{125};
Similar types are available for a variety of widths, in both signed and unsigned variations:
#include <cstdint>
// Signed Integers
int8_t A{1};
int16_t B{2};
int32_t C{3};
int64_t D{4};
// Unsigned Integers
uint8_t E{5};
uint16_t F{6};
uint32_t G{7};
uint64_t H{8};
Given the unspecified width of the int type, it may be tempting to just stop using it and switch to int32_t in all scenarios. This is a reasonable decision but, more commonly, developers tend to stick to int unless they have some specific reason to be explicit about the size.
These reasons include portability considerations, the need to support larger values, or the need to reduce memory usage. In those situations, it is highly recommended to use fixed-width integer types over alternative built-in integer types like short or long.
The projects you work on, or the teams you work with, will likely have some standard guidance on which integer types to use. Google’s C++ style guide says the following:
Of the built-in C++ integer types, the only one used is int. If a program needs an integer type of a different size, use an exact-width integer type from <cstdint>, such as int16_t.
The sizes of integral types in C++ can vary based on compiler and architecture.
The standard library header <cstdint> defines types like int16_t, uint32_t, int64_t, etc. You should always use those in preference to short, unsigned long long and the like, when you need a guarantee on the size of an integer.
Of the built-in integer types, only int should be used. We use int very often, for integers we know are not going to be too big, e.g., loop counters. Use plain old int for such things. You should assume that an int is at least 32 bits, but don't assume that it has more than 32 bits. If you need a 64-bit integer type, use int64_t or uint64_t.
When we’re using SDL and including the <SDL.h> header, we can use aliases for these fixed-width integer types.
Signed integers are available using the Sint8, Sint16, Sint32, and Sint64 aliases. Unsigned integers are available as Uint8, Uint16, Uint32, and Uint64.
#include <SDL.h>
// Signed Integers
Sint8 A{1};
Sint16 B{2};
Sint32 C{3};
Sint64 D{4};
// Unsigned Integers
Uint8 E{5};
Uint16 F{6};
Uint32 G{7};
Uint64 H{8};
Similar to integers, the C++ specification also doesn’t define exact widths for floating-point types like float and double. However, in practice, platforms are much more consistent in their widths. The float type is almost always 32 bits, whilst double uses 64.
Platforms also tend to be consistent in how they use those bits by following the IEEE-754 standard. Those interested can find more information on the Wikipedia page.
Despite the relative consistency across implementations, we may still prefer to be explicit about the width of our floating-point types.
To help us with this, the C++23 specification introduced standard types with explicit 16, 32, 64, and 128-bit fixed widths. These are available by including the <stdfloat> header:
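#include <stdfloat>
// Each of these aliases is optional: it only exists if the
// compiler supports the corresponding type.
// The f16, f32, f64, and f128 literal suffixes give each value
// the matching type, avoiding conversions from double
std::float16_t A{1.0f16};
std::float32_t B{2.0f32};
std::float64_t C{3.0f64};
std::float128_t D{4.0f128};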
Note that this is a relatively new addition to the language and may not be available in your project. As of 2025, these types still aren’t widely implemented by compilers.
Because the basic float and double types are implemented somewhat consistently across platforms, fixed-width floating-point types are fairly uncommon, even on projects where they are available.
As usual, teams and projects will typically have agreed standards around which types to use. Google’s C++ style guide recommends the following:
Of the built-in C++ floating-point types, the only ones used are float and double. You may assume that these types represent IEEE-754 binary32 and binary64, respectively.
Do not use long double, as it gives non-portable results.
While integral types like uint8_t and int32_t are designed to store integer values, they are also frequently used to represent other kinds of data.
Remember that, at their core, computers store everything as sequences of bits. Integral types, especially unsigned ones, provide a way to work with arbitrary binary data.
A common example is representing colors. A color can be defined by the intensity of its red, green, and blue components. Each component's intensity is often represented by a number between 0 and 255, which fits perfectly within a single byte (8 bits).
By combining four bytes, we can represent the red, green, blue, and alpha (transparency) values of a color. The uint32_t type is 4 bytes wide, so we could use it to represent a color.
Let's create a color with maximum red (255), no green (0), no blue (0), and maximum alpha (255). We can use hexadecimal notation to make this more readable, as each byte is represented by exactly two hexadecimal digits:
#include <cstdint>
// Red: FF, Green: 00, Blue: 00, Alpha: FF
uint32_t redColor{0xFF0000FF};
Similarly, an arbitrary byte of data can be represented by a uint8_t, and we can represent larger blobs of binary data by using an array of such values:
#include <array>
#include <cstdint>
#include <vector>
int main() {
// One byte
uint8_t A{0b01010101};
// 16 bytes - c-style array
uint8_t B[16];
B[0] = 0b01010101;
// 16 bytes - std::array
std::array<uint8_t, 16> C;
C[0] = 0b01010101;
// Resizable array of bytes
std::vector<uint8_t> D;
D.emplace_back(0b01010101);
}
std::byte and std::bitset Types
It’s a little unusual to represent non-numeric data using a numeric type. Because of this, the standard library includes the std::byte (C++17) and std::bitset types, to make it more explicit that we’re representing arbitrary binary data:
#include <cstddef> // for std::byte
#include <bitset>
// 1 byte (8 bits)
std::byte A{0b01010101};
// 4 bytes (32 bits)
std::bitset<32> B{0xFF0000FF};
These types are intentionally more restrictive than their numeric counterparts like uint8_t. For example, it rarely makes sense to multiply non-numeric binary data, so std::byte doesn’t provide arithmetic operators like *=:
#include <cstdint>
#include <cstddef> // for std::byte
int main() {
uint8_t A{0b01010101};
A *= 2; // This is allowed
std::byte B{0b01010101};
B *= 2; // This isn't
}
error: binary '*=': 'std::byte' does not define this operator or a conversion to a type acceptable to the predefined operator
However, unsigned integer types are still the most common way of representing binary data, even when what we’re storing isn’t really intended to be used as an integer.
This lesson explored how C++ represents data in memory, covering different numeric bases like binary, decimal, and hexadecimal. We also learned about the importance of data width, fixed-width integer types, and how integral types can be used to represent non-numeric data like colors and arbitrary binary sequences.
Key Takeaways:
- Fixed-width integer types like int32_t and uint8_t guarantee a specific size.
- The sizeof operator tells you the size of a type in bytes. Multiply by 8 to get the size in bits.
- While std::byte is designed for raw data, unsigned integral types are still commonly used for historical and practical reasons.