Numeric and Binary Data

Learn how C++ represents numbers and data in memory using binary, decimal, and hexadecimal systems.

This lesson is part of the course:

Game Dev with SDL2


Ryan McCombe

Posted 8 hours ago

So far, we’ve been converting the data we want to serialize into a string-based form. For example, when we’ve serialized a number like 42, we’ve converted it into a std::string. Then, when we deserialize that data back into memory, we convert it back to its original int form.

This is a useful technique and is applicable to many scenarios, but these conversions incur a performance cost. In situations where performance matters, we’d prefer to avoid this cost and use binary serialization instead.

Binary serialization creates data that represents our objects in exactly the same way they are represented in memory. To fully understand this, we first need to familiarise ourselves with some important concepts that affect how data is stored in memory.

Compatibility and Portability

When we write our code, it’s often a good idea to make it as easy as possible to adapt to other contexts with minimal changes, or ideally no changes at all. Adapting a program to a new context is referred to as porting it, and programs that are easy to port are called portable.

Portability is particularly important to consider when we’re serializing data. For example:

Data serialized from our program running on one platform, like a Windows machine with an AMD processor, may need to be read and understood by an instance of our program installed on some different platform, like a macOS machine using Apple Silicon.
The data we serialize may need to remain compatible with future versions of our program. This means we need to deal with problems like adding a variable to a class and handling data that was created on an earlier version of our program that didn’t have that variable.
Our program may serialize data that is intended to be read by a different program entirely, so we want to make it as easy as possible for those other programs to understand our data.

Throughout the rest of this chapter, we’ll cover the main things we need to consider when creating portable serialization systems. These topics are also useful for low-level systems programming more generally, so they will help us build a solid foundation.

Metadata

One technique that’s worth being aware of straight away is that our serialized data can also include additional metadata. This metadata can include useful information, such as a description of the nature of the data, when it was created, what version of our software created it, and more.

This metadata can help us, or other systems, understand what needs to be done to accurately deserialize the data. For example, most of the data on the web is transferred using HTTP (HyperText Transfer Protocol).

When your browser downloads a .png image for display on a website, the serialized HTTP payload might look something like this:

HTTP/1.1 200 OK
Date: Tue, 21 Jan 2025 14:15:16 GMT
Last-Modified: Tue, 01 Dec 2009 20:18:22 GMT
Content-Length: 2171
Content-Type: image/png

base64,iVBORw0KGgoRO0KwgGogWiVw0BgRO (...continues)

Only the last line of this serialized data is the .png image, shown here as base64 text for readability; in practice, the image’s raw bytes would be sent. This primary data is called the body of the payload, but HTTP allows us to provide additional metadata above the body. This metadata helps the browser understand the data and provides some ancillary information that may be useful.

Each piece of metadata is called a header, and each header is separated by a line break. The headers are then collectively separated from the body by two line breaks.

When the serialization needs of our application get complicated, it can be helpful to adopt techniques like this, tailored to the requirements and challenges of our use case.

Numeric Bases

To understand how computers represent numeric data in memory, we need to understand the concept of a numeric base. The three most important numeric bases in computing are decimal, binary, and hexadecimal.

Decimal

Decimal is the numeric system we’re most familiar with - it uses a numeric base of 10. This is because, in decimal, we represent a number using a sequence of digits, and each digit has one of ten possible values, from 0 to 9.

For numbers greater than 9, we add additional digits to the left. With a decimal number like 125, we can determine the impact of each digit based on its position. For example, the 1 is in the "hundreds" position, the 2 is in the "tens" position, and the 5 is in the "ones" position.

As such, we can reconstruct this combination of 1, 2, and 5 to give the 125 value it represents like this:

125 = (1 \times 100) + (2 \times 10) + (5 \times 1)

We can generalize our approach to calculating the value associated with each digit: we multiply the digit’s value by the numeric base raised to the power of an exponent based on the digit’s position. The exponent starts at $0$ for the least significant (rightmost) digit and increases by $1$ with each step to the left. Note that $10^0 = 1$ and, in general, $x^0 = 1$ for any nonzero value of $x$.

Using this approach, our equation for $125$ using the numeric base $10$ would look like this:

125 = (1 \times 10^2) + (2 \times 10^1) + (5 \times 10^0)

In mathematical notation, where the base of a number is relevant to what we’re trying to communicate, we typically provide it after the number using subscript. We can also add brackets if preferred for clarity. For example:

(125)_{10} = (1 \times 10^2) + (2 \times 10^1) + (5 \times 10^0)

By default, the numeric literals we use in C++ are assumed to be base 10 and, when we print those values, they are also displayed in base 10:

#include <iostream>

int main(){
  int Value{125};
  std::cout << Value;
}

Binary

The binary system uses a numeric base of 2. The two digits it uses are 0 and 1.

Computers use binary because it is a simple, reliable system that represents data and instructions using two states, which align perfectly with the on (1) and off (0) states of electronic circuits.

The base 10 value 125 corresponds to the binary value 1111101. This value has 7 binary digits (bits), whilst a byte of computer memory comprises 8 bits. If we want to represent this value in a byte, we can add a leading zero, as in 01111101.

We can convert this binary sequence back to decimal as follows:

\begin{aligned} (01111101)_{2} = &\medspace(0 \times 2^7) + (1 \times 2^6)\medspace+\\ &\medspace(1 \times 2^5) + (1 \times 2^4)\medspace+ \\ &\medspace(1 \times 2^3) + (1 \times 2^2)\medspace+ \\ &\medspace(0 \times 2^1) + (1 \times 2^0) \\ = &\medspace(125)_{10} \end{aligned}

In C++, we can provide a value in binary by prefixing it with 0b:

#include <iostream>

int main(){
  int Value{0b1111101};
  std::cout << "Value: " << Value;
}

Value: 125

To view the binary representation of a value, we can use std::format() (C++20) or std::print() (C++23) with the {:b} format specifier:

#include <iostream>
#include <format>

int main(){
  int Value{125};
  std::cout << std::format("{:b}", Value);
}

1111101

We cover std::format() and std::print() in more detail in our lesson on String Interpolation.

Hexadecimal

Finally, let’s take a look at hexadecimal, which is base 16. The 16 digits include the 10 decimal digits 0-9 in addition to the first 6 alphabetic characters a-f, or A-F if we prefer uppercase.

The a, b, c, d, e, and f characters correspond to the decimal values 10, 11, 12, 13, 14, and 15 respectively.

The decimal value 125 is 7d in hexadecimal:

(7d)_{16} = (7 \times 16^1) + (13 \times 16^0) = (125)_{10}

In C++, we can provide a value in hexadecimal by prefixing it with 0x:

#include <iostream>

int main(){
  int Value{0x7d};
  std::cout << "Value: " << Value;
}

Value: 125

To view the hexadecimal representation of a value, we can use std::format() (C++20) or std::print() (C++23) with the {:x} format specifier, or {:X} if we want the a-f digits output in uppercase:

#include <iostream>
#include <format>

int main(){
  int Value{125};
  std::cout << std::format("{:x}", Value);
}

7d

Why is Hexadecimal Useful?

Hexadecimal is frequently used in programming because it provides a compact way to represent a byte (8 bits) of data. A byte can store one of 256 possible values.

A decimal representation (0 to 255) requires up to three digits, but not all combinations of three digits are valid. For example, 525 is too large to be stored in a single byte.
A binary representation (0 to 11111111) requires up to 8 digits and all combinations are valid, but a sequence of 8 binary digits is quite long and can be difficult to read.
A hexadecimal representation (0 to ff) requires only 2 digits, and every combination is valid. This is because each hexadecimal digit represents exactly 4 bits, so the two digits map directly onto the upper and lower halves of the byte.

Width

The first thing we need to be mindful of when serializing values from memory, or deserializing objects to memory, is how much memory that type uses. The number of binary digits (bits) that a type uses is often referred to as its width.

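Surprisingly, the C++ standard doesn’t fully specify the width of some important types, including the basic int and unsigned int. It only guarantees a minimum width (at least 16 bits, in this case), so the exact width can vary between platforms and compilers. These types are 32 bits on most modern desktop platforms, but can be 16 bits on some embedded systems. Similarly, long is 32 bits on 64-bit Windows, but 64 bits on 64-bit Linux and macOS.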

If we want to serialize data in a way that is portable despite these inconsistencies, we need a strategy to deal with these differences.

Static and Dynamic Width

Types like std::string and std::vector have dynamic width, meaning the amount of memory they use depends on how many characters or other objects they’re storing. These types bring their own serialization challenges, which we’ll cover later.

For now, we’ll focus on types that have a static width. That is, their size is known at compile time and does not depend on the value they’re storing.

We saw that a number like 125 can be represented by a single byte of 8 binary digits:

\begin{aligned} (01111101)_{2} = &\medspace(0 \times 2^7) + (1 \times 2^6)\medspace+\\ &\medspace(1 \times 2^5) + (1 \times 2^4)\medspace+ \\ &\medspace(1 \times 2^3) + (1 \times 2^2)\medspace+ \\ &\medspace(0 \times 2^1) + (1 \times 2^0) \\ = &\medspace(125)_{10} \end{aligned}

However, the width of a statically sized type like unsigned int does not change based on the value it is storing. If our value doesn’t require the full width offered by the type, the unused high-order bits are set to 0.

So, just like the integer 125 can be represented in decimal as 00125, that same value stored in a type with a width of 32 bits (4 bytes) can be represented like this:

00000000 00000000 00000000 01111101

Signed Integers and Two’s Complement

For simplicity, we’re assuming the integer type we’re using is for unsigned values - that is, values that cannot be negative. The same principles around width apply to any integer type, including signed integers like int, but how signed types use their bits is different.

Computers typically represent signed integers using the two’s complement method, and since C++20, the standard requires this representation. We don’t cover two’s complement in this course, but there are plenty of resources documenting it.

Determining Width

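We can find the size of a type in bytes using the sizeof operator. Multiplying that by 8, the number of bits in a byte, gives its width in bits. The exact results vary by platform; the output below is typical of a modern desktop system: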

#include <iostream>

int main() {
  std::cout << "Size of int: " << sizeof(int)
    << " bytes, " << sizeof(int) * 8 << " bits\n";
  std::cout << "Size of float: " << sizeof(float)
    << " bytes, " << sizeof(float) * 8 << " bits\n";
  std::cout << "Size of double: " << sizeof(double)
    << " bytes, " << sizeof(double) * 8 << " bits\n";
}

Size of int: 4 bytes, 32 bits
Size of float: 4 bytes, 32 bits
Size of double: 8 bytes, 64 bits

Fixed-Width Integers

When we need to be explicit about how wide we need our integer to be, we can use an integral type that has an explicit size. The C++ standard library includes implementations of such types, available within the <cstdint> header.

For example, to create an integer that is guaranteed to have a width of 32 bits, we can use the int32_t type:

#include <cstdint>

int32_t Value{125};

Similar types are available for a variety of widths, in both signed and unsigned variations:

#include <cstdint>

// Signed Integers
int8_t  A{1};
int16_t B{2};
int32_t C{3};
int64_t D{4};

// Unsigned Integers
uint8_t  E{5};
uint16_t F{6};
uint32_t G{7};
uint64_t H{8};

Should I Always Use Fixed-Width Integers?

Given the unspecified width of the int type, it may be tempting to just stop using it and switch to int32_t in all scenarios. This is a reasonable decision, but more commonly, developers tend to stick to int unless they have some specific reason to be explicit about the size.

These reasons include portability considerations, the need to support larger values, or the need to reduce memory usage. In those situations, it is highly recommended to use fixed-width integer types over alternative built-in integer types like short or long.

The projects and teams you work on will likely have some standard guidance on which integer types to use. Google’s C++ style guide says the following:

Of the built-in C++ integer types, the only one used is int. If a program needs an integer type of a different size, use an exact-width integer type from <cstdint>, such as int16_t.

The sizes of integral types in C++ can vary based on compiler and architecture.

The standard library header <cstdint> defines types like int16_t, uint32_t, int64_t, etc. You should always use those in preference to short, unsigned long long and the like, when you need a guarantee on the size of an integer.

Of the built-in integer types, only int should be used. We use int very often, for integers we know are not going to be too big, e.g., loop counters. Use plain old int for such things. You should assume that an int is at least 32 bits, but don't assume that it has more than 32 bits. If you need a 64-bit integer type, use int64_t or uint64_t.

SDL Aliases

When we’re using SDL and including the <SDL.h> header, we can use SDL’s aliases for these fixed-width integer types.

Signed integers are available using the Sint8, Sint16, Sint32, and Sint64 aliases. Unsigned integers are available as Uint8, Uint16, Uint32, and Uint64.

#include <SDL.h>

// Signed Integers
Sint8  A{1};
Sint16 B{2};
Sint32 C{3};
Sint64 D{4};

// Unsigned Integers
Uint8  E{5};
Uint16 F{6};
Uint32 G{7};
Uint64 H{8};

Floating Point Representations

Similar to integers, the C++ specification also doesn’t define exact widths for floating-point types like float and double. However, in practice, platforms are much more consistent in their widths. The float type is almost always 32 bits, whilst double almost always uses 64.

Platforms also tend to be consistent in how they use those bits by following the IEEE-754 standard. Those interested can find more information on the Wikipedia page.

Fixed-Width Floats (C++23)

Despite the relative consistency across implementations, we may still prefer to be explicit with the width of our floating point types.

To help us with this, the C++23 specification introduced standard types with fixed widths of 16, 32, 64, and 128 bits. These are available by including the <stdfloat> header:

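#include <stdfloat>

// C++23 literal suffixes like f16 and f128 create values of the
// matching type directly, avoiding a conversion from double
std::float16_t  A{1.0f16};
std::float32_t  B{2.0f32};
std::float64_t  C{3.0f64};
std::float128_t D{4.0f128};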

Note that this is a relatively new addition to the language, and each of these types is optional: an implementation only provides the ones it supports. As of 2025, compiler support is still limited, so these types may not be available in your project.

Should I use Fixed-Width Floats?

Because the basic float and double types are implemented somewhat consistently across platforms, it’s fairly uncommon that fixed-width floats are used even on projects where they are available.

As usual, teams and projects will typically have agreed standards around which types to use. Google’s C++ style guide recommends the following:

Of the built-in C++ floating-point types, the only ones used are float and double. You may assume that these types represent IEEE-754 binary32 and binary64, respectively.

Do not use long double, as it gives non-portable results.

Storing Non-Numeric Data

While integral types like uint8_t and int32_t are designed to store integer values, they are also frequently used to represent other kinds of data.

Remember that, at their core, computers store everything as sequences of bits. Integral types, especially unsigned ones, provide a way to work with arbitrary binary data.

Example: Colors as `uint32_t`

A common example is representing colors. A color can be defined by the intensity of its red, green, and blue components. Each component's intensity is often represented by a number between 0 and 255, which fits perfectly within a single byte (8 bits).

By combining four bytes, we can represent the red, green, blue, and alpha (transparency) values of a color. The uint32_t type is 4 bytes wide, so we could use it to represent a color.

Let's create a color with maximum red (255), no green (0), no blue (0), and maximum alpha (255). We can use hexadecimal notation to make this more readable, as each byte is represented by exactly two hexadecimal digits:

#include <cstdint>

// Red: FF, Green: 00, Blue: 00, Alpha: FF
uint32_t redColor{0xFF0000FF};

Binary Data of any Width

Similarly, an arbitrary byte of data can be represented by a uint8_t, and we can represent larger blobs of binary data by using an array of such values:

#include <array>
#include <cstdint>
#include <vector>

int main() {
  // One byte
  uint8_t A{0b01010101};

  // 16 bytes - c-style array
  uint8_t B[16];
  B[0] = 0b01010101;

  // 16 bytes - std::array
  std::array<uint8_t, 16> C;
  C[0] = 0b01010101;

  // Resizable array of bytes
  std::vector<uint8_t> D;
  D.emplace_back(0b01010101);
}

The `std::byte` and `std::bitset` Types

It’s a little unusual to represent non-numeric data using a numeric type. Because of this, the standard library includes the std::byte and std::bitset types, to make it more explicit that we’re representing arbitrary binary data:

#include <cstddef> // for std::byte
#include <bitset>

// 1 byte (8 bits)
std::byte A{0b01010101};

// 4 bytes (32 bits)
std::bitset<32> B{0xFF0000FF};

These types are intentionally more restrictive than their numeric counterparts like uint8_t. For example, it rarely makes sense to multiply non-numeric binary data, so std::byte contains no such operator:

#include <cstdint>
#include <cstddef> // for std::byte

int main() {
  uint8_t A{0b01010101};
  A *= 2; // This is allowed

  std::byte B{0b01010101};
  B *= 2; // This isn't
}

error: binary '*=': 'std::byte' does not define this operator or a conversion to a type acceptable to the predefined operator

However, unsigned integer types are still the most common way of representing binary data, even when what we’re storing isn’t really intended to be used as an integer.

Summary

This lesson explored how C++ represents data in memory, covering different numeric bases like binary, decimal, and hexadecimal. We also learned about the importance of data width, fixed-width integer types, and how integral types can be used to represent non-numeric data like colors and arbitrary binary sequences.

Key Takeaways:

Computers use binary (base-2) to store data, while humans often use decimal (base-10). Hexadecimal (base-16) is a convenient way to represent binary data.
The "width" of a data type determines how many bits it uses. Fixed-width types like int32_t and uint8_t guarantee a specific size.
Integral types can represent more than just numbers; they can hold any sequence of bits, making them useful for colors, raw data, and more.
The sizeof operator tells you the size of a type in bytes. Multiply by 8 to get the size in bits.
While std::byte is designed for raw data, unsigned integral types are still commonly used for historical and practical reasons.
