Previously, we’ve focused on serializing single bytes of data at a time - usually the 8-bit char type. However, when we start to serialize multi-byte objects, such as int and float values, things can get slightly more complex.
We need to pay careful attention to the order of those bytes, especially when working across different systems.
In this lesson, we'll explore how computers store multi-byte values, understand the challenges of different byte orderings, and learn practical techniques for handling these differences using SDL's binary manipulation functions.
Let’s first consider how we represent numbers in plain English. Numbers larger than 9 are represented by multiple digits - e.g. 45 requires two digits (4 and 5), whilst 162 requires three (1, 6, and 2).
There are two things to note here that we may take for granted, but that we should be conscious of as we build out these concepts. First, the order of the digits matters: 45 and 54 are not the same. Second, the digits are not equally significant.
For example, in a number like 45, the first digit (4) is more significant than the second digit (5). It is more significant because, if we increment the first digit, we increase the value of our number by 10 (from 45 to 55), but incrementing the second digit only increases the value of the overall number by 1 (from 45 to 46).
This pattern continues for numbers of any length. In a number like 162, the 1 is more significant than the 6, and the 6 is more significant than the 2.
These exact same considerations apply when dealing with numeric types comprising multiple bytes, such as 16, 32 and 64 bit integers, which have 2, 4 and 8 bytes respectively.
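To make byte significance concrete, here is a minimal sketch in standard C++ (no SDL required) that splits a 16-bit value into its two bytes using shifts and masks. The helper names are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; these are not SDL functions.
// The most significant byte plays the same role as the leftmost digit
// of a decimal number: changing it has the largest effect on the value.
constexpr std::uint8_t MostSignificantByte(std::uint16_t Value) {
  return static_cast<std::uint8_t>(Value >> 8);
}

constexpr std::uint8_t LeastSignificantByte(std::uint16_t Value) {
  return static_cast<std::uint8_t>(Value & 0xFF);
}

// Rebuilds the 16-bit value from its two bytes
constexpr std::uint16_t Combine(std::uint8_t High, std::uint8_t Low) {
  return static_cast<std::uint16_t>((High << 8) | Low);
}

void SignificanceExample() {
  // 0x1234 splits into 0x12 (most significant) and 0x34 (least significant)
  assert(MostSignificantByte(0x1234) == 0x12);
  assert(LeastSignificantByte(0x1234) == 0x34);

  // Incrementing the high byte adds 256; incrementing the low byte adds 1
  assert(Combine(0x13, 0x34) - Combine(0x12, 0x34) == 256);
  assert(Combine(0x12, 0x35) - Combine(0x12, 0x34) == 1);
}
```

Just as incrementing the first digit of 45 adds 10, incrementing the high byte of a 2-byte value adds 256, because each byte has 256 possible states.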
But unfortunately, unlike our familiar English numeric system, there’s no agreed standard here. Some systems store the most significant bytes first, while others use different orderings.
When working in a low level context, we often also need to deal with multiple different byte-order conventions within the same system, and convert data from one to the other as needed. For example, data transferred over a network commonly uses the most-significant-byte-first order, but many CPU architectures expect the exact opposite ordering.
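The way in which a system orders its bytes is referred to as its endianness. Most fall into one of two categories:
Big-endian: the most significant byte is stored first
Little-endian: the least significant byte is stored first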
There are other, less popular possibilities with names like bi-endian, middle-endian and mixed-endian. However, big and little endian are the most common, and they’re what we’ll focus on for now.
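The two main conventions can be sketched in standard C++ by laying out the same 32-bit value in each order. The function names below are hypothetical, and the code works on values rather than raw memory, so it behaves the same on any host.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes4 = std::array<std::uint8_t, 4>;

// Hypothetical helpers for illustration; these are not SDL functions.
// Big-endian: the most significant byte is placed first
constexpr Bytes4 ToBigEndian(std::uint32_t Value) {
  return {
    static_cast<std::uint8_t>(Value >> 24),
    static_cast<std::uint8_t>(Value >> 16),
    static_cast<std::uint8_t>(Value >> 8),
    static_cast<std::uint8_t>(Value)
  };
}

// Little-endian: the least significant byte is placed first
constexpr Bytes4 ToLittleEndian(std::uint32_t Value) {
  return {
    static_cast<std::uint8_t>(Value),
    static_cast<std::uint8_t>(Value >> 8),
    static_cast<std::uint8_t>(Value >> 16),
    static_cast<std::uint8_t>(Value >> 24)
  };
}

void ByteOrderExample() {
  // The same value, 0x12345678, laid out in each convention
  assert((ToBigEndian(0x12345678) == Bytes4{0x12, 0x34, 0x56, 0x78}));
  assert((ToLittleEndian(0x12345678) == Bytes4{0x78, 0x56, 0x34, 0x12}));
}
```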
When we read the binary representation of a multi-byte number, such as a Uint32, it’s important that we understand and react appropriately to how those bytes are ordered.
For example, if data is serialized in a little-endian format, and then some other system deserializes that data with the assumption that it is big-endian, the values will mismatch.
The following program shows the implications this has. To simulate the endianness mismatch, we’ll use the SDL_Swap32() function, which reverses the order of the 4 bytes in a 32-bit block of memory:
#include <iostream>
#include "SDL.h"

int main(int argc, char** argv) {
  Uint32 Serialized{42};
  std::cout << "Serialized: " << Serialized;

  // Reverse the byte order to simulate reading
  // the value with the wrong endianness
  Uint32 Deserialized{SDL_Swap32(Serialized)};
  std::cout << "\nDeserialized: "
    << Deserialized;

  return 0;
}
Serialized: 42
Deserialized: 704643072
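To see where 704643072 comes from, it can help to write a byte swap by hand. The following is a sketch in standard C++ of what a 32-bit byte swap does; it is an assumption about the observable behavior only, not SDL's actual implementation.

```cpp
#include <cassert>
#include <cstdint>

// A hand-written 32-bit byte swap, for illustration only. Real
// implementations may use compiler intrinsics instead.
constexpr std::uint32_t Swap32(std::uint32_t Value) {
  return ((Value & 0x000000FFu) << 24) |
         ((Value & 0x0000FF00u) << 8) |
         ((Value & 0x00FF0000u) >> 8) |
         ((Value & 0xFF000000u) >> 24);
}

void SwapExample() {
  // 42 is 0x0000002A; reversing its bytes gives 0x2A000000
  assert(Swap32(42) == 0x2A000000u);
  // 0x2A000000 is 42 * 16777216, which is 704643072
  assert(Swap32(42) == 704643072u);
  // Swapping twice restores the original value
  assert(Swap32(Swap32(42)) == 42);
}
```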
If we need to check the endianness configured by our compiler, SDL provides the SDL_BYTEORDER preprocessor definition.
We can compare this to the SDL_BIG_ENDIAN or SDL_LIL_ENDIAN definitions to understand our environment:
#include <cstddef>
#include <iostream>
#include "SDL.h"

int main(int argc, char** argv) {
  std::cout << "System Endianness:\n";

  // SDL_BYTEORDER is fixed at compile time
#if SDL_BYTEORDER == SDL_BIG_ENDIAN
  std::cout << "Big Endian (most significant "
    "byte first)";
#elif SDL_BYTEORDER == SDL_LIL_ENDIAN
  std::cout << "Little Endian (least "
    "significant byte first)";
#else
  std::cout << "Unknown byte order";
#endif

  // Inspect the individual bytes of a 16-bit
  // value as they are laid out in memory
  Uint16 value{0x1234};
  auto* bytes{
    reinterpret_cast<std::byte*>(&value)};
  std::cout << "\nValue 0x1234 is stored as: "
    << std::hex << static_cast<int>(bytes[0])
    << " " << static_cast<int>(bytes[1]);

  return 0;
}
System Endianness:
Little Endian (least significant byte first)
Value 0x1234 is stored as: 34 12
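If SDL is not available, a similar check can be made at run time in standard C++ by looking at the first byte of a known value. This is a minimal sketch with a hypothetical helper name; C++20 also offers std::endian in the <bit> header for a compile-time check.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not part of SDL: inspects the first byte of a
// known 16-bit value as it is laid out in memory on this machine.
bool IsLittleEndianHost() {
  const std::uint16_t Value{0x0001};
  std::uint8_t FirstByte{};
  std::memcpy(&FirstByte, &Value, 1);
  // On a little-endian host the least significant byte (0x01) comes first
  return FirstByte == 0x01;
}

void HostOrderExample() {
  const std::uint16_t Value{0x1234};
  std::uint8_t Bytes[2]{};
  std::memcpy(Bytes, &Value, sizeof(Value));
  if (IsLittleEndianHost()) {
    assert(Bytes[0] == 0x34 && Bytes[1] == 0x12);
  } else {
    assert(Bytes[0] == 0x12 && Bytes[1] == 0x34);
  }
}
```

Using std::memcpy to read the bytes avoids any aliasing concerns and works for any trivially copyable type.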
To control the endianness of multi-byte values when serializing, SDL provides a series of helpful functions. For example, to write binary data in the little-endian format, we can use one of 3 functions depending on the memory size of the value:
SDL_WriteLE16(): Writes 16 bits (2 bytes) of data in the little-endian order
SDL_WriteLE32(): Writes 32 bits (4 bytes) of data in the little-endian order
SDL_WriteLE64(): Writes 64 bits (8 bytes) of data in the little-endian order
These functions will convert the data if needed. If our system is little-endian, our memory will be written as-is. If our system is big-endian, SDL will write the bytes in the opposite order. Either way, it guarantees that our output is in the little-endian order.
Here’s an example in code:
#include <iostream>
#include "SDL.h"

int main(int argc, char** argv) {
  SDL_RWops* Handle{
    SDL_RWFromFile("data.bin", "wb")};
  if (!Handle) {
    std::cout << "Error opening file: "
      << SDL_GetError();
    // Nothing to write to, so stop here
    return 1;
  }

  // Write 42 as 4 bytes, least significant first
  Uint32 Content{42};
  SDL_WriteLE32(Handle, Content);

  SDL_RWclose(Handle);
  return 0;
}
Big-endian variations of these functions are also available - SDL_WriteBE16(), SDL_WriteBE32(), and SDL_WriteBE64().
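The idea behind these write functions can be sketched without SDL by appending bytes to an in-memory buffer in a fixed order. The helpers below are hypothetical and are not how SDL implements its functions; because they use shifts on the value rather than copying memory, the output is the same on any host.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration; these are not SDL functions.
// They append to an in-memory buffer rather than an SDL_RWops stream.
void AppendLE32(std::vector<std::uint8_t>& Out, std::uint32_t Value) {
  // Least significant byte first
  for (int Shift = 0; Shift < 32; Shift += 8) {
    Out.push_back(static_cast<std::uint8_t>(Value >> Shift));
  }
}

void AppendBE32(std::vector<std::uint8_t>& Out, std::uint32_t Value) {
  // Most significant byte first
  for (int Shift = 24; Shift >= 0; Shift -= 8) {
    Out.push_back(static_cast<std::uint8_t>(Value >> Shift));
  }
}

void AppendExample() {
  std::vector<std::uint8_t> Buffer;
  AppendLE32(Buffer, 42);
  AppendBE32(Buffer, 42);
  const std::vector<std::uint8_t> Expected{
    0x2a, 0x00, 0x00, 0x00,  // little-endian
    0x00, 0x00, 0x00, 0x2a   // big-endian
  };
  assert(Buffer == Expected);
}
```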
Binary data isn’t inherently designed to be read by humans - even opening a binary file usually requires specialist tools rather than a standard text editor.
However, if we really need to analyse binary data, we still can. The tool we need is commonly called a hex editor. Our IDE is likely to include a hex editor or have one available as a plugin. We can alternatively use a standalone tool or a website such as hexed.it.
If we open our previous output representing the number 42 in a hex editor, we should see our 4 bytes of binary data represented in hexadecimal as 2a 00 00 00.
42 is a relatively small number in the range of what can be stored in a 4-byte integer. As such, its value can be represented entirely in the least significant byte and, because we wrote this data in the little-endian order, the least significant byte comes first.
Converting the hex value 2a to decimal should confirm that our number, 42, was accurately serialized.
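When a hex editor is not to hand, a few lines of standard C++ can produce the same view. The sketch below uses a hypothetical ToHex() helper to format bytes as a hex editor would, and std::stoi() to convert a hex string back to decimal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats bytes the way a hex editor displays them
std::string ToHex(const std::vector<std::uint8_t>& Bytes) {
  std::ostringstream Stream;
  Stream << std::hex << std::setfill('0');
  for (std::size_t i = 0; i < Bytes.size(); ++i) {
    if (i > 0) Stream << ' ';
    // Cast to int so the byte is printed as a number, not a character
    Stream << std::setw(2) << static_cast<int>(Bytes[i]);
  }
  return Stream.str();
}

void HexExample() {
  assert(ToHex({0x2a, 0x00, 0x00, 0x00}) == "2a 00 00 00");
  // 0x2a is 2 * 16 + 10, which is 42 in decimal
  assert(std::stoi("2a", nullptr, 16) == 42);
}
```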
SDL’s endianness-sensitive write functions like SDL_WriteLE32() return 1 if they succeeded, or 0 otherwise. We can use this to check if the write was successful, and react accordingly.
Below, we attempt to write to a file that we opened only for reading using the rb flag:
#include <iostream>
#include "SDL.h"

int main(int argc, char** argv) {
  // Open for reading only, so writes will fail
  SDL_RWops* Handle{
    SDL_RWFromFile("data.bin", "rb")};
  if (!Handle) {
    std::cout << "Error opening file: "
      << SDL_GetError();
    // The file could not be opened at all
    return 1;
  }

  Uint32 Content{42};
  if (SDL_WriteLE32(Handle, Content)) {
    std::cout << "Write Successful";
  } else {
    std::cout << "Write Failed: "
      << SDL_GetError();
  }

  SDL_RWclose(Handle);
  return 0;
}
Write Failed: Error writing to datastream
Once we know the endianness of the data we’re working with, we can choose an appropriate function to read that data into memory. For example, if we know the data follows the little-endian byte order, we can use one of these functions:
SDL_ReadLE16(): Read the next 16 bits (2 bytes) of data, with the assumption that it is little-endian
SDL_ReadLE32(): Read the next 32 bits (4 bytes) of data, with the assumption that it is little-endian
SDL_ReadLE64(): Read the next 64 bits (8 bytes) of data, with the assumption that it is little-endian
These functions will read the data with the assumption that it is in the little-endian format. Then, if our system is also little-endian, the value is stored in memory as-is. If our system is not little-endian, SDL will convert the data to our native format before storing it in memory.
Here’s an example in code:
#include <iostream>
#include "SDL.h"

int main(int argc, char** argv) {
  SDL_RWops* Handle{
    SDL_RWFromFile(
      "data.bin", "rb")};
  if (!Handle) {
    std::cout << "Error opening file: "
      << SDL_GetError();
    // Nothing to read from, so stop here
    return 1;
  }

  // Read 4 bytes, treating the first byte
  // as the least significant
  Uint32 Content{SDL_ReadLE32(Handle)};
  std::cout << "Content: " << Content;

  SDL_RWclose(Handle);
  return 0;
}
Content: 42
Big-endian variations of these functions are also available - SDL_ReadBE16(), SDL_ReadBE32(), and SDL_ReadBE64().
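The effect of choosing the right or wrong read function can be sketched in standard C++ by interpreting the same 4 bytes under each assumption. The helper names are hypothetical, and the code reads from a plain byte array rather than an SDL_RWops stream.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; these are not SDL functions.
// Each one interprets 4 bytes from a buffer in a fixed byte order.
constexpr std::uint32_t LoadLE32(const std::uint8_t* Bytes) {
  // The first byte is the least significant
  return static_cast<std::uint32_t>(Bytes[0]) |
         (static_cast<std::uint32_t>(Bytes[1]) << 8) |
         (static_cast<std::uint32_t>(Bytes[2]) << 16) |
         (static_cast<std::uint32_t>(Bytes[3]) << 24);
}

constexpr std::uint32_t LoadBE32(const std::uint8_t* Bytes) {
  // The first byte is the most significant
  return (static_cast<std::uint32_t>(Bytes[0]) << 24) |
         (static_cast<std::uint32_t>(Bytes[1]) << 16) |
         (static_cast<std::uint32_t>(Bytes[2]) << 8) |
         static_cast<std::uint32_t>(Bytes[3]);
}

void LoadExample() {
  // The bytes we saw in the hex editor for the value 42
  const std::uint8_t FileBytes[4]{0x2a, 0x00, 0x00, 0x00};
  // Read with the correct assumption, we recover 42
  assert(LoadLE32(FileBytes) == 42);
  // Read with the wrong assumption, we get 704643072 instead
  assert(LoadBE32(FileBytes) == 704643072u);
}
```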
When serializing and deserializing data exclusively for our own program, these functions make dealing with endianness quite easy. We simply choose one (little-endian or big-endian) and stick to it.
So, for example, if we choose little-endian, we use the SDL_WriteLE32() function to write all 4-byte values, and SDL_ReadLE32() to read them.
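The "pick one order and stick to it" approach can be sketched as a round trip in standard C++. Everything here is hypothetical (the PlayerRecord type, its fields, and the helper functions); the point is only that the writer and reader agree on little-endian for every field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record type and helpers, for illustration only
struct PlayerRecord {
  std::uint16_t Level;
  std::uint32_t Score;
};

// Appends the lowest Count bytes of Value, least significant first
void AppendLE(std::vector<std::uint8_t>& Out, std::uint64_t Value,
              std::size_t Count) {
  for (std::size_t i = 0; i < Count; ++i) {
    Out.push_back(static_cast<std::uint8_t>(Value >> (8 * i)));
  }
}

// Reads Count little-endian bytes starting at Offset, then advances Offset
std::uint64_t ReadLE(const std::vector<std::uint8_t>& In,
                     std::size_t& Offset, std::size_t Count) {
  std::uint64_t Value{0};
  for (std::size_t i = 0; i < Count; ++i) {
    Value |= static_cast<std::uint64_t>(In.at(Offset + i)) << (8 * i);
  }
  Offset += Count;
  return Value;
}

std::vector<std::uint8_t> Serialize(const PlayerRecord& Record) {
  std::vector<std::uint8_t> Out;
  AppendLE(Out, Record.Level, 2);
  AppendLE(Out, Record.Score, 4);
  return Out;
}

PlayerRecord Deserialize(const std::vector<std::uint8_t>& In) {
  std::size_t Offset{0};
  PlayerRecord Record{};
  Record.Level = static_cast<std::uint16_t>(ReadLE(In, Offset, 2));
  Record.Score = static_cast<std::uint32_t>(ReadLE(In, Offset, 4));
  return Record;
}

void RoundTripExample() {
  const PlayerRecord Original{7, 123456};
  const auto Bytes = Serialize(Original);
  // 7 is 0x0007 and 123456 is 0x0001E240, each stored low byte first
  const std::vector<std::uint8_t> Expected{
    0x07, 0x00, 0x40, 0xe2, 0x01, 0x00};
  assert(Bytes == Expected);

  const PlayerRecord Restored{Deserialize(Bytes)};
  assert(Restored.Level == 7);
  assert(Restored.Score == 123456);
}
```

Writing each field separately also sidesteps struct padding, which copying the whole struct in one go would not.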
Whilst SDL will happily convert byte orders for us, those conversions still have a performance cost. Where possible, we should try to minimise the number of conversions necessary.
For example, if we know the system we’re building for is little-endian, it’s probably a good idea to serialize our data using that same order, so that reads and writes need no conversion at all.
Most modern CPUs are little-endian, so this is a good default unless we have a specific reason to opt for big-endian ordering.
SDL provides some utility functions that allow us to swap byte orders at any time, independently of the SDL_RWops context. For example, the SDL_Swap32() function byte-swaps 4 bytes of data.
The following program shows an example of this, and also includes a LogBytes() function that can be helpful for visualising how a value is represented in bytes:
#include <cstddef>
#include <iomanip>
#include <iostream>
#include "SDL.h"

// Prints the 4 bytes of x in the order they
// are stored in memory, as two-digit hex values
void LogBytes(Uint32 x) {
  auto* bytes = reinterpret_cast<std::byte*>(&x);
  for (std::size_t i = 0; i < 4; ++i) {
    std::cout
      << std::hex
      << std::setw(2)
      << std::setfill('0')
      << std::to_integer<int>(bytes[i]) << " ";
  }
}

int main(int argc, char** argv) {
  Uint32 Original{42};
  std::cout << "Original: ";
  LogBytes(Original);

  std::cout << "\n Swapped: ";
  LogBytes(SDL_Swap32(Original));

  return 0;
}
Original: 2a 00 00 00
Swapped: 00 00 00 2a
We also have SDL_Swap16() and SDL_Swap64() for byte-swapping 16-bit and 64-bit values respectively, and SDL_SwapFloat() for the float data type.
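Byte-swapping a float works on its underlying bytes, not its numeric value. The sketch below shows one way this might be done in standard C++, assuming a 4-byte IEEE 754 float; the helper names are hypothetical and this is not SDL's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "This sketch assumes a 4-byte float");
static_assert(std::numeric_limits<float>::is_iec559,
              "This sketch assumes IEEE 754 floats");

constexpr std::uint32_t Swap32(std::uint32_t Value) {
  return ((Value & 0x000000FFu) << 24) |
         ((Value & 0x0000FF00u) << 8) |
         ((Value & 0x00FF0000u) >> 8) |
         ((Value & 0xFF000000u) >> 24);
}

// Hypothetical helpers, not SDL's implementation.
// Copies the float's bytes into an integer and reverses them. Keeping the
// swapped bytes in an integer avoids treating a meaningless bit pattern
// as a floating point number.
std::uint32_t SwappedFloatBits(float Value) {
  std::uint32_t Bits{};
  std::memcpy(&Bits, &Value, sizeof(Bits));
  return Swap32(Bits);
}

// Reverses the bytes again and reinterprets them as a float
float FloatFromSwappedBits(std::uint32_t Swapped) {
  const std::uint32_t Bits{Swap32(Swapped)};
  float Value{};
  std::memcpy(&Value, &Bits, sizeof(Value));
  return Value;
}

void FloatExample() {
  // 1.5f has the IEEE 754 bit pattern 0x3FC00000
  assert(SwappedFloatBits(1.5f) == 0x0000C03Fu);
  assert(FloatFromSwappedBits(0x0000C03Fu) == 1.5f);
}
```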
SDL also provides a range of functions for converting data from a known endianness to the system’s native endianness.
For example, if we have 4 bytes of big-endian data, and we want to ensure it is in the system’s native endianness, we can use the SDL_SwapBE32() function (named SDL_Swap32BE() in SDL3).
If our system is also big-endian, this function will just return the data without modification. However, if our system is little-endian, the function will return it with its byte order reversed:
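#include <cstddef>
#include <iomanip>
#include <iostream>
#include "SDL.h"

// Prints the 4 bytes of x in the order they
// are stored in memory, as two-digit hex values
void LogBytes(Uint32 x) {
  auto* bytes = reinterpret_cast<std::byte*>(&x);
  for (std::size_t i = 0; i < 4; ++i) {
    std::cout
      << std::hex
      << std::setw(2)
      << std::setfill('0')
      << std::to_integer<int>(bytes[i]) << " ";
  }
}

int main(int argc, char** argv) {
  // On a little-endian system, this swap
  // produces the big-endian layout of 42
  Uint32 BigEndian{SDL_Swap32(42)};
  std::cout << "Big-Endian: ";
  LogBytes(BigEndian);

  // SDL2 name; SDL3 calls this SDL_Swap32BE()
  std::cout << "\n Native: ";
  LogBytes(SDL_SwapBE32(BigEndian));

  return 0;
}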
Big-Endian: 00 00 00 2a
Native: 2a 00 00 00
We can also handle 2 bytes of big-endian data using SDL_SwapBE16(), 8 bytes using SDL_SwapBE64(), and a big-endian float using SDL_SwapFloatBE().
And, if we know our data is little-endian, we can convert it to our native order using SDL_SwapLE16(), SDL_SwapLE32(), SDL_SwapLE64(), and SDL_SwapFloatLE().
In SDL3, the integer variants are named with the size first, such as SDL_Swap16BE() and SDL_Swap32LE().
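Converting from a known byte order to the native one can also be sketched in standard C++. The hypothetical helpers below take a 32-bit value whose bytes were copied straight from big-endian or little-endian data and return the intended number; they give the same result as a conditional swap, but never need to ask which kind of host they are running on.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers, not SDL's implementation. Raw holds 4 bytes that
// were copied straight from big-endian data, so its numeric value is only
// correct as-is on a big-endian host.
std::uint32_t BigEndianToNative32(std::uint32_t Raw) {
  std::uint8_t Bytes[4];
  std::memcpy(Bytes, &Raw, sizeof(Raw));
  // Bytes[0] is the most significant byte of the intended value
  return (static_cast<std::uint32_t>(Bytes[0]) << 24) |
         (static_cast<std::uint32_t>(Bytes[1]) << 16) |
         (static_cast<std::uint32_t>(Bytes[2]) << 8) |
         static_cast<std::uint32_t>(Bytes[3]);
}

// The same idea for bytes copied from little-endian data
std::uint32_t LittleEndianToNative32(std::uint32_t Raw) {
  std::uint8_t Bytes[4];
  std::memcpy(Bytes, &Raw, sizeof(Raw));
  // Bytes[0] is the least significant byte of the intended value
  return static_cast<std::uint32_t>(Bytes[0]) |
         (static_cast<std::uint32_t>(Bytes[1]) << 8) |
         (static_cast<std::uint32_t>(Bytes[2]) << 16) |
         (static_cast<std::uint32_t>(Bytes[3]) << 24);
}

void NativeExample() {
  const std::uint8_t BigEndianData[4]{0x00, 0x00, 0x00, 0x2a};
  std::uint32_t Raw{};
  std::memcpy(&Raw, BigEndianData, sizeof(Raw));
  assert(BigEndianToNative32(Raw) == 42);

  const std::uint8_t LittleEndianData[4]{0x2a, 0x00, 0x00, 0x00};
  std::memcpy(&Raw, LittleEndianData, sizeof(Raw));
  assert(LittleEndianToNative32(Raw) == 42);
}
```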
Binary data handling across different platforms requires understanding and managing byte order differences.
We've explored the concept of endianness, learned about SDL's binary manipulation functions, and practiced implementing stable data serialization techniques.