Working with Data

Learn techniques for managing game data, including save systems, configuration, and networked multiplayer.

This lesson is part of the course:

Game Dev with SDL2

Learn C++ and SDL development by creating hands on, practical projects inspired by classic retro games

Ryan McCombe

Posted 2 months ago

As we’ve seen, when we create variables and instantiate classes in C++, the data in these objects is stored in a block of memory, which our program can freely read and update. To build more advanced features, we need the ability to convert those memory representations into other forms. This process is typically called serialization.

The output of this serialization can be used in many ways - we might store it on our player’s hard drive to use later, send it to some other program, or send it to some other computer over the internet. When a program later uses that serialized data, it will need to convert it back into an understandable form in memory. This process is called deserialization.

In this lesson, we’ll give a brief introduction to why serialization and deserialization are important, and the types of capabilities that they unlock. Through the rest of the chapter, we’ll walk through how to implement these techniques with C++ and SDL, and the things we need to consider when doing so.

We’ll then use these principles to create a basic save and reload system from scratch. Finally, we’ll introduce a free, open-source library that applies these concepts to provide a feature-complete solution, allowing us to quickly add these capabilities to much more complex projects.

Use Cases

Serialization and deserialization are the foundation of many common features our programs need to implement. Let’s take a look at some examples.

Configuration

One of the most common and simplest scenarios where we might want our program to deserialize data is if that data contains configuration options controlling how the program should behave.

For example, in our previous Minesweeper project, we arranged our project such that a range of configuration options were in a dedicated header file. This included settings like how big the grid was, how many bombs it contained, and the colours, images and fonts we use:

// Config.h
// ...

inline constexpr int BOMB_COUNT{6};
inline constexpr int GRID_COLUMNS{8};
inline constexpr int GRID_ROWS{8};
inline constexpr SDL_Color BUTTON_COLOR{
  200, 200, 200, 255};
inline const std::string BOMB_IMAGE{
  "bomb.png"};
inline const std::string FONT{
  "Rubik-SemiBold.ttf"};

This made changing these options easier, but it still required a programmer to make those changes, recompile our project, and then release a new version.

As an alternative, we could have simply stored these options in a basic text file that we shipped alongside our executable. It might look something like this:

BOMB_COUNT: 6
GRID_COLUMNS: 8
GRID_ROWS: 8
BUTTON_COLOR: 200, 200, 200, 255
BOMB_IMAGE: bomb.png
FONT: Rubik-SemiBold.ttf

When our program starts, it could read this file and base its behaviour on the contents, in the same way we previously based our behaviour on the header file. Except now, these settings can be changed by anyone - they just need to edit the file. Doing so doesn’t require programming knowledge, and we don’t need to recompile our project to implement these changes.
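As a rough sketch of how reading such a file might work, the functions below parse `KEY: value` lines into a map of strings. The `LoadConfig()` and `LoadConfigFromText()` names, and the choice of `std::map`, are assumptions for illustration rather than part of the Minesweeper project. Error handling is kept to a minimum.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: reads "KEY: value" lines into a map.
// Lines that do not contain a colon are skipped.
std::map<std::string, std::string> LoadConfig(
    std::istream& Input) {
  std::map<std::string, std::string> Config;
  std::string Line;
  while (std::getline(Input, Line)) {
    const std::size_t Colon{Line.find(':')};
    if (Colon == std::string::npos) continue;

    std::string Key{Line.substr(0, Colon)};
    std::string Value{Line.substr(Colon + 1)};

    // Trim the spaces that follow the colon
    const std::size_t Start{Value.find_first_not_of(' ')};
    Value = (Start == std::string::npos)
      ? std::string{}
      : Value.substr(Start);

    Config[Key] = Value;
  }
  return Config;
}

// Hypothetical convenience wrapper for text that is
// already in memory
std::map<std::string, std::string> LoadConfigFromText(
    const std::string& Text) {
  std::istringstream Input{Text};
  return LoadConfig(Input);
}
```

Passing a `std::ifstream` opened on the file to `LoadConfig()` would give us a map of strings. Numeric settings such as `BOMB_COUNT` could then be converted with `std::stoi()`.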

We can take this idea further and, rather than shipping the data alongside our software, we could store the data on a network location we control, and have our game download the latest version of that data every time it launches.

Given we control the network location, we can update this data at any time, thereby updating the experience players have with our game.

Our advanced course has a lesson walking through the creation of programs that retrieve dynamic data from the internet, and use that data to determine behaviour:

Using HTTP in Modern C++

A detailed and practical tutorial for working with HTTP in modern C++ using the cpr library.

Save and Load

In most real-world programs, we need the ability to preserve the user’s work from one session to the next. For example, a player is typically not going to complete a long game in a single sitting - they need to make progress over multiple sessions.

As such, we need the capability to save the user’s progress before they quit the game, and reload that progress when they next open it.

For example, as the player progressed through our minesweeper game, we could have maintained a file on their hard drive. That file could keep track of which cells the player cleared, where they placed flags, and where the randomly assigned bombs were placed:

CELLS_CLEARED: 6, 15, 21, 32
FLAG_LOCATIONS: 3, 27
BOMB_LOCATIONS: 3, 5, 7, 9, 16, 27

If the player exits without completing their game, we could read this file the next time they open our program. We could then use the data within it to restore their board to the state it was when they quit, letting them continue from where they left off.
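A minimal sketch of the two halves of this process, assuming each list of cell indices is held as a `std::vector<int>`. The `WriteList()` and `ReadList()` names are hypothetical, and a real save system would also need to cope with malformed or missing files.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: serialize a list as "KEY: 1, 2, 3"
std::string WriteList(const std::string& Key,
                      const std::vector<int>& Values) {
  std::ostringstream Out;
  Out << Key << ": ";
  for (std::size_t i{0}; i < Values.size(); ++i) {
    if (i > 0) Out << ", ";
    Out << Values[i];
  }
  return Out.str();
}

// Hypothetical helper: deserialize the values from a line
// in the format that WriteList() produces
std::vector<int> ReadList(const std::string& Line) {
  std::vector<int> Values;
  const std::size_t Colon{Line.find(':')};
  if (Colon == std::string::npos) return Values;

  std::istringstream In{Line.substr(Colon + 1)};
  std::string Item;
  while (std::getline(In, Item, ',')) {
    // Skip items that contain only spaces
    if (Item.find_first_not_of(' ') == std::string::npos) {
      continue;
    }
    // std::stoi ignores leading whitespace
    Values.push_back(std::stoi(Item));
  }
  return Values;
}
```

Saving would write one `WriteList()` line per list to the file, and loading would pass each line back through `ReadList()` to rebuild the vectors.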

Interoperability

Serialization and deserialization allow one program to understand the output of another program. The most familiar examples of this are recognisable file formats, such as a JPEG image or an MP3 audio file.

The developers of one program, such as a photo editor, can give their software the ability to serialise a photo in the JPEG format. That serialised data can then be stored on a hard drive or sent over the network to be received by a completely different program, like a web browser.

And, if the developers of that web browser included the capability to deserialise and understand the JPEG format, they can show that photo to their users.

We implemented a practical example of this earlier in the course, when we gave our Minesweeper game the ability to understand the PNG (through SDL_image) and TTF (through SDL_ttf) formats, thereby allowing us to render images and fonts.

Internal Tools and Pipelines

We’re not restricted to just using popular, public serialization formats. Often, it is useful to serialize data that is a lot more specific to our needs. Teams working on larger projects typically create private tools for their own internal use.

For example, they may have a level editing tool that helps them build their game worlds, and content management tools that let them create and change the quests, enemies and magical items players can encounter within those games.

We need a way to transfer those levels, quests and item data out of our internal tool and into our final game. Having our tools serialize their data into a structured format, and giving our game the capability to deserialize that data into Level, Quest and Item objects lets us do that.

Networking

The final major use case for serialization involves networking, such as creating a multiplayer game.

In this situation, there might be two instances of our game running on two different computers - player 1’s computer and player 2’s computer. Those machines are connected to each other over a network, such as the internet.

Within the game, each player is assigned a character to play as - C1 is player 1’s character, and C2 is player 2’s character. Players can move their characters around, represented by updating x and y coordinates:

struct Character {
  float x;
  float y;
};

int main() {
  Character C1; // Player 1's character
  Character C2; // Player 2's character
}

Each player can see both characters, so both instances of our game contain the C1 and C2 objects. So, from player 1’s perspective, both C1 and C2 exist on their computer. These are just standard C++ objects, stored in player 1’s system memory like any other.

However, to create the desired effect of a multiplayer game, we need to create the illusion that player 1 is only in control of C1. The position of C2 must appear to be controlled by player 2 over the network.

A minimalist way to set this up is to have:

Player 2’s instance of the game serializes its C2 object every time its position changes.
Player 2’s instance of the game sends that serialized data across the network to player 1’s game.
Player 1’s instance of the game uses the data to update the position of its copy of C2.

As such, the position of the C2 object within each player’s game is kept relatively synchronized, and it seems like player 2 is controlling C2 in both games.

And we do the same thing in the opposite direction for C1 - player 1 sends C1's position to player 2’s game, and player 2’s game uses this data to keep its copy of C1 in sync with what player 1 is doing.
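The serialize and deserialize steps of this exchange could be sketched as follows, leaving out the network transport entirely. The `Packet` alias and the `Serialize()` and `Deserialize()` names are hypothetical. The sketch also assumes both machines represent `float` in the same way, which a real networked game would need to verify or work around.

```cpp
#include <array>
#include <cstring>

struct Character {
  float x;
  float y;
};

// The bytes we would hand to the network layer
using Packet = std::array<unsigned char, 2 * sizeof(float)>;

// Sender: copy the character's position into a byte buffer
Packet Serialize(const Character& C) {
  Packet Data{};
  std::memcpy(Data.data(), &C.x, sizeof(float));
  std::memcpy(
    Data.data() + sizeof(float), &C.y, sizeof(float));
  return Data;
}

// Receiver: update the local copy from the received bytes
void Deserialize(const Packet& Data, Character& C) {
  std::memcpy(&C.x, Data.data(), sizeof(float));
  std::memcpy(
    &C.y, Data.data() + sizeof(float), sizeof(float));
}
```

On player 1’s machine, the bytes received from player 2 would be passed to `Deserialize()` along with the local copy of C2.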

State Synchronization, Replication and Authority

This networking model, where we try to keep copies of objects running on different instances of our game in sync, is commonly called state synchronization.

Reporting an object’s state to another instance of the game and having that instance update its version of the object to match is called replication.

The instance of our game that gets to decide what the shared state of an object should be is said to have authority over that object. In our example, Player 1’s game has authority over C1, but not over C2.

Within player 1’s game, the state that C2 should have is defined by replicating the state that C2’s authority (player 2’s game) broadcasts. And the opposite is true of player 2’s game - it has authority over C2, but not C1. C1 in player 2’s game is updated by replicating C1 from player 1’s game.
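One way the receiving side might respect authority is sketched below: each replicated object records which game instance has authority over it, and ignores updates from any other sender. The `StateUpdate` and `ReplicatedCharacter` types, and the use of integer IDs to identify instances, are assumptions made for illustration.

```cpp
struct StateUpdate {
  int SenderId; // The game instance that sent this update
  float x;
  float y;
};

struct ReplicatedCharacter {
  int AuthorityId; // The instance allowed to set our state
  float x{0};
  float y{0};

  // Replicates the update if it came from our authority.
  // Returns true if the update was accepted.
  bool ApplyUpdate(const StateUpdate& Update) {
    if (Update.SenderId != AuthorityId) return false;
    x = Update.x;
    y = Update.y;
    return true;
  }
};
```

In player 1’s game, C2 would be created with player 2’s ID as its `AuthorityId`, so only updates broadcast by player 2’s game would change its position.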

Serialization Formats

We already discussed how some serialization formats are designed to represent specific types of data, such as the PNG format for images and the MP3 format for audio.

There are also popular serialisation formats that are flexible enough to represent a much more diverse range of object types. The two most popular formats for this in the modern era are JSON and YAML.

JSON

JavaScript Object Notation (JSON) is predominantly based around a collection of key and value pairs. This is similar to variables, where a key is like the variable name, and the value is the data associated with that identifier.

JSON values can have a range of familiar types, like strings, booleans, integers and floating point numbers. For example, if we were building a website for movie reviews, we might represent a movie in JSON like this:

{
  "title": "Star Wars",
  "released": true,
  "year": 1977,
  "rating": 4.7
}

JSON values can also be one of two compound types:

Objects containing a nested set of key-value pairs.
Arrays containing a collection of multiple values of any type, including compound types.

These let us create complex structures, nested as deep as we require to accurately represent the object we’re trying to model. A slightly more detailed movie representation might look like this:

{
  "title": "Star Wars",
  "released": true,
  "year": 1977,
  "rating": 4.7,
  "director": {
    "firstName": "George",
    "surname": "Lucas"
  },
  "cast": [
    "Mark Hamill",
    "Harrison Ford",
    "Carrie Fisher"
  ]
}

As a flexible format, we can just as easily use JSON to model almost anything. We could apply it to represent monsters in a game we’re building:

{
  "name": "Goblin Thief",
  "health": 500,
  "movementSpeed": 3.2,
  "hostile": true,
  "loot": [{
    "name": "Rusty Dagger",
    "value": 5,
    "probability": 0.99
  }, {
    "name": "Diamond",
    "value": 500,
    "probability": 0.01
  }]
}
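To connect this to C++, here is a hand-written sketch that serializes a simplified version of this monster into compact JSON text. The `Loot` and `Monster` types and the `ToJson()` function are hypothetical, and the code assumes names contain no characters that need escaping. In practice, a JSON library would normally handle these details for us.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct Loot {
  std::string Name;
  int Value;
};

struct Monster {
  std::string Name;
  int Health;
  bool Hostile;
  std::vector<Loot> Drops;
};

// Hypothetical helper: builds compact JSON by hand.
// Assumes names need no escaping.
std::string ToJson(const Monster& M) {
  std::ostringstream Out;
  Out << "{\"name\":\"" << M.Name << "\","
      << "\"health\":" << M.Health << ","
      << "\"hostile\":" << (M.Hostile ? "true" : "false")
      << ",\"loot\":[";
  for (std::size_t i{0}; i < M.Drops.size(); ++i) {
    // Array elements are separated by commas
    if (i > 0) Out << ",";
    Out << "{\"name\":\"" << M.Drops[i].Name << "\","
        << "\"value\":" << M.Drops[i].Value << "}";
  }
  Out << "]}";
  return Out.str();
}
```

The nested `Loot` objects become a JSON array of objects, matching the structure of the `loot` key in the example document.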

We cover JSON in more detail, and how to use it in our C++ programs, in this dedicated lesson:

Using JSON in Modern C++

A practical guide to working with the JSON data format in C++ using the popular nlohmann::json library.

YAML

Similar to C++, the JSON format is not sensitive to white space. We can add additional spacing and line breaks to lay out our data in whatever way we wish.

This additional spacing can make the data more readable for humans but, when a machine reads the data, it doesn’t care about the visual layout. Instead, it understands the structure based on the location of tokens like {, [ and ,.

The YAML format is just as flexible as JSON, but takes a different approach. In YAML, white space does matter. Rather than using commas, we use line breaks to determine where one data point ends and the next begins:

title: Star Wars
released: true
year: 1977
rating: 4.7

Rather than using { and [, we use indentation to represent nested data, and - to denote each element of an array:

title: Star Wars
released: true
year: 1977
rating: 4.7
director:
  firstName: George
  surname: Lucas
cast:
- Mark Hamill
- Harrison Ford
- Carrie Fisher

Our earlier monster example looks like this in YAML:

name: Goblin Thief
health: 500
movementSpeed: 3.2
hostile: true
loot:
- name: Rusty Dagger
  value: 5
  probability: 0.99
- name: Diamond
  value: 500
  probability: 0.01

Binary

Formats like JSON and YAML are easy for humans to read and understand, but they aren't always the most efficient way to transfer data between computer programs. Binary serialization addresses two key performance challenges:

Conversion Cost: Converting between in-memory data structures and text formats like JSON requires additional processing. For example, converting the string "12345" to and from its binary integer representation (00000000 00000000 00110000 00111001) takes extra CPU cycles.
Data Size: JSON and YAML payloads are typically bigger than their comparable binary representation. This means we have more data to write, transfer, and read, further degrading performance. For example, storing two values like 12345 and 67890 as 32-bit integers requires 8 bytes (64 bits). Representing that same data in the JSON format requires 21 bytes - 21 char values of 1 byte each: {"x":12345,"y":67890}.
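The size comparison above can be checked with a short sketch. It assumes the two values are stored as `std::int32_t` and copied byte-for-byte with `std::memcpy()`. The `Point` type and the `ToBinary()` and `ToJson()` functions are hypothetical names used only for this comparison.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct Point {
  std::int32_t x;
  std::int32_t y;
};

// Binary: copy the raw bytes of each integer into a buffer
std::vector<unsigned char> ToBinary(const Point& P) {
  std::vector<unsigned char> Data(
    sizeof(P.x) + sizeof(P.y));
  std::memcpy(Data.data(), &P.x, sizeof(P.x));
  std::memcpy(Data.data() + sizeof(P.x), &P.y, sizeof(P.y));
  return Data;
}

// Text: convert each integer to decimal characters,
// which is the extra conversion work binary avoids
std::string ToJson(const Point& P) {
  return "{\"x\":" + std::to_string(P.x) +
         ",\"y\":" + std::to_string(P.y) + "}";
}
```

For the values 12345 and 67890, `ToBinary()` returns 8 bytes, while `ToJson()` returns a string of 21 characters.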

Binary solves both these problems, at the expense of readability. Unlike a typical JSON or YAML document, binary data is difficult for humans to read or edit. But, in most cases, particularly in scenarios like networking, we don’t care that the data isn’t easily understood by humans - humans aren’t looking at it anyway.

Instead, we just want to use whatever format allows our application to process and react to the data as quickly as possible. That format is binary, and it’s what we’ll spend most of our time focusing on for the rest of this chapter.

Scripting and Behaviors

Our previous examples were based around loading data into our C++ programs, but we can take this idea further. It is common in larger projects for our games to also allow behaviors to be defined outside of the core program.

These behaviors might include things like how a specific type of enemy reacts to being attacked, or what happens when that enemy is defeated. This is typically done by giving our program the ability to read and understand scripts.

Scripts are, in their own right, blocks of programming logic written in a programming language, but typically a different programming language to our core game.

Low Level and High Level Languages

Developers generally categorize programming languages into a spectrum ranging from low level to high level:

Lower level languages involve working relatively close to the underlying hardware. They give us a great degree of control and opportunities to optimize performance, but they require us to be quite cautious and considered in how we build things.
Higher level languages sacrifice much of this low level control, and are designed for building simpler things as quickly as possible.

Languages like C and C++ are relatively low level in this spectrum, whilst languages like Python and JavaScript are higher level. In high-budget projects, it is possible to combine multiple languages, and get the best of both.

This involves building the core systems that demand low level control and performance in a language like C++. Then, we give these systems the ability to read and understand code written in a higher level language.

This means that, for any given feature, we are free to choose which language would be most appropriate. Functionality that needs low level control and optimization can be created in C++, and functionality where that is less important can be written more quickly and safely in a higher level language.

Creating a C++ program that can understand and execute logic written in an entirely different language is not as difficult as it might sound. Many of the compilers and interpreters for those higher level languages are already open source and written in C or C++, so it can be quite easy to add this capability to our projects.

Scripting Language

When a high-level language is being used to provide supporting logic to a program written in a lower level language, that process is typically called scripting, and the high level language is called the scripting language.

In the context of a game, scripting languages are typically used to define short blocks of logic to be executed in a specific situation, as decided on by the C++ program that we provide these scripts to.

This approach allows high-level behaviors to be created, updated and iterated on more quickly than writing and compiling the equivalent logic in C++. Additionally, it allows that work to be done by people who may not necessarily have the deeper technical understanding that is often required of a lower level language like C++.

Lua

One of the most popular scripting languages in the games industry is Lua. Like most languages, it has all the same building blocks we’re familiar with from C++ such as variables, loops, conditional statements and functions.

-- Lua uses -- for comments

-- Variable
num = 5

-- Loop
while num < 20 do
  num = num + 1
end

-- Conditional
if num >= 30 then
  print('30 plus')
elseif num > 20 then
  print('over 20')
else
  print('I give up')
end

-- Function
function add(x, y)
  return x + y
end

result = add(1, 2)

A common way that a low level language and scripting language work together is for the low level language to identify events where it would be appropriate to let the reaction to that event be defined by a script. Someone provides that script using our scripting language of choice, and the program executes it when the event happens.

For example, we might have a C++ class defining Robot enemies. That class could define how a Robot reacts to being attacked, but having that behavior in the C++ class might make it difficult for designers to test and change.

Instead, we could have our C++ class allow Lua to define that behavior by, for example, providing a script that includes a Robot:OnAttacked() function:

function Robot:OnAttacked(Attacker)
  if Health > 30 then
    FightBack(Attacker)
  elseif Health > 10 then
    RunAway()
  else
    SelfDestruct()
  end
end

Our C++ class will invoke this script at the appropriate time. We can also give our Lua scripts access to functions defined natively in C++, such as FightBack() and RunAway() in this example.

This can allow the scripts to include more complex actions that may be difficult to implement in the scripting language, or difficult to implement with acceptable performance.
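Embedding a real Lua interpreter is beyond the scope of a short sketch, so the example below uses `std::function` as a stand-in for a handler that a script would provide. It is only meant to illustrate the shape of the arrangement: the C++ class decides when the event fires, and externally supplied logic decides how to react. All of the `Robot` members shown here are hypothetical.

```cpp
#include <functional>
#include <string>
#include <utility>

class Robot {
 public:
  // Stand-in for a handler that a script would provide.
  // It receives the robot's health and returns the name
  // of the action to take.
  using AttackHandler = std::function<std::string(int)>;

  explicit Robot(int InitialHealth)
    : Health{InitialHealth} {}

  void SetOnAttacked(AttackHandler Handler) {
    OnAttacked = std::move(Handler);
  }

  // Called by the game when this robot takes damage
  std::string TakeDamage(int Damage) {
    Health -= Damage;
    // No handler provided: use a default reaction
    if (!OnAttacked) return "FightBack";
    return OnAttacked(Health);
  }

 private:
  int Health;
  AttackHandler OnAttacked;
};

// Mirrors the thresholds of the Lua Robot:OnAttacked()
// example
std::string ExampleHandler(int Health) {
  if (Health > 30) return "FightBack";
  if (Health > 10) return "RunAway";
  return "SelfDestruct";
}
```

With a scripting library in place, `SetOnAttacked()` would instead be given a wrapper that calls into the script, but the C++ class would invoke it in the same way.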

Visual Scripting

Many of the top-end game engines allow us to implement behaviours through a graphical interface. This is typically called Visual Scripting. For example, Unreal Engine’s visual scripting system is called Blueprint, and it is used in the exact same way as a text-based scripting language.

Blueprint includes all the building blocks of a programming language such as variables, functions, loops, branching conditionals, and the ability to interact with events and functions defined in C++.

But, rather than typing our behaviors in text, we use a drag-and-drop interface to connect nodes together. Our previous Robot:OnAttacked() behaviour looks like this when defined in Blueprint:

Screenshot of Unreal Engine’s blueprint editor

Summary

This lesson explored how games manage and transfer data, from saving player progress to enabling multiplayer functionality. We examined different serialization formats, their tradeoffs, and how they're used in real-world game development scenarios.

The concepts covered form the foundation for many advanced game features, which we’ll build on through practical lessons in the rest of this chapter.
