Characters, Unicode and Encoding

Cross-Platform Unicode Support in C++

How do I implement proper Unicode support in a cross-platform C++ application?

Abstract art representing computer programming

Implementing proper Unicode support in a cross-platform C++ application can be challenging due to differences in how various operating systems and compilers handle Unicode. Here's a guide on how to approach this:

Use UTF-8 Everywhere

UTF-8 is widely supported and provides good compatibility across platforms. Use it as your default encoding:

#include <iostream>
#include <string>

#ifdef _WIN32
#include <Windows.h>
#endif

// Function to convert a char8_t string literal to
// std::string
std::string convert_utf8(const char8_t* utf8_text) {
  return std::string(
      reinterpret_cast<const char*>(utf8_text));
}

int main() {
// Set the console output code page to
// UTF-8 on Windows
#ifdef _WIN32
  SetConsoleOutputCP(CP_UTF8);
#endif

  // UTF-8 string literal
  std::string text = convert_utf8(u8"Hello, 世界!");
  std::cout << text;
}

Platform-Specific Considerations

Windows uses UTF-16 internally. You may need to convert between UTF-8 and UTF-16:

#include <iostream>
#include <string>

#ifdef _WIN32
#include <Windows.h>
#endif

std::wstring utf8ToUtf16(const std::string& utf8) {
#ifdef _WIN32
  int size = MultiByteToWideChar(
    CP_UTF8, 0, utf8.c_str(), -1, nullptr, 0
  );
  std::wstring utf16(size, 0);
  MultiByteToWideChar(
    CP_UTF8, 0, utf8.c_str(), -1, &utf16[0], size
  );
  return utf16;
#else
  // On non-Windows platforms, this function
  // would need a different implementation
  return std::wstring(utf8.begin(), utf8.end());
#endif
}

std::string utf16ToUtf8(const std::wstring& utf16) {
#ifdef _WIN32
  int size = WideCharToMultiByte(
    CP_UTF8, 0, utf16.c_str(), -1,
    nullptr, 0, nullptr, nullptr
  );
  std::string utf8(size, 0);
  WideCharToMultiByte(
    CP_UTF8, 0, utf16.c_str(), -1,
    &utf8[0], size, nullptr, nullptr
  );
  return utf8;
#else
  // On non-Windows platforms, this function 
  // would need a different implementation
  return std::string(utf16.begin(), utf16.end());
#endif
}

int main() {
  // Convert u8 string literal to std::string
  const char8_t* utf8_literal = u8"Hello, 世界!";
  std::string utf8(
    reinterpret_cast<const char*>(utf8_literal));

#ifdef _WIN32
  // Set the console output code page to
  // UTF-8 on Windows
  SetConsoleOutputCP(CP_UTF8);
#endif

  std::wstring utf16 = utf8ToUtf16(utf8);

#ifdef _WIN32
  // Use UTF-16 for Windows API calls
  MessageBoxW(
    nullptr, utf16.c_str(),
    L"Unicode Test", MB_OK
  );
#endif

  // Convert back to UTF-8 for cross-platform code
  std::string backToUtf8 = utf16ToUtf8(utf16);
  std::cout << backToUtf8;
}
Hello, 世界!

Most Unix-like systems use UTF-8 by default. Ensure your locale is set correctly:

#include <iostream>
#include <locale>
#include <codecvt>

int main() {
  // Set the global locale to UTF-8
  std::locale::global(std::locale("en_US.UTF-8"));
  // Imbue std::wcout with the global locale
  std::wcout.imbue(std::locale());

  // Output wide string literals
  std::wcout << L"Hello, 世界!" << L'\n';
}
Hello, 世界!

Use a Cross-Platform Library

Consider using a library like ICU (International Components for Unicode) for consistent Unicode handling across platforms:

#include <iostream>
#include <unicode/unistr.h>
#include <unicode/ustream.h>

int main() {
  icu::UnicodeString ustr =
    icu::UnicodeString::fromUTF8("Hello, 世界!");
  std::cout << ustr << '\n';
}

File I/O

When reading or writing files, always specify the encoding:

#include <iostream>
#include <fstream>
#include <string>
#include <windows.h>
#include <fcntl.h>
#include <io.h>

void WriteUTF8File(const std::wstring& fileName,
                   const std::wstring& content) {
  std::ofstream outFile(fileName, std::ios::binary);
  if (!outFile.is_open()) {
    std::wcerr
      << L"Failed to open file for writing.\n";
    return;
  }
  // Convert wide string to UTF-8
  int size_needed =
    WideCharToMultiByte(
      CP_UTF8, 0, content.c_str(), -1,
      nullptr, 0, nullptr, nullptr
    );
  std::string utf8Str(size_needed, 0);
  WideCharToMultiByte(
    CP_UTF8, 0, content.c_str(), -1, &utf8Str[0],
    size_needed, nullptr, nullptr);
  // Write to file
  outFile.write(
    utf8Str.c_str(),
    utf8Str.size() - 1 // Exclude null terminator
  ); 
  outFile.close();
}

std::wstring ReadUTF8File(
  const std::wstring& fileName
) {
  std::ifstream inFile(fileName, std::ios::binary);
  if (!inFile.is_open()) {
    std::wcerr
      << L"Failed to open file for reading.\n";
    return L"";
  }
  // Read file content into a string
  std::string utf8Str(
    (std::istreambuf_iterator<char>(inFile)),
    std::istreambuf_iterator<char>());
  inFile.close();
  // Convert UTF-8 string to wide string
  int size_needed = MultiByteToWideChar(
    CP_UTF8, 0, utf8Str.c_str(), -1, nullptr, 0);
  std::wstring wideStr(size_needed, 0);
  MultiByteToWideChar(
    CP_UTF8, 0, utf8Str.c_str(), -1,
    &wideStr[0], size_needed);
  return wideStr;
}

int main() {
  // Set console output to UTF-8
  SetConsoleOutputCP(CP_UTF8);

  // Enable UTF-8 mode for C++ streams
  _setmode(_fileno(stdout), _O_U8TEXT);
  _setmode(_fileno(stderr), _O_U8TEXT);

  std::wstring fileName = L"test.txt";
  std::wstring content = L"Hello, 世界!";

  // Writing UTF-8
  WriteUTF8File(fileName, content);

  // Reading UTF-8
  std::wstring readContent =
    ReadUTF8File(fileName);

  std::wcout << L"Content read from file: "
    << readContent << std::endl;
}
Content read from file: Hello, 世界!

Best Practices

  1. Use UTF-8: Use UTF-8 as your default encoding for strings and files.
  2. Platform-Specific Code: Isolate platform-specific Unicode handling in separate functions or classes.
  3. Consistent Internal Representation: Choose a consistent internal representation (e.g., UTF-8 or UTF-16) and convert at system boundaries.
  4. Avoid Assumptions: Don't assume one character equals one byte or one display width.
  5. Test Thoroughly: Test your application with a wide range of Unicode characters on all target platforms.
  6. Handle Input/Output Carefully: Be mindful of how different systems handle Unicode input and output.
  7. Use Modern C++ Features: Leverage C++11 and later features for Unicode support, like the u8 string literal prefix.

By following these guidelines and using appropriate libraries, you can create C++ applications with robust Unicode support that work consistently across different platforms.

This Question is from the Lesson:

Characters, Unicode and Encoding

An introduction to C++ character types, the Unicode standard, character encoding, and C-style strings

Answers to questions are automatically generated and may not have been reviewed.

This Question is from the Lesson:

Characters, Unicode and Encoding

An introduction to C++ character types, the Unicode standard, character encoding, and C-style strings

A computer programmer
Part of the course:

Professional C++

Comprehensive course covering advanced concepts, and how to use them on large-scale projects.

Free, unlimited access

This course includes:

  • 125 Lessons
  • 550+ Code Samples
  • 96% Positive Reviews
  • Regularly Updated
  • Help and FAQ
Free, Unlimited Access

Professional C++

Comprehensive course covering advanced concepts, and how to use them on large-scale projects.

Screenshot from Warhammer: Total War
Screenshot from Tomb Raider
Screenshot from Jedi: Fallen Order
Contact|Privacy Policy|Terms of Use
Copyright © 2024 - All Rights Reserved