Characters, Unicode and Encoding

Converting Between Character Encodings

How can I convert between different character encodings in C++?


Converting between character encodings in C++ can be tricky because the standard library offers little support for it. The std::codecvt-based helpers, such as std::wstring_convert, were deprecated in C++17 and have no standard replacement. In practice, we rely on a third-party library or a platform-specific API. Here's a guide on how to approach this:

Using a Third-Party Library

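One popular library for handling character encoding conversions is ICU (International Components for Unicode). Its icu::UnicodeString class stores text as UTF-16 internally, so constructing one from UTF-8 data performs the conversion. Here's an example of how you might use ICU to convert from UTF-8 to UTF-16: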

#include <unicode/ucnv.h>
#include <unicode/unistr.h>

#include <iostream>
#include <string>

int main() {
  std::string utf8String = "Hello, 世界!";

  // Convert UTF-8 to UTF-16
  icu::UnicodeString utf16String =
    icu::UnicodeString::fromUTF8(utf8String);

  // Convert back to UTF-8 for display
  std::string convertedString;
  utf16String.toUTF8String(convertedString);

  std::cout << "Original: "
    << utf8String << '\n';
  std::cout << "Converted: "
    << convertedString << '\n';
}
Original: Hello, 世界!
Converted: Hello, 世界!

Using Platform-Specific APIs

On Windows, you can use the Windows API functions for encoding conversion. Here's an example that converts from UTF-8 to UTF-16:

#include <windows.h>
#include <iostream>
#include <string>

int main() {
  std::string utf8String = "Hello, 世界!";

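  // Get the required buffer size. Passing the explicit
  // length (rather than -1) keeps the null terminator
  // out of the converted string
  int inputLength = static_cast<int>(utf8String.size());
  int size = MultiByteToWideChar(
    CP_UTF8, 0, utf8String.data(),
    inputLength, nullptr, 0
  );
  if (size <= 0) {
    std::cerr << "Conversion to UTF-16 failed\n";
    return 1;
  }

  // Allocate the buffer
  std::wstring utf16String(size, L'\0');

  // Perform the conversion
  MultiByteToWideChar(
    CP_UTF8, 0, utf8String.data(),
    inputLength, &utf16String[0], size);

  // Convert back to UTF-8 for display
  int wideLength = static_cast<int>(utf16String.size());
  size = WideCharToMultiByte(
    CP_UTF8, 0, utf16String.data(), wideLength,
    nullptr, 0, nullptr, nullptr
  );
  if (size <= 0) {
    std::cerr << "Conversion to UTF-8 failed\n";
    return 1;
  }

  std::string convertedString(size, '\0');
  WideCharToMultiByte(
    CP_UTF8, 0, utf16String.data(), wideLength,
    &convertedString[0], size, nullptr, nullptr
  );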

  // Set console output to UTF-8
  SetConsoleOutputCP(CP_UTF8);

  std::cout << "Original: "
    << utf8String << '\n';
  std::cout << "Converted: "
    << convertedString << '\n';
}
Original: Hello, 世界!
Converted: Hello, 世界!
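
To make concrete what a UTF-8 to UTF-16 conversion does, the following is a minimal, portable sketch that uses only the standard library. The function name Utf8ToUtf16 is hypothetical, and its validation is deliberately incomplete (it does not reject overlong encodings or encoded surrogates), so treat it as an illustration of the technique rather than a substitute for ICU or the Windows API:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration only. Decodes each
// UTF-8 sequence to a code point, then re-encodes it as
// one or two UTF-16 code units.
std::u16string Utf8ToUtf16(const std::string& input) {
  std::u16string output;
  std::size_t i = 0;

  while (i < input.size()) {
    unsigned char lead =
      static_cast<unsigned char>(input[i]);
    char32_t codePoint = 0;
    std::size_t extra = 0; // continuation bytes expected

    if (lead < 0x80) {
      codePoint = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      codePoint = lead & 0x1F;
      extra = 1;
    } else if ((lead & 0xF0) == 0xE0) {
      codePoint = lead & 0x0F;
      extra = 2;
    } else if ((lead & 0xF8) == 0xF0) {
      codePoint = lead & 0x07;
      extra = 3;
    } else {
      throw std::invalid_argument(
        "invalid UTF-8 lead byte");
    }

    // Reject input that ends in the middle of a sequence
    if (extra > input.size() - i - 1) {
      throw std::invalid_argument(
        "truncated UTF-8 sequence");
    }

    // Each continuation byte contributes its low 6 bits
    for (std::size_t k = 1; k <= extra; ++k) {
      unsigned char next =
        static_cast<unsigned char>(input[i + k]);
      if ((next & 0xC0) != 0x80) {
        throw std::invalid_argument(
          "invalid UTF-8 continuation byte");
      }
      codePoint = (codePoint << 6) | (next & 0x3F);
    }
    i += extra + 1;

    if (codePoint > 0x10FFFF) {
      throw std::invalid_argument(
        "code point out of range");
    }

    if (codePoint < 0x10000) {
      // Fits in a single UTF-16 code unit
      output.push_back(static_cast<char16_t>(codePoint));
    } else {
      // Needs a surrogate pair
      codePoint -= 0x10000;
      output.push_back(static_cast<char16_t>(
        0xD800 + (codePoint >> 10)));
      output.push_back(static_cast<char16_t>(
        0xDC00 + (codePoint & 0x3FF)));
    }
  }

  return output;
}
```

Calling Utf8ToUtf16 with the six UTF-8 bytes of 世界 yields the two code units U+4E16 and U+754C, while a character outside the Basic Multilingual Plane, such as U+1F600, becomes a surrogate pair. The number of edge cases even in this short sketch is a good argument for using an established library in real code.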

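Remember, when working with different encodings, it's crucial to know the encoding of your source files and how your compiler interprets string literals. The examples above assume the source file is saved as UTF-8 and that the compiler keeps those bytes unchanged in narrow string literals, which on MSVC typically requires the /utf-8 option. Prefer well-tested, widely used libraries or APIs over hand-written conversion code to avoid encoding errors.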
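
One way to check how a compiler has encoded a string literal is to inspect its bytes. The sketch below is an illustration under stated assumptions: ToHex and kWorldUtf8 are hypothetical names, and the escape sequences spell out the UTF-8 bytes of 世界 so the constant does not depend on the source file's encoding:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders each byte of a string as
// two uppercase hex digits followed by a space
std::string ToHex(const std::string& bytes) {
  std::string result;
  char buffer[4]; // two digits, a space, and a terminator
  for (unsigned char byte : bytes) {
    std::snprintf(buffer, sizeof(buffer), "%02X ",
      static_cast<unsigned>(byte));
    result += buffer;
  }
  return result;
}

// The UTF-8 encoding of U+4E16 and U+754C, written as
// explicit bytes rather than as literal characters
const std::string kWorldUtf8 = "\xE4\xB8\x96\xE7\x95\x8C";
```

Comparing ToHex("世界") against ToHex(kWorldUtf8) shows whether the compiler stored the literal as UTF-8. If the two differ, the source or execution character set is something else, and options such as MSVC's /utf-8 or GCC's -finput-charset and -fexec-charset are the usual place to look.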

This Question is from the Lesson:

Characters, Unicode and Encoding

An introduction to C++ character types, the Unicode standard, character encoding, and C-style strings

Answers to questions are automatically generated and may not have been reviewed.

Copyright © 2024 - All Rights Reserved