Converting between character encodings in C++ can be tricky because the standard library offers very little support for it. The conversion helpers in <codecvt>, such as std::wstring_convert, were deprecated in C++17. In practice, we use third-party libraries or platform-specific APIs instead. Here's a guide on how to approach this:
One popular library for handling character encoding conversions is ICU (International Components for Unicode). Here's an example of how you might use ICU to convert from UTF-8 to UTF-16:

#include <unicode/unistr.h>
#include <iostream>
#include <string>

// Link against ICU's common library
// (for example, -licuuc)
int main() {
  std::string utf8String = "Hello, 世界!";

  // Convert UTF-8 to UTF-16. UnicodeString
  // stores its text as UTF-16 internally
  icu::UnicodeString utf16String =
    icu::UnicodeString::fromUTF8(utf8String);

  // Convert back to UTF-8 for display
  std::string convertedString;
  utf16String.toUTF8String(convertedString);

  std::cout << "Original: "
    << utf8String << '\n';
  std::cout << "Converted: "
    << convertedString << '\n';
}

Original: Hello, 世界!
Converted: Hello, 世界!
On Windows, you can use the Windows API functions for encoding conversion. Here's an example that converts from UTF-8 to UTF-16:
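
#include <windows.h>
#include <iostream>
#include <string>

int main() {
  // Assumes this literal is stored as UTF-8
  // (with MSVC, compile using /utf-8)
  std::string utf8String = "Hello, 世界!";

  // Pass the explicit length instead of -1 so
  // the null terminator is not counted and does
  // not end up inside the converted string
  int utf8Size =
    static_cast<int>(utf8String.size());

  // Get the required buffer size
  int size = MultiByteToWideChar(
    CP_UTF8, 0, utf8String.data(),
    utf8Size, nullptr, 0
  );
  if (size == 0) return 1; // Conversion failed

  // Allocate the buffer
  std::wstring utf16String(size, L'\0');

  // Perform the conversion
  MultiByteToWideChar(
    CP_UTF8, 0, utf8String.data(),
    utf8Size, &utf16String[0], size);

  // Convert back to UTF-8 for display
  int utf16Size =
    static_cast<int>(utf16String.size());
  size = WideCharToMultiByte(
    CP_UTF8, 0, utf16String.data(), utf16Size,
    nullptr, 0, nullptr, nullptr
  );
  if (size == 0) return 1; // Conversion failed

  std::string convertedString(size, '\0');
  WideCharToMultiByte(
    CP_UTF8, 0, utf16String.data(), utf16Size,
    &convertedString[0], size, nullptr, nullptr
  );

  // Set console output to UTF-8
  SetConsoleOutputCP(CP_UTF8);

  std::cout << "Original: "
    << utf8String << '\n';
  std::cout << "Converted: "
    << convertedString << '\n';
}
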
Original: Hello, 世界!
Converted: Hello, 世界!
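To see what a UTF-8 to UTF-16 conversion involves, here's a minimal, dependency-free sketch. The function name utf8ToUtf16() is hypothetical and chosen for illustration. This is not how ICU or the Windows API implement the conversion, and it is not a replacement for them. It simply decodes each UTF-8 sequence into a code point, then writes that code point as one or two UTF-16 code units:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper, for illustration only:
// decodes UTF-8 and re-encodes it as UTF-16.
// Throws std::invalid_argument on bad input.
std::u16string utf8ToUtf16(const std::string& in) {
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size()) {
    auto lead = static_cast<unsigned char>(in[i]);
    std::uint32_t cp = 0;     // Decoded code point
    std::size_t extra = 0;    // Continuation bytes
    std::uint32_t minCp = 0;  // Smallest valid value

    // The lead byte determines the sequence length
    if (lead < 0x80) {
      cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      cp = lead & 0x1F; extra = 1; minCp = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      cp = lead & 0x0F; extra = 2; minCp = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      cp = lead & 0x07; extra = 3; minCp = 0x10000;
    } else {
      throw std::invalid_argument("bad lead byte");
    }
    if (extra >= in.size() - i) {
      throw std::invalid_argument("truncated input");
    }

    // Each continuation byte carries 6 more bits
    for (std::size_t k = 1; k <= extra; ++k) {
      auto b =
        static_cast<unsigned char>(in[i + k]);
      if ((b & 0xC0) != 0x80) {
        throw std::invalid_argument(
          "bad continuation byte");
      }
      cp = (cp << 6) | (b & 0x3Fu);
    }
    i += extra + 1;

    // Reject overlong forms, surrogates, and
    // values beyond U+10FFFF
    if (cp < minCp || cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF)) {
      throw std::invalid_argument("bad code point");
    }

    if (cp < 0x10000) {
      // Fits in a single UTF-16 code unit
      out.push_back(static_cast<char16_t>(cp));
    } else {
      // Needs a surrogate pair
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(
        0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(
        0xDC00 + (cp & 0x3FF)));
    }
  }
  return out;
}
```

Calling utf8ToUtf16() with the UTF-8 bytes of "Hello, 世界!" returns 10 UTF-16 code units, because each of the two Chinese characters takes three bytes in UTF-8 but a single code unit in UTF-16. For production code, a well-tested library remains the safer choice.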
Remember, when working with different encodings, it's crucial to be aware of the encoding of your source files and how your compiler interprets string literals. Always use libraries or APIs that are well-tested and widely used to avoid potential encoding errors.
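One way to check how the compiler encoded a string literal is to inspect its bytes. The sketch below assumes two hypothetical helpers, hexBytes() and literalsAreUtf8(), which are not part of any library. The second compares a literal against the known UTF-8 bytes of 世 (U+4E16), written as escape sequences so they do not depend on the source file's encoding:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats each byte of a
// string as two uppercase hex digits,
// separated by spaces
std::string hexBytes(const std::string& s) {
  std::string out;
  char buf[4];
  for (std::size_t i = 0; i < s.size(); ++i) {
    auto byte = static_cast<unsigned char>(s[i]);
    std::snprintf(buf, sizeof buf, "%02X",
      static_cast<unsigned>(byte));
    if (i != 0) out += ' ';
    out += buf;
  }
  return out;
}

// Hypothetical helper: returns true if the
// compiler stored the literal below as UTF-8,
// where U+4E16 is the bytes E4 B8 96
bool literalsAreUtf8() {
  return std::string("世") == "\xE4\xB8\x96";
}
```

If the literal is stored as UTF-8, hexBytes("世界") returns "E4 B8 96 E7 95 8C". If literalsAreUtf8() returns false, the usual fix is a compiler option: MSVC accepts /utf-8, while GCC provides -finput-charset and -fexec-charset.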