Here I have some simple code:
    #include <iostream>
    #include <cstdint>

    int main() {
        const unsigned char utf8_string[] = u8"\xA0";
        // Print the size of the array, then each byte as hex.
        std::cout << std::hex << "Size: " << sizeof(utf8_string) << std::endl;
        for (int i = 0; i < sizeof(utf8_string); i++) {
            std::cout << std::hex << (uint16_t)utf8_string[i] << std::endl;
        }
    }
I see different behavior here between MSVC and GCC. MSVC treats "\xA0"
as an unencoded Unicode code point and encodes it to UTF-8, so with MSVC the output is:
C2A0
which is the correct UTF-8 encoding of the Unicode code point U+00A0.
But with GCC nothing happens: it treats the string as plain bytes. There is no change even if I remove the u8
prefix from the string literal.
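For reference, this is the non-u8 variant I mean (a minimal sketch of the same program; the array name is just illustrative). On my setup GCC prints exactly the same bytes for it as for the u8 version:

    #include <cstddef>
    #include <cstdint>
    #include <iostream>

    int main() {
        // Same literal, but without the u8 prefix; GCC outputs the same bytes either way.
        const unsigned char plain_string[] = "\xA0";
        for (std::size_t i = 0; i < sizeof(plain_string); i++) {
            std::cout << std::hex << (uint16_t)plain_string[i] << std::endl;
        }
    }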
Both compilers encode to UTF-8 and output C2A0
if the string is set to u8"\u00A0";.
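For completeness, here is that variant in full (a minimal sketch, same setup as above; only the literal changes):

    #include <cstddef>
    #include <cstdint>
    #include <iostream>

    int main() {
        // \u00A0 names the code point U+00A0 directly; both MSVC and GCC
        // encode it as the UTF-8 bytes C2 A0, followed by the terminating null.
        const unsigned char utf8_string[] = u8"\u00A0";
        std::cout << std::hex << "Size: " << sizeof(utf8_string) << std::endl;
        for (std::size_t i = 0; i < sizeof(utf8_string); i++) {
            std::cout << std::hex << (uint16_t)utf8_string[i] << std::endl;
        }
    }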
Why do the compilers behave differently, and which one actually does it right?
Software used for the test:
GCC 8.3.0
MSVC 19.00.23506
C++11