1. Use the C++11 class template std::wstring_convert together with a std::codecvt facet class
Link: https://www.zhihu.com/question/39186934/answer/80443490
Source: Zhihu
The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please indicate the source.
std::wstring_convert: a transcoder. Its template parameter is a codecvt-like class that describes the encoding conversion. It converts between the platform's wide-character string (wstring) and a byte string (string) in the specified encoding.
std::codecvt: the encoding conversion facet class, passed as the template parameter of wstring_convert to specify which encoding to use.
Therefore, converting between two encodings A and B goes through a wide string as the intermediate form: first convert the A-encoded string into a wstring, then convert that wstring into a B-encoded string.
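As a minimal sketch of the two wstring_convert operations this relies on, the helpers below decode UTF-8 bytes with from_bytes and encode them again with to_bytes, using std::codecvt_utf8<wchar_t> as the facet. The function names utf8_to_wide and wide_to_utf8 are made up for illustration. Note that wstring_convert and the <codecvt> header are deprecated since C++17, so newer compilers may emit warnings.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Decode a UTF-8 byte string into a wide string.
// from_bytes throws std::range_error if the input is not valid UTF-8.
std::wstring utf8_to_wide(const std::string& bytes) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.from_bytes(bytes);
}

// Encode a wide string as a UTF-8 byte string.
std::string wide_to_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.to_bytes(wide);
}
```

For example, utf8_to_wide("\xE7\x83\xAB") yields a one-character wstring holding U+70EB, and passing that result to wide_to_utf8 gives back the original three bytes.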
Two classes derived from codecvt are typically used:
std::codecvt_utf8<wchar_t>: converts between UTF-8 and the platform's wchar_t
std::codecvt_byname<wchar_t, char, std::mbstate_t>: converts between other encodings (e.g. GBK) and the platform's wchar_t. Its constructor takes the locale name of the encoding. Because locale names are defined by the operating system (for example, the locale name of GBK may be "zh_CN.GBK" on Linux, while it is ".936" on Windows), cross-platform code still has to be adapted to each system.
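One way to handle the per-system adaptation is to pick the locale name at compile time and check at run time that it can actually be constructed. The sketch below assumes the two names mentioned above (".936" and "zh_CN.GBK"); the helper names gbk_locale_name and locale_available are hypothetical, and a given Linux installation may use a different name or not have the locale generated at all.

```cpp
#include <locale>
#include <stdexcept>

// Locale names taken from the text above; adjust them for your systems.
const char* gbk_locale_name() {
#ifdef _WIN32
    return ".936";       // Windows code page 936 (GBK)
#else
    return "zh_CN.GBK";  // may differ or be missing on a given Linux install
#endif
}

// Hypothetical helper: true if the C++ runtime can construct the named locale.
// std::locale(const char*) throws std::runtime_error for an unknown name.
bool locale_available(const char* name) {
    try {
        std::locale probe(name);
        (void)probe;
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Calling locale_available(gbk_locale_name()) before building a converter lets the program report a missing locale clearly instead of failing inside the conversion.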
First, convert the GBK string to a wstring:
const char* GBK_LOCALE_NAME = ".936"; // locale name of GBK on Windows
string gbk_str {"\xCC\xCC"}; // 0xCCCC, the GBK code for "烫"
// Build the converter between GBK and wstring (wstring_convert destroys the codecvt_byname in its destructor, so there is no need to delete it yourself)
wstring_convert<codecvt_byname<wchar_t, char, mbstate_t>> cv1(new codecvt_byname<wchar_t, char, mbstate_t>(GBK_LOCALE_NAME));
wstring tmp_wstr = cv1.from_bytes(gbk_str);
Then convert the wstring to a UTF-8 string:
wstring_convert<codecvt_utf8<wchar_t>> cv2;
string utf8_str = cv2.to_bytes(tmp_wstr);
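The two steps can be combined into one function. The following is a sketch rather than a definitive implementation: the names NamedCodecvt and narrow_to_utf8 are made up for illustration. The small NamedCodecvt wrapper is there because the standard declares the destructor of codecvt_byname as protected, so standard libraries in which wstring_convert deletes the facet directly (libstdc++, for example) reject wstring_convert<codecvt_byname<...>> at compile time. The function returns false instead of throwing when the locale name is unknown or the input bytes are invalid for that encoding.

```cpp
#include <codecvt>
#include <cwchar>
#include <exception>
#include <locale>
#include <string>

// Hypothetical wrapper: gives codecvt_byname a public destructor so that
// wstring_convert is able to delete the facet it owns.
struct NamedCodecvt : std::codecvt_byname<wchar_t, char, std::mbstate_t> {
    explicit NamedCodecvt(const char* name)
        : std::codecvt_byname<wchar_t, char, std::mbstate_t>(name) {}
    ~NamedCodecvt() {}
};

// Convert bytes in the encoding of `locale_name` (e.g. GBK) to UTF-8.
// Returns false if the locale is unavailable or the conversion fails.
bool narrow_to_utf8(const std::string& bytes, const char* locale_name,
                    std::string& utf8) {
    try {
        // Step 1: source encoding -> wstring.
        std::wstring_convert<NamedCodecvt> to_wide(new NamedCodecvt(locale_name));
        std::wstring wide = to_wide.from_bytes(bytes);
        // Step 2: wstring -> UTF-8.
        std::wstring_convert<std::codecvt_utf8<wchar_t>> to_utf8;
        utf8 = to_utf8.to_bytes(wide);
        return true;
    } catch (const std::exception&) {
        return false;
    }
}
```

With the values from the example above, narrow_to_utf8("\xCC\xCC", ".936", out) on Windows (or with "zh_CN.GBK" on a Linux system that has that locale) should leave the UTF-8 bytes E7 83 AB in out.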