Methods for converting between character encodings in C++

1. Use the C++11 std::wstring_convert and std::codecvt class templates

Author: Gomo Psivarh
Link: https://www.zhihu.com/question/39186934/answer/80443490
Source: Zhihu
The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please cite the source.

These two class templates serve the following purposes:
std::wstring_convert: a transcoder. Its template parameter is a codecvt-like facet type that describes the encoding conversion, and it converts between the platform's wide-character std::wstring and a byte string (std::string) in the specified encoding.
std::codecvt: an encoding conversion facet class, passed as the template parameter of wstring_convert to specify which encoding to use.
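
As a minimal sketch of how these two pieces fit together, the helpers below wrap std::wstring_convert around std::codecvt_utf8<wchar_t>. The function names utf8_to_wide and wide_to_utf8 are illustrative and do not come from the original answer. Note that <codecvt> and std::wstring_convert are deprecated since C++17, so newer compilers may emit warnings.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Illustrative helper: decode UTF-8 bytes into a wide string.
// from_bytes throws std::range_error if the input is not valid UTF-8.
std::wstring utf8_to_wide(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.from_bytes(utf8);
}

// Illustrative helper: encode a wide string as UTF-8 bytes.
std::string wide_to_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.to_bytes(wide);
}
```

For instance, utf8_to_wide("\xE7\x83\xAB") should produce a one-character wide string holding U+70EB, and wide_to_utf8 should turn it back into the same three bytes.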

 

Therefore, to convert between encodings A and B, use the wide string as an intermediate form: first convert the A-encoded string into a wstring, then convert that wstring into a B-encoded string.
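
The two-step idea can be sketched generically. The template below is a hypothetical helper (the name transcode and the aliases Utf8 and Utf16LE are assumptions made for illustration); it decodes with one facet and re-encodes with another, using std::wstring as the pivot. UTF-8 and UTF-16LE are chosen here only because their facets need no system locale.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: decode `bytes` with FromCvt, then re-encode with ToCvt,
// passing through std::wstring as the intermediate form.
template <class FromCvt, class ToCvt>
std::string transcode(const std::string& bytes) {
    std::wstring_convert<FromCvt> decoder;
    std::wstring_convert<ToCvt> encoder;
    const std::wstring pivot = decoder.from_bytes(bytes);  // encoding A -> wide
    return encoder.to_bytes(pivot);                         // wide -> encoding B
}

// Facets for the two encodings used in this sketch.
using Utf8 = std::codecvt_utf8<wchar_t>;
using Utf16LE = std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>;

// UTF-8 bytes in, UTF-16LE bytes out (no byte order mark is written).
std::string utf8_to_utf16le(const std::string& utf8) {
    return transcode<Utf8, Utf16LE>(utf8);
}
```

One caveat of this design: where wchar_t is 16 bits wide (as on Windows), the pivot cannot represent characters outside the Basic Multilingual Plane with these facets, so the sketch is only safe for BMP text there.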

Two class templates derived from codecvt are commonly used:
std::codecvt_utf8<wchar_t>: converts between UTF-8 and wchar_t.
std::codecvt_byname<wchar_t, char, std::mbstate_t>: converts between other encodings (e.g. GBK) and wchar_t. Its constructor takes the locale name of the encoding. Locale names are defined by the operating system (for example, the locale name for GBK may be "zh_CN.GBK" on Linux but is ".936" on Windows), so cross-platform code still has to adapt to each system.
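
One possible way to handle the platform difference is sketched below. The helper names gbk_locale_name, DeletableByName, and gbk_to_wide are hypothetical, and the non-Windows locale name is only the "zh_CN.GBK" guess mentioned above; the locale must actually be installed on the system. The standard declares the destructor of std::codecvt_byname as protected, so on implementations that follow this (libstdc++, for example) wstring_convert cannot delete the facet directly; the sketch works around that with a small derived wrapper that has a public destructor.

```cpp
#include <cassert>
#include <codecvt>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: pick the GBK locale name for the current platform.
// "zh_CN.GBK" is an assumption; the real name depends on the installed locales.
inline const char* gbk_locale_name() {
#ifdef _WIN32
    return ".936";
#else
    return "zh_CN.GBK";
#endif
}

using ByNameFacet = std::codecvt_byname<wchar_t, char, std::mbstate_t>;

// Wrapper with a public destructor so that wstring_convert can delete the facet.
struct DeletableByName : ByNameFacet {
    explicit DeletableByName(const char* name) : ByNameFacet(name) {}
    ~DeletableByName() {}
};

// Returns true and fills `out` on success. Returns false if the locale is not
// available or the input is not valid GBK (both are reported as exceptions
// derived from std::runtime_error).
bool gbk_to_wide(const std::string& gbk, std::wstring& out) {
    try {
        std::wstring_convert<DeletableByName> conv(
            new DeletableByName(gbk_locale_name()));
        out = conv.from_bytes(gbk);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Returning a bool instead of letting the exception escape is just one design choice; it lets the caller fall back to another method, such as libiconv, when the locale is missing.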

Here is an example of converting a GBK string to a UTF-8 string on Windows.
First, convert the GBK string to a wstring:
    #include <codecvt>
    #include <locale>
    #include <string>

    const char* GBK_LOCALE_NAME = ".936"; // locale name for GBK on Windows
    std::string gbk_str {"\xCC\xCC"}; // 0xCCCC, the GBK code for "烫"
    // Build the converter between GBK and wstring (wstring_convert destroys the
    // codecvt_byname in its destructor, so there is no need to delete it yourself)
    std::wstring_convert<std::codecvt_byname<wchar_t, char, std::mbstate_t>> cv1(new std::codecvt_byname<wchar_t, char, std::mbstate_t>(GBK_LOCALE_NAME));
    std::wstring tmp_wstr = cv1.from_bytes(gbk_str);
Then convert the wstring to a UTF-8 string:
    std::wstring_convert<std::codecvt_utf8<wchar_t>> cv2;
    std::string utf8_str = cv2.to_bytes(tmp_wstr);
Transcoding is complete. The content of utf8_str should be "\xE7\x83\xAB" (the UTF-8 encoding of "烫").
2. Use the libiconv library (cross-platform)
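
The original gives no code for this method, so the following is only a rough sketch of how the iconv API (iconv_open, iconv, iconv_close from <iconv.h>) might be wrapped. The function name iconv_convert is hypothetical. The sketch assumes an iconv whose input parameter is char** (some older libiconv builds declare it as const char**), and on some platforms you may need to link with -liconv. Which encoding names are accepted depends on the installed iconv implementation.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert `input` from encoding `from` to encoding `to`.
std::string iconv_convert(const char* from, const char* to, const std::string& input) {
    iconv_t cd = iconv_open(to, from);  // note the order: target encoding first
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open failed: unsupported conversion");
    }

    std::string output;
    std::string in_copy = input;  // iconv needs a mutable input pointer
    char* in_ptr = &in_copy[0];
    std::size_t in_left = in_copy.size();
    char buffer[256];

    while (in_left > 0) {
        char* out_ptr = buffer;
        std::size_t out_left = sizeof buffer;
        const std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
        const int err = (rc == (std::size_t)-1) ? errno : 0;  // save errno right away
        output.append(buffer, sizeof buffer - out_left);
        // E2BIG only means the output buffer filled up, so loop again.
        if (err != 0 && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid or incomplete input");
        }
    }
    // Stateful encodings would also need a final flush call; that is omitted
    // here because GBK and UTF-8 carry no shift state.
    iconv_close(cd);
    return output;
}
```

With an iconv that supports GBK, a call such as iconv_convert("GBK", "UTF-8", "\xCC\xCC") would be expected to return the same "\xE7\x83\xAB" bytes as the wstring_convert example, without depending on system locale names.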

 

