GBK and UTF-8 are two different character encodings.
1. The main differences are as follows:
(1) Character coverage: GBK is a Chinese national encoding. It covers ASCII, about 21,000 Chinese characters (both simplified and traditional), and a limited set of symbols and foreign letters such as Japanese kana, Greek and Cyrillic. It cannot represent most other scripts. UTF-8 is an encoding of Unicode and can represent every Unicode character.
(2) Byte length: GBK is a multi-byte encoding that uses 1 byte for ASCII characters and 2 bytes for Chinese characters and other non-ASCII symbols. UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character: 1 byte for ASCII and 3 bytes for most Chinese characters.
(3) Compatibility: GBK is widely used in mainland China, especially in legacy Windows software, but is rarely used elsewhere. UTF-8 is the standard choice for the web and for most modern systems and protocols.
(4) Storage space: both encodings store ASCII in 1 byte per character. For Chinese text GBK is more compact (2 bytes per character versus 3 bytes in UTF-8), but text that mixes many scripts can only be stored in UTF-8.
In short, GBK suits Chinese-only text and legacy Chinese Windows environments, while UTF-8 can handle text in any language and is generally the better default for new code and data exchange.
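To make points (2) and (4) concrete, here is a small portable C++ sketch. It is not from the original post, and `encodeUtf8` and `countGbkChars` are hypothetical helper names. `encodeUtf8` shows how UTF-8 spends 1 to 4 bytes depending on the code point, and `countGbkChars` walks a GBK byte string using its 1-or-2-byte structure. Both assume well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 (1 to 4 bytes).
// Assumes cp is a valid Unicode scalar value (<= 0x10FFFF, not a surrogate).
std::string encodeUtf8(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx followed by three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: count the characters in a GBK byte string.
// Bytes below 0x81 are single-byte characters (ASCII); a byte in 0x81-0xFE
// starts a two-byte character. Assumes the input is well-formed GBK.
std::size_t countGbkChars(const std::string& gbk)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < gbk.size(); ++count) {
        unsigned char b = static_cast<unsigned char>(gbk[i]);
        i += (b >= 0x81 && b <= 0xFE) ? 2 : 1;
    }
    return count;
}
```

For example, `encodeUtf8(0x4E2D)` yields the three bytes E4 B8 AD for "中", while GBK stores the same character as the two bytes D6 D0, so `countGbkChars("A\xD6\xD0")` counts 2 characters in 3 bytes.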
2. Transcoding examples (C++, Windows API):
(1) Convert UTF-8 to GBK
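#include <windows.h>
#include <string>

// Converts a UTF-8 string to GBK by pivoting through UTF-16:
// UTF-8 -> UTF-16 (MultiByteToWideChar) -> GBK (WideCharToMultiByte).
std::string utf8ToGbk(const char* pszSrc)
{
    if (nullptr == pszSrc)
        return "";
    // MultiByteToWideChar converts a multibyte string (here UTF-8) to a wide-character
    // (UTF-16) string. With -1 as the length, the returned count includes the terminating null.
    int nLen = MultiByteToWideChar(CP_UTF8, 0, pszSrc, -1, nullptr, 0);
    if (nLen <= 0)
        return "";
    std::wstring wstr(nLen, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, pszSrc, -1, &wstr[0], nLen);
    // 936 is the GBK code page; CP_ACP means GBK only if the system ANSI code page is 936.
    nLen = WideCharToMultiByte(936, 0, wstr.c_str(), -1, nullptr, 0, nullptr, nullptr);
    if (nLen <= 0)
        return "";
    std::string strGbk(nLen, '\0');
    WideCharToMultiByte(936, 0, wstr.c_str(), -1, &strGbk[0], nLen, nullptr, nullptr);
    strGbk.resize(nLen - 1); // drop the terminating null written by the API
    return strGbk;
}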
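The first half of `utf8ToGbk` turns UTF-8 into UTF-16, since `wchar_t` is 16 bits on Windows. To show what that step does without Windows, here is a portable sketch of UTF-8 to UTF-16 decoding. `utf8ToUtf16` is a hypothetical helper, not a Windows API. One simplification to note: it throws on malformed input, whereas `MultiByteToWideChar` with `dwFlags = 0` substitutes U+FFFD for invalid sequences by default.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 into UTF-16 code units, the same
// transformation MultiByteToWideChar(CP_UTF8, ...) performs on Windows.
// Throws std::runtime_error on malformed input (a simplification).
std::u16string utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (b < 0x80)                    { cp = b;        len = 1; }
        else if (b >= 0xC2 && b <= 0xDF) { cp = b & 0x1F; len = 2; }
        else if (b >= 0xE0 && b <= 0xEF) { cp = b & 0x0F; len = 3; }
        else if (b >= 0xF0 && b <= 0xF4) { cp = b & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates and values above U+10FFFF.
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::runtime_error("invalid UTF-8 code point");
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);          // fits in one UTF-16 unit
        } else {
            cp -= 0x10000;                              // needs a surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += len;
    }
    return out;
}
```

With this, "中" (E4 B8 AD) decodes to the single unit 0x4E2D, and a character outside the Basic Multilingual Plane such as U+1F600 becomes the surrogate pair 0xD83D 0xDE00. The second half of `utf8ToGbk` then maps those UTF-16 units to GBK bytes using the code page table, which is why the Windows API is convenient here.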
(2) Convert GBK to UTF-8
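// Needs <windows.h> and <string>.
// Converts a GBK string to UTF-8 by pivoting through UTF-16: GBK -> UTF-16 -> UTF-8.
std::string gbk2Utf8(const std::string& strData)
{
    // 936 is the GBK code page; CP_ACP means GBK only if the system ANSI code page is 936.
    int nLen = MultiByteToWideChar(936, 0, strData.c_str(), -1, nullptr, 0);
    if (nLen <= 0)
        return "";
    std::wstring wstr(nLen, L'\0');
    MultiByteToWideChar(936, 0, strData.c_str(), -1, &wstr[0], nLen);
    // For CP_UTF8 the last two parameters (default char, used-default flag) must be null.
    nLen = WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), -1, nullptr, 0, nullptr, nullptr);
    if (nLen <= 0)
        return "";
    std::string strOutUtf8(nLen, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), -1, &strOutUtf8[0], nLen, nullptr, nullptr);
    strOutUtf8.resize(nLen - 1); // drop the terminating null written by the API
    return strOutUtf8;
}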
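Both functions assume you already know which encoding the input is in. When data may arrive in either encoding, a common heuristic is to check whether the bytes form valid UTF-8 and only call `gbk2Utf8` when they do not. `looksLikeUtf8` below is a hypothetical helper that follows the well-formed byte ranges of RFC 3629. Treat its answer as a guess: some short GBK byte sequences happen to be valid UTF-8 as well.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8
// (no overlong forms, no surrogates, nothing above U+10FFFF).
bool looksLikeUtf8(const std::string& s)
{
    const auto* p = reinterpret_cast<const unsigned char*>(s.data());
    std::size_t n = s.size(), i = 0;
    while (i < n) {
        unsigned char b = p[i];
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;       // allowed range of the 2nd byte
        if (b < 0x80) { ++i; continue; }          // ASCII
        else if (b >= 0xC2 && b <= 0xDF) len = 2;
        else if (b == 0xE0) { len = 3; lo = 0xA0; }   // excludes overlong forms
        else if (b == 0xED) { len = 3; hi = 0x9F; }   // excludes surrogates
        else if (b >= 0xE1 && b <= 0xEF) len = 3;
        else if (b == 0xF0) { len = 4; lo = 0x90; }   // excludes overlong forms
        else if (b == 0xF4) { len = 4; hi = 0x8F; }   // caps at U+10FFFF
        else if (b >= 0xF1 && b <= 0xF3) len = 4;
        else return false;                        // 0x80-0xC1, 0xF5-0xFF
        if (i + len > n) return false;            // truncated sequence
        if (p[i + 1] < lo || p[i + 1] > hi) return false;
        for (std::size_t k = 2; k < len; ++k)
            if (p[i + k] < 0x80 || p[i + k] > 0xBF) return false;
        i += len;
    }
    return true;
}
```

A typical use would be `std::string text = looksLikeUtf8(data) ? data : gbk2Utf8(data);`. For instance, "中" in GBK (D6 D0) is rejected because D0 is not a valid UTF-8 continuation byte.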
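3. In Visual Studio you can also choose the project's character set: in the project properties, the Character Set option can be set to "Use Unicode Character Set" or "Use Multi-Byte Character Set".
Note: in MFC/ATL, CString maps to CStringW (wide, UTF-16) when the project uses the Unicode character set and to CStringA (multi-byte) when it uses the multi-byte character set, so the same CString code holds differently encoded data depending on this setting.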