Wide characters, multi-byte, ANSI, UTF-8, Unicode, and GBK: how they relate, and converting between ANSI and UTF-8

First, a plain statement of how these terms differ and relate. The explanations found online tend to be either too complicated or too brief.

Wide character (wide byte): one character generally occupies two bytes. This corresponds to Unicode (on Windows, UTF-16 stored in the WCHAR type).

Multi-byte: one character may occupy one or more bytes. This corresponds to ANSI.
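
As a small illustration of the difference, the sketch below stores the word 测试 three ways and measures how many bytes each form takes. The helper names WideByteLength and MultiByteLength are made up for this example, the bytes are written as escapes so the result does not depend on how the source file is saved, and char16_t stands in for the Windows WCHAR type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// "测试" written as explicit code units (hypothetical sample data).
const char16_t kWide[] = u"\u6D4B\u8BD5";         // wide: two 16-bit units
const char kGbk[] = "\xB2\xE2\xCA\xD4";           // multi-byte, GBK
const char kUtf8[] = "\xE6\xB5\x8B\xE8\xAF\x95";  // multi-byte, UTF-8

// Bytes used by a null-terminated wide string, terminator excluded.
std::size_t WideByteLength(const char16_t* s)
{
    assert(s != nullptr);
    return std::char_traits<char16_t>::length(s) * sizeof(char16_t);
}

// Bytes used by a null-terminated multi-byte string, terminator excluded.
std::size_t MultiByteLength(const char* s)
{
    assert(s != nullptr);
    return std::strlen(s);
}
```

With this sample data the wide form takes 4 bytes, the GBK form 4 bytes, and the UTF-8 form 6 bytes, although all three hold the same two characters.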

ANSI: It grew out of the ASCII character set, which started with 7 bits and represents 128 characters in total, including English letters, digits, and special characters. Later, so that each country could represent its own language, it was extended, producing GB2312, BIG5, JIS, and other encodings that are collectively called ANSI encodings. This is why ASCII and GBK are sometimes mentioned together.

Unicode: It was created to unify the character encodings of different countries. Text in different languages cannot be stored together in a single ANSI-encoded text, so Unicode appeared as the bridge between them: it unifies all languages into one character set, so mixed-language text no longer turns into garbled characters. For this reason it is also known as the universal code.
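
To make the idea of one unified set concrete, here is a minimal sketch of how a single Unicode code point is turned into UTF-8 bytes. The function name EncodeUtf8 is an assumption for this example and is not part of the Windows API used later in this article.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
std::string EncodeUtf8(char32_t cp)
{
    // Valid code points end at U+10FFFF and exclude the surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                       // ASCII: one byte, unchanged
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // two bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // three bytes (most Chinese characters)
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // four bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, EncodeUtf8(0x6D4B) (the character 测) yields the three bytes E6 B5 8B, while EncodeUtf8(0x41) yields the single ASCII byte for A.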

GBK: An extension of the GB2312 ANSI encoding mentioned above. In effect it is an enhanced version that covers more Chinese characters.
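
Because GBK keeps ASCII as single bytes and uses two bytes for everything else, a GBK string can be walked by looking at each lead byte. The sketch below counts characters that way. IsGbkLeadByte and CountGbkChars are hypothetical helper names, and the code only checks the lead byte range (0x81 to 0xFE); it does not validate the second byte.

```cpp
#include <cstddef>
#include <string>

// True if b can start a two-byte GBK character.
bool IsGbkLeadByte(unsigned char b)
{
    return b >= 0x81 && b <= 0xFE;
}

// Count characters in a GBK string. An ASCII byte is one character;
// a lead byte together with the byte after it is one character.
// A lead byte at the very end (truncated input) is counted on its own.
std::size_t CountGbkChars(const std::string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++count) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        i += (IsGbkLeadByte(b) && i + 1 < s.size()) ? 2 : 1;
    }
    return count;
}
```

The GBK bytes B2 E2 CA D4 (测试) are counted as two characters, and mixing in ASCII letters adds one character per letter.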

Second, converting between ANSI and UTF-8.

(1) The code below can be copied and pasted directly into Visual Studio and run.

#include<stdio.h>
#include<windows.h>
#define CODE_LEN 256
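// Convert an ANSI (system code page, CP_ACP) string to UTF-8.
// lpszDst must point to a buffer of at least CODE_LEN bytes.
void AnsiToUtf8(const char* lpcszSrc, char* lpszDst)
{
	// Step 1: ANSI -> UTF-16. Passing NULL/0 returns the required length in WCHARs (terminator included).
	int n = MultiByteToWideChar(CP_ACP, 0, lpcszSrc, -1, NULL, 0);
	if (n <= 0) return;
	WCHAR* wide = new WCHAR[n];
	MultiByteToWideChar(CP_ACP, 0, lpcszSrc, -1, wide, n);
	// Step 2: UTF-16 -> UTF-8, written straight into the caller's buffer.
	WideCharToMultiByte(CP_UTF8, 0, wide, -1, lpszDst, CODE_LEN, NULL, NULL);
	delete[] wide;
}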
// Convert a UTF-8 string to ANSI (system code page, CP_ACP).
// lpszDst must point to a buffer of at least CODE_LEN bytes.
void Utf8ToAnsi(const char* lpcszSrc, char* lpszDst)
{
	// Step 1: UTF-8 -> UTF-16.
	int n = MultiByteToWideChar(CP_UTF8, 0, lpcszSrc, -1, NULL, 0);
	if (n <= 0) return;
	WCHAR* wide = new WCHAR[n];
	MultiByteToWideChar(CP_UTF8, 0, lpcszSrc, -1, wide, n);
	// Step 2: UTF-16 -> ANSI. Characters the code page cannot represent become its default character.
	WideCharToMultiByte(CP_ACP, 0, wide, -1, lpszDst, CODE_LEN, NULL, NULL);
	delete[] wide;
}
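int main()
{
	// Assumes this source file is saved in the ANSI code page (GBK, code page 936).
	char str1_src[CODE_LEN] = "测试";      // the ANSI (GBK) bytes of "测试"
	char str2_des[CODE_LEN] = { 0 };
	// The UTF-8 bytes of "测试", which look like "娴嬭瘯" when read as GBK.
	char str3_src[CODE_LEN] = "娴嬭瘯";
	char str4_des[CODE_LEN] = { 0 };
	AnsiToUtf8(str1_src, str2_des);
	printf("ANSI to UTF-8: %s----->%s\n\n", str1_src, str2_des);
	Utf8ToAnsi(str3_src, str4_des);
	printf("UTF-8 to ANSI: %s----->%s", str3_src, str4_des);
	return 0;
}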

(2) Run results
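
In a console that uses the ANSI code page, the UTF-8 bytes are themselves shown as if they were ANSI text, so the printed output can be hard to judge by eye. One way to check a conversion is to print the raw bytes in hexadecimal. The helper below is a sketch for that purpose; the name ToHex is an assumption and it is not part of the program above.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Format each byte of s as two uppercase hex digits separated by spaces.
std::string ToHex(const std::string& s)
{
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned int byte = static_cast<unsigned char>(s[i]);
        std::snprintf(buf, sizeof buf, "%02X", byte);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Called on a correct UTF-8 encoding of 测试, ToHex returns E6 B5 8B E8 AF 95; called on the GBK form, it returns B2 E2 CA D4.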

(3) Note: the code above uses the new operator, so it must be compiled as C++. If the project was not created as a C++ project (for example, a C project in VC6.0), use malloc (and free) instead.

Origin blog.csdn.net/hxp1994/article/details/100905140