Multi-byte and wide characters

  • Multi-byte character set (MBCS, Multi-Byte Character Set): an encoding that uses more than one byte to represent some characters. English letters are generally represented by 1 byte, while Chinese and similar characters use 2 bytes. It is compatible with ASCII (codes 0 to 127).

In the beginning, the Internet had only one character set: the ANSI ASCII character set. It uses 7 bits per character and represents a total of 128 characters, including letters, digits, punctuation marks and other commonly used characters.

To extend ASCII so that it could display their native languages, different countries and regions developed their own standards. This produced GB2312, BIG5, JIS and other encodings. These extended encodings use 2 bytes to represent a character such as a Chinese character (kanji). They are collectively called ANSI code, also known as "MBCS (Multi-Byte Character Set)".
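To make the "1 byte for English, 2 bytes for Chinese" rule concrete, here is a minimal sketch that counts characters in a GB2312-style double-byte string. GB2312 encodes "中" as the two bytes 0xD6 0xD0.

The simplification is an assumption: real code pages define their own lead-byte ranges, and here every byte at or above 0x80 is treated as a lead byte. `CountMbcsChars` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts characters in a GB2312/GBK-style MBCS string.
// Simplifying assumption: bytes 0x00-0x7F are single-byte ASCII, and any
// byte >= 0x80 is the lead byte of a 2-byte character.
std::size_t CountMbcsChars(const std::string& bytes) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bytes.size(); ++count) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        i += (b < 0x80) ? 1 : 2;  // ASCII: 1 byte; lead byte: consume the pair
    }
    return count;
}
```

For example, `"A\xD6\xD0"` ("A" followed by GB2312 "中") is 3 bytes but 2 characters. The same byte pair would mean something else under another code page, which is exactly the incompatibility described next.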

Different ANSI encodings are incompatible with one another. When exchanging information internationally, text in two different languages cannot be stored in the same piece of ANSI-encoded text. A major drawback is that the same code value represents different characters in different encoding systems, which easily causes confusion (garbled text). This led to the birth of Unicode.

  • Wide character set: generally refers to the Unicode character set encoding.

Unicode (also called the Unified Code or Universal Code) unifies the character encodings of different countries.

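Unicode usually represents a character with two bytes. An English character, originally encoded in a single byte, becomes double-byte: the high byte is simply filled with zeros. This holds for characters in the Basic Multilingual Plane; characters beyond U+FFFF need two 16-bit units in UTF-16.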
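A small check of the "high byte filled with zeros" claim, assuming a C++11 compiler. The UTF-16 code unit for 'A' is 0x0041, so its high byte is 0 and its low byte is the ASCII code 0x41. `HighByte` and `LowByte` are hypothetical helper names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical helpers: split a 16-bit code unit into its high and low bytes.
constexpr std::uint8_t HighByte(char16_t unit) { return static_cast<std::uint8_t>(unit >> 8); }
constexpr std::uint8_t LowByte(char16_t unit)  { return static_cast<std::uint8_t>(unit & 0xFF); }

constexpr char16_t kWideA = u'A';  // UTF-16 code unit 0x0041

static_assert(HighByte(kWideA) == 0x00, "an ASCII character has a zero high byte");
static_assert(LowByte(kWideA) == 'A', "the low byte is the original ASCII code 0x41");
```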

Unicode came into being to cover the characters of all languages. It unifies every language into a single encoding, so the garbled-text problem no longer arises.

Unicode unifies encoding, but it is not always efficient. For example, UCS-4 (one of the Unicode encoding forms) uses 4 bytes to store every symbol, so each English letter necessarily carries three zero bytes. This is very wasteful for storage and transmission.

UTF-8 was introduced to make Unicode encoding more efficient. UTF-8 is variable-length: it chooses the number of bytes based on the symbol being encoded. An English letter, for example, needs only 1 byte.
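The variable-length idea can be sketched as a minimal UTF-8 encoder, assuming the input is a valid Unicode scalar value (no surrogates, at most U+10FFFF) and that no validation is needed. `EncodeUtf8` is a hypothetical name; production code would normally use a tested library. Compare the output lengths with UCS-4, which always spends 4 bytes.

```cpp
#include <cstdint>
#include <string>

// Hypothetical sketch: encode one code point as UTF-8, using 1-4 bytes
// depending on its magnitude. Assumes cp is a valid scalar value.
std::string EncodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {                    // 1 byte:  0xxxxxxx (plain ASCII)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {            // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {          // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                            // 4 bytes: 11110xxx then three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, `EncodeUtf8('A')` is the single byte 0x41, while "字" (U+5B57) becomes the three bytes 0xE5 0xAD 0x97.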

UTF stands for "Unicode Transformation Format". It defines how Unicode code points are converted into data that programs can store and transmit.

In C++, char, char16_t and char32_t represent an 8-bit integer, an unsigned 16-bit integer and an unsigned 32-bit integer. UTF-8, UTF-16 and UTF-32 use char, char16_t and char32_t, respectively, as their code units. (Note: char16_t and char32_t are keywords added in C++11. If your compiler does not support C++11, use unsigned short and unsigned long instead.)

A common Chinese character such as "字" requires 3 bytes in UTF-8. In UTF-16 it requires one char16_t, i.e. 2 bytes. In UTF-32 it requires one char32_t, i.e. 4 bytes.
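These sizes can be checked at compile time with C++11 string literal prefixes (u8, u, U). "字" is U+5B57. It is written with a universal character name so that the example does not depend on the source file's encoding. `sizeof` on a literal includes the terminating NUL, hence the `- 1`.

```cpp
#include <cstddef>

// Number of code units needed for U+5B57 ("字") in each encoding form.
constexpr std::size_t kUtf8Units  = sizeof(u8"\u5B57") - 1;                     // 1-byte units
constexpr std::size_t kUtf16Units = sizeof(u"\u5B57") / sizeof(char16_t) - 1;   // char16_t units
constexpr std::size_t kUtf32Units = sizeof(U"\u5B57") / sizeof(char32_t) - 1;   // char32_t units

static_assert(kUtf8Units == 3, "UTF-8: three 1-byte code units");
static_assert(kUtf16Units == 1, "UTF-16: one char16_t code unit");
static_assert(kUtf32Units == 1, "UTF-32: one char32_t code unit");
```

On common platforms char16_t is 2 bytes and char32_t is 4 bytes, giving the 3 / 2 / 4 byte totals stated above.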

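Prefixing an ordinary character or string literal with L makes it a wide-character literal stored as wchar_t. Each character takes 2 bytes on Windows; with GCC on Linux, wchar_t is typically 4 bytes. Examples are L'a' and L"abc". Alternatively, use _T("abc"): this macro adds the L prefix only when _UNICODE is defined (i.e. the project uses the Unicode character set).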
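A short illustration of the L prefix in standard C++ (the _T macro is Windows-specific and not shown). `std::wcslen` counts wchar_t units. `sizeof` gives the literal's size in bytes, which depends on the platform's wchar_t width. `WideLength` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cwchar>

constexpr const wchar_t* kWide = L"abc";   // wide string literal
constexpr std::size_t kWideBytes = sizeof(L"abc");
// kWideBytes == 4 * sizeof(wchar_t) (3 chars + L'\0'): 8 on Windows, 16 on typical Linux

// Hypothetical helper: number of wchar_t units before the terminating L'\0'.
std::size_t WideLength() { return std::wcslen(kWide); }
```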


Converting between MFC CString and std::string:

1. When the project uses the Unicode character set, CString is equivalent to CStringW. When it uses the multi-byte character set, CString is equivalent to CStringA.

2. CString --> std::string

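```cpp
// Unicode character set: CString -> std::string
#include <atlstr.h>   // CString, CStringA (ATL/MFC)
#include <atlconv.h>  // CT2A, W2A, USES_CONVERSION
#include <string>

int main() {
    // Note: CT2A, CStringA and W2A convert to the ANSI code page, not to UTF-8.

    // Method 1: the CT2A conversion class
    CString str1 = L"sdf";
    std::string s1(CT2A(str1.GetString()));
    // GetString() exists in newer Visual Studio versions; older ones may need GetBuffer():
    std::string s1b(CT2A(str1.GetBuffer()));
    str1.ReleaseBuffer();

    // Method 2: go through CStringA
    CString str2 = L"dshf";
    CStringA strA(str2);
    std::string s2(strA.GetString());
    // or in one line; calling GetString() avoids the "most vexing parse" that
    // makes `std::string s(CStringA(str2));` declare a function instead:
    std::string s2b(CStringA(str2).GetString());

    // Method 3: USES_CONVERSION + the W2A macro (must be used at function scope)
    USES_CONVERSION;
    CString str3 = L"djg";
    std::string s3 = W2A(str3);
    // str3 is first converted to const wchar_t*, W2A then converts
    // const wchar_t* -> const char*, and finally s3 is initialized from that const char*.
    return 0;
}
```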


3. std::string --> CStringW / std::wstring

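```cpp
// Requires <atlstr.h> (ATL/MFC); these statements go inside a function body.
std::string s("dhhh");
CStringW strw(CStringA(s.c_str()));  // const char* -> CStringA -> CStringW (from the ANSI code page)
std::wstring sw(strw.GetString());
```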


Source: www.cnblogs.com/htj10/p/11027323.html