UNICODE string

Refer to the chapter on "Generic Data Types and Data Types" in the Microsoft Online Documentation

Single-byte character set: ASCII. Cons: it can only represent modern US English (the basic Latin alphabet, Arabic numerals, and English punctuation).

Double-byte character set: GB2312, for example, is built on top of ASCII. Byte values up to 127 keep their ASCII meaning, and two consecutive bytes that are both greater than 127 together represent one Chinese character.
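The double-byte rule can be sketched in C as follows. This is only an illustration of the rule as stated here, not a full GB2312 validator; the function name dbcs_char_count is made up for this example.

```c
#include <stddef.h>

/* Count the characters in a GB2312-style double-byte string.
   dbcs_char_count is a hypothetical helper name, not a Windows or C API.
   A byte <= 127 is one ASCII character. A byte > 127 is treated as the
   first half of a two-byte character and consumes the following byte. */
size_t dbcs_char_count(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p != 0) {
        if (*p > 127 && p[1] != 0) {
            p += 2;   /* two bytes form one Chinese character */
        } else {
            p += 1;   /* one ASCII byte, or a truncated lead byte */
        }
        count++;
    }
    return count;
}
```

For the five-byte string "A\xD6\xD0\xCE\xC4" (the letter A followed by two GB2312 characters), strlen reports 5 bytes while this sketch reports 3 characters.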

Multibyte character set: Unicode aims to cover every character in use in the world, and it separates the character set (which number each character gets) from the encoding (how that number is stored as bytes).

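  Encoding method: UTF-8 uses at most 4 bytes per character. If the first bit of a leading byte is 0, the byte is a single-byte character identical to ASCII; if it is 1, the byte belongs to a multi-byte sequence.

           UTF-16 uses a 16-bit unsigned integer as its code unit.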
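To make the separation between code point and encoding concrete, here is a minimal sketch in C that encodes one code point both ways. The function names utf8_encode and utf16_encode are hypothetical helpers written for this note; they are not part of Windows or the C library.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8.
   Returns the number of bytes written (1 to 4), or 0 if cp is invalid. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* first bit 0: same as ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;                      /* surrogates are not characters */
        }
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Encode one code point as UTF-16.
   Returns the number of 16-bit units written (1 or 2), or 0 if invalid. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;
        }
        out[0] = (uint16_t)cp;             /* one 16-bit unit */
        return 1;
    }
    if (cp <= 0x10FFFF) {
        cp -= 0x10000;                     /* split into a surrogate pair */
        out[0] = (uint16_t)(0xD800 | (cp >> 10));
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        return 2;
    }
    return 0;
}
```

The same code point U+4E2D becomes the three bytes E4 B8 AD in UTF-8 but a single unit 0x4E2D in UTF-16, which is the sense in which the character set and the encoding are separate.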

Windows provides three kinds of API names for character manipulation:

  • Macro definitions, compatible with both UNICODE and non-UNICODE builds
    • VC++: names prefixed with _t (declared in tchar.h) are generic-text macros; they resolve to the wide-character version when _UNICODE is defined and to the single-byte version otherwise
    • Example: _tcslen
    • In the code, replace the keyword char with TCHAR, replace char * with LPTSTR, and wrap every string constant written in double quotation marks (such as "VCKBASE Online Journal") in the TEXT macro: TEXT("VCKBASE Online Journal")
  • Non-UNICODE (ANSI) versions of the APIs, whose names end in A
  • UNICODE versions of the APIs, whose names end in W
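The macro approach can be imitated in portable C to show how one source file serves both builds. In the sketch below, MY_UNICODE, MY_TCHAR, MY_TEXT and my_tcslen are hypothetical stand-ins for _UNICODE, TCHAR, TEXT and _tcslen; they are defined here only so the example compiles without the Windows headers, and the real definitions in tchar.h and the Windows SDK differ in detail.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for the generic-text names, for illustration only. */
#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;          /* plays the role of TCHAR    */
#define MY_TEXT(s) L##s            /* plays the role of TEXT()   */
#define my_tcslen  wcslen          /* plays the role of _tcslen  */
#else
typedef char MY_TCHAR;
#define MY_TEXT(s) s
#define my_tcslen  strlen
#endif

/* The same line of source works in both builds: the literal becomes a
   wide string when MY_UNICODE is defined and a narrow string otherwise. */
size_t title_length(void)
{
    const MY_TCHAR *title = MY_TEXT("VCKBASE Online Journal");
    return my_tcslen(title);
}
```

In real Windows code the body of title_length would use TCHAR, TEXT and _tcslen directly, and the build setting rather than the source decides which variant is compiled.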

 

The standard C library string functions treat a string as a sequence of single-byte char values, so using them on text in a multi-byte character encoding causes problems.

Windows functions    Standard C functions
lstrcat              strcat
lstrcmp              strcmp
lstrcpy              strcpy
lstrlen              strlen
lstrcmpi             strcmpi

1. Calculate the length of the string

strlen calculates the length of an ANSI string. For UNICODE strings, use wcslen instead.
_tcslen is a macro: when _UNICODE is defined it expands to wcslen, and when _UNICODE is not defined it expands to strlen.
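The difference between the two functions shows up as soon as the text is not plain ASCII. The sketch below stores the same two Chinese characters once as a narrow string and once as a wide string. It assumes the narrow string is UTF-8 encoded, which is a choice made for this example; under an ANSI code page the byte count would differ. The function names are made up for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* strlen counts char units (bytes) up to the terminating 0. */
size_t narrow_length(void)
{
    /* U+4E2D U+6587 stored as UTF-8 (assumed encoding for this sketch) */
    const char narrow[] = "\xE4\xB8\xAD\xE6\x96\x87";
    return strlen(narrow);
}

/* wcslen counts wchar_t units up to the terminating 0. */
size_t wide_length(void)
{
    /* The same two characters, one wchar_t each */
    const wchar_t wide[] = { 0x4E2D, 0x6587, 0 };
    return wcslen(wide);
}
```

Here narrow_length returns 6 and wide_length returns 2. Neither number is a size in bytes for the wide string: a buffer for it needs (wcslen(s) + 1) * sizeof(wchar_t) bytes.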

 

 
