Windows core programming: character and string processing (2)

1. Character Encoding

Unicode is a standard for encoding characters.

In Windows Vista, each Unicode character is encoded using UTF-16.

The UTF (Unicode Transformation Format) encodings can represent all kinds of characters:

     1. UTF-8: a character is encoded as one, two, three, or four bytes, depending on the character.

     2. UTF-16: most characters are encoded as 2 bytes (one 16-bit unit); characters outside the Basic Multilingual Plane take 4 bytes (a surrogate pair).

     3. UTF-32: every character is encoded as 4 bytes.
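The size rules in the list above can be checked with a small sketch in portable C. The function names `demo_utf8_bytes` and `demo_utf16_encode` are made up for this note and are not part of Windows or the C runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to store one code point in UTF-8 (0 if it is not encodable). */
static size_t demo_utf8_bytes(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogate range is reserved */
    if (cp <= 0x7F)     return 1;
    if (cp <= 0x7FF)    return 2;
    if (cp <= 0xFFFF)   return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;                                   /* beyond the Unicode range */
}

/* Encodes one code point as UTF-16.
   Returns the number of 16-bit units written to out (0 if not encodable). */
static size_t demo_utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return 0;
    if (cp <= 0xFFFF) {            /* Basic Multilingual Plane: one unit, 2 bytes */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                 /* supplementary plane: surrogate pair, 4 bytes */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate */
    return 2;
}
```

For example, U+0041 takes 1 byte in UTF-8 and one 16-bit unit in UTF-16, while U+1F600 takes 4 bytes in UTF-8 and two 16-bit units (0xD83D, 0xDE00) in UTF-16.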

 

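2. ANSI and Unicode character and string data types

  The char data type in C represents an 8-bit ANSI character.

  The Microsoft C/C++ compiler defines a built-in data type, wchar_t, which represents a 16-bit Unicode (UTF-16) character.

 So that the same source can be compiled for either ANSI or Unicode, Windows defines generic types and macros in WinNT.h. In the lines below, "WCHAR / CHAR" means WCHAR when UNICODE is defined and CHAR otherwise:

typedef WCHAR / CHAR  TCHAR, *PTCHAR, *PTSTR;

typedef CONST WCHAR / CHAR  *PCTSTR;

Literal characters and strings are wrapped in the TEXT() macro (WinNT.h) or the _T() macro (TChar.h), which adds the L prefix in a Unicode build.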

3. Unicode and ANSI functions in Windows

 Starting with Windows NT, all versions of Windows are built on Unicode, and all the core functions require Unicode strings.

When you call a Windows function and pass it an ANSI string, the function first converts the string to Unicode and then passes the result to the operating system. Using ANSI strings therefore costs extra time and memory.

Windows functions that take strings usually come in two versions: the W suffix marks the Unicode (wide) version and the A suffix marks the ANSI version, for example CreateWindowExW and CreateWindowExA.

 

4. Unicode and ANSI functions in the C runtime library

The C runtime library provides one set of functions for ANSI strings and another for Unicode strings. Unlike the Windows functions, they do not call each other: the ANSI version of a function does not convert its string to Unicode and then call the Unicode version. Each one is "self-sufficient".

For example, strlen returns the length of an ANSI string, and wcslen returns the length of a Unicode string.

TChar.h then defines a generic version, _tcslen, that compiles to either one:

#ifdef _UNICODE

#define _tcslen  wcslen

#else

#define _tcslen  strlen

#endif

For identifiers that are not part of the C++ standard, the C runtime library always adds a leading underscore to the name, as in _tcslen.
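The generic-text mapping described above can be imitated in portable C. The sketch below uses made-up names (`DEMO_UNICODE`, `DEMO_TCHAR`, `DEMO_TEXT`, `demo_tcslen`, `demo_length`) instead of the real ones, so that it compiles without the Windows headers. It only illustrates the mechanism and is not the actual content of WinNT.h or TChar.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Define DEMO_UNICODE before this point to select the wide-character build. */
#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;
#define DEMO_TEXT_(quote) L##quote   /* add the L prefix to the literal */
#define demo_tcslen wcslen
#else
typedef char DEMO_TCHAR;
#define DEMO_TEXT_(quote) quote      /* leave the literal unchanged */
#define demo_tcslen strlen
#endif
/* Extra level of indirection so that macro arguments are expanded first. */
#define DEMO_TEXT(quote) DEMO_TEXT_(quote)

/* Code written against the generic names compiles unchanged in both builds. */
static size_t demo_length(const DEMO_TCHAR *text)
{
    assert(text != NULL);
    return demo_tcslen(text);        /* counts characters, not bytes */
}
```

Calling `demo_length(DEMO_TEXT("Hello"))` gives 5 whether or not `DEMO_UNICODE` is defined; only the element type of the string changes.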

 

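5. Secure string functions in the C runtime

For each existing string function, the C runtime provides a new version that takes the destination buffer size as an extra argument. It has the same name with an _s suffix (for "secure") added at the end; for example, _tcscpy has the new counterpart _tcscpy_s. Including StrSafe.h marks the old functions as deprecated and adds its own safe functions, such as StringCchCopy.

The buffer size is given as a number of characters, not bytes: use the _countof macro to calculate it.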
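The difference between a character count and a byte count can be shown in portable C. `DEMO_COUNTOF` and `demo_wcscpy_checked` below are hypothetical stand-ins written for this note; the real `_countof` and `_tcscpy_s` come from the Microsoft C runtime, and the real functions report errors differently (this sketch simply returns -1).

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Number of elements in a true array (not a pointer), in the spirit of _countof. */
#define DEMO_COUNTOF(array) (sizeof(array) / sizeof((array)[0]))

/* Copies src into dst, which holds dst_count characters including the terminator.
   Returns 0 on success. If src does not fit, dst becomes an empty string
   and -1 is returned instead of writing past the end of the buffer. */
static int demo_wcscpy_checked(wchar_t *dst, size_t dst_count, const wchar_t *src)
{
    assert(dst != NULL && src != NULL && dst_count > 0);
    size_t length = wcslen(src);
    if (length + 1 > dst_count) {
        dst[0] = L'\0';
        return -1;
    }
    wmemcpy(dst, src, length + 1);   /* copy the characters and the terminator */
    return 0;
}
```

For a buffer declared as `wchar_t buffer[8]`, `DEMO_COUNTOF(buffer)` is 8 while `sizeof(buffer)` is larger, which is why passing `sizeof` as the size of a wide-character buffer overstates the room available.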

6. Recommended character and string handling

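   Use the generic types TCHAR / PTSTR for text characters and strings, and BYTE / PBYTE for bytes and data buffers; wrap literal characters and strings in the TEXT or _T macro; and perform global replacements, for example replacing PSTR with PTSTR.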

7. Converting between ANSI and Unicode strings

  The MultiByteToWideChar function converts a multibyte string to a wide-character string.

  The WideCharToMultiByte function converts a wide-character string to a multibyte string.
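A common way to call MultiByteToWideChar is in two passes: first ask for the required buffer size, then allocate and convert. The sketch below assumes a helper name, `demo_to_wide`, that is made up for this note. On Windows it uses MultiByteToWideChar with the CP_ACP code page; on other platforms it falls back to the standard `mbstowcs` so that the same sketch still compiles, which is a substitution and not the Windows behavior.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Converts a null-terminated multibyte string to a newly allocated wide string.
   Returns NULL on failure. The caller releases the result with free(). */
static wchar_t *demo_to_wide(const char *src)
{
    assert(src != NULL);
#ifdef _WIN32
    /* Pass 1: a destination size of 0 asks for the required size in characters.
       Because the source length is -1, the count includes the terminator. */
    int count = MultiByteToWideChar(CP_ACP, 0, src, -1, NULL, 0);
    if (count <= 0) return NULL;
    wchar_t *dst = (wchar_t *)malloc((size_t)count * sizeof(wchar_t));
    if (dst == NULL) return NULL;
    /* Pass 2: perform the conversion into the allocated buffer. */
    if (MultiByteToWideChar(CP_ACP, 0, src, -1, dst, count) == 0) {
        free(dst);
        return NULL;
    }
    return dst;
#else
    /* Fallback: with a NULL destination, mbstowcs reports the required length
       (without the terminator) on POSIX systems. */
    size_t length = mbstowcs(NULL, src, 0);
    if (length == (size_t)-1) return NULL;
    wchar_t *dst = (wchar_t *)malloc((length + 1) * sizeof(wchar_t));
    if (dst == NULL) return NULL;
    if (mbstowcs(dst, src, length + 1) == (size_t)-1) {
        free(dst);
        return NULL;
    }
    return dst;
#endif
}
```

The opposite direction follows the same two-pass shape with WideCharToMultiByte, where the sizes are counted in bytes instead of characters.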

 

8. Exporting ANSI and Unicode functions from a DLL

 When a DLL exports a function that takes a string, for example one that reverses a string, it provides both an ANSI and a Unicode version. Only the Unicode version needs to implement the reversal. The ANSI version just converts its string to Unicode, calls the Unicode version, and then converts the result back to an ANSI string.
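The forwarding pattern described above might look like the following sketch. The names `demo_reverse_w` and `demo_reverse_a` are made up for this note, and the standard `mbstowcs` / `wcstombs` functions stand in for MultiByteToWideChar / WideCharToMultiByte so that the code does not need the Windows headers. The reversal works on individual wchar_t units, so it assumes text without surrogate pairs or combining characters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Wide version: the only place where the reversal is implemented (in place). */
static void demo_reverse_w(wchar_t *text)
{
    assert(text != NULL);
    size_t length = wcslen(text);
    for (size_t i = 0; i < length / 2; ++i) {
        wchar_t tmp = text[i];
        text[i] = text[length - 1 - i];
        text[length - 1 - i] = tmp;
    }
}

/* Narrow version: convert to wide, forward to the wide version, convert back.
   Returns 0 on success and -1 on failure. */
static int demo_reverse_a(char *text)
{
    assert(text != NULL);
    size_t bytes = strlen(text);
    /* A multibyte string never yields more wide characters than it has bytes. */
    wchar_t *wide = (wchar_t *)malloc((bytes + 1) * sizeof(wchar_t));
    if (wide == NULL) return -1;

    int result = -1;
    if (mbstowcs(wide, text, bytes + 1) != (size_t)-1) {
        demo_reverse_w(wide);
        /* The same characters in a new order need the same number of bytes. */
        if (wcstombs(text, wide, bytes + 1) != (size_t)-1) {
            result = 0;
        }
    }
    free(wide);
    return result;
}
```

With this layout, a fix to the reversal logic only has to be made in the wide version, and the narrow version picks it up automatically.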

9. Determining whether text is ANSI or Unicode

 The IsTextUnicode function, exported by AdvApi32.dll and declared in WinBase.h, determines whether a buffer holds ANSI or Unicode text. It relies on statistical tests, so the result is not always accurate.
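To see why such a test can only guess, consider the much simpler heuristic sketched below. It is not the algorithm used by IsTextUnicode; `demo_looks_like_utf16le` is a made-up function that checks for a UTF-16 little-endian byte order mark and otherwise counts zero bytes in the high half of each 16-bit unit.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the buffer looks like UTF-16 little-endian text, otherwise 0.
   This is a guess based on two simple signals, not a reliable detector. */
static int demo_looks_like_utf16le(const unsigned char *buffer, size_t size)
{
    assert(buffer != NULL);
    if (size < 2 || size % 2 != 0) return 0;   /* UTF-16 needs whole 16-bit units */

    /* Signal 1: the byte order mark U+FEFF stored little-endian. */
    if (buffer[0] == 0xFF && buffer[1] == 0xFE) return 1;

    /* Signal 2: Latin text in UTF-16LE has a zero in the high byte of each unit. */
    size_t zero_high_bytes = 0;
    for (size_t i = 1; i < size; i += 2) {
        if (buffer[i] == 0) ++zero_high_bytes;
    }
    return zero_high_bytes * 2 > size / 2;     /* more than half of the units */
}
```

The bytes 0x2D 0x4E 0x87 0x65 are valid UTF-16LE for two Chinese characters, yet this heuristic reports them as not Unicode because there is no byte order mark and no zero byte. Any detection that works from the bytes alone has blind spots of this kind.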

                

 


Origin blog.csdn.net/u010096608/article/details/103778715