MFC: garbled text when reading a mixed Chinese/English ANSI file under the UNICODE character set, if the number of English characters is odd

  I'm a little ashamed to admit that I have been writing C++ for more than three years and am still only halfway decent at it. I worked on Linux for over a year before that, and I have now been doing MFC for over a year as well, but since I have spent the whole time maintaining the company's old projects, I haven't built anything new.

  Recently, on a whim, I found some tutorial videos online to review how the MFC controls are used and to get more familiar with the MFC API. It is how I earn my living, after all. Enough talk, let's get to work.

  I have been writing a Notepad clone, and the basic functions are starting to take shape, but I ran into a rather embarrassing problem. With the project set to the UNICODE character set, reading an ANSI-encoded text file works for all-Chinese text and for all-English text. When the two are mixed, however, and the number of English characters is odd, the text after the English characters comes out garbled.

  The cause was that I had oversimplified things: I assumed that reading an ANSI-encoded text file just meant reading a multi-byte string, so I wrote the following code.

  The function's only parameter is a CFile object that has already been opened read-only. I like to spell things out completely, so here are those two lines as well.

  CFile file;
  file.Open(szFile, CFile::modeRead);

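  void CNotepadDlg::ReadAnsi(CFile& file)
  {
      file.Seek(0, CFile::begin);
      char buff[1024];
      UINT nRet = 0;
      CString str;

      // Note: sizeof(buff - 1) is the size of a char* (4 or 8 bytes), not 1023,
      // so the file is read and converted in very small chunks.
      while (nRet = file.Read(buff, sizeof(buff - 1)))
      {
          buff[nRet] = '\0';
          str += buff;   // each chunk is converted to wide characters on its own
      }

      SetDlgItemText(IDC_EDIT_TEXT, str);
  }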

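  This treats the text as plain single-byte data and converts it chunk by chunk, so a two-byte Chinese character can be cut in half at a chunk boundary, and of course it then displays incorrectly.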
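  To make the chunk-boundary problem concrete, here is a minimal, portable C++ sketch (not MFC code). It assumes the file is GBK/CP936, where a byte in the range 0x81..0xFE starts a two-byte character and a byte below 0x80 is a single-byte character. The names isGbkLeadByte and safeChunkLength are hypothetical helpers introduced only for illustration; on Windows, IsDBCSLeadByte() plays a similar role to the first one.

```cpp
#include <cstddef>
#include <string>

// Assumption: GBK/CP936, where 0x81..0xFE is the lead byte of a two-byte character.
inline bool isGbkLeadByte(unsigned char b) {
    return b >= 0x81 && b <= 0xFE;
}

// Hypothetical helper: returns how many of the `want` bytes starting at `start`
// can be converted without cutting a two-byte character in half.
// `start` must already be on a character boundary, and `want` should be at
// least 2, otherwise the result can be 0 when a lead byte comes first.
std::size_t safeChunkLength(const std::string& data, std::size_t start, std::size_t want) {
    std::size_t end = start + want;
    if (end > data.size()) end = data.size();

    std::size_t i = start;
    while (i < end) {
        std::size_t step = isGbkLeadByte(static_cast<unsigned char>(data[i])) ? 2 : 1;
        if (i + step > end) break;  // this character would straddle the boundary
        i += step;
    }
    return i - start;
}
```

  With an even chunk size, one English letter in front of the Chinese text pushes every two-byte character onto an odd offset, so one of them ends up straddling a chunk boundary. That is consistent with the symptom appearing only when the number of English characters is odd. A chunked reader could convert only the first safeChunkLength(...) bytes of each chunk and carry the remaining byte over to the next one.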

  This is Microsoft Notepad opening the ANSI-encoded text document.

  This is my Notepad clone opening the same document.

  The difference is obvious.

  I found plenty of suggestions online, including A2T, _bstr_t, and _tsetlocale(LC_ALL, _T("chs")). I tried them all, but perhaps my skills are lacking, because I never really figured them out. In the end the most primitive approach, MultiByteToWideChar(), solved the problem. Of course, the code below still has plenty of room for improvement.

  

  void CMFC194Dlg::ReadAnsi(CFile& file)
  {
      file.Seek(0, CFile::begin);

      // Read the whole file into a byte buffer and null-terminate it.
      LONGLONG nLen = file.GetLength();
      char* p = new char[nLen + 1];
      nLen = file.Read(p, nLen);
      p[nLen] = '\0';

      // One wide character per input byte is always enough room.
      // (memset counts bytes, so this zeroes only part of the buffer;
      // MultiByteToWideChar writes the terminator itself because of the -1.)
      TCHAR* pText = new TCHAR[nLen + 2];
      memset(pText,0 , nLen + 2);

      nLen = MultiByteToWideChar(CP_ACP, 0, p, -1, pText, nLen + 2);

      SetDlgItemText(IDC_EDIT_TEXT, pText);

      delete[]p;
      delete[]pText;
  }

  Result of reading the file:

 

  The heart of the fix is these two lines:

  TCHAR* pText = new TCHAR[nLen + 2];

  memset(pText,0 , nLen + 2);

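  In ANSI encoding an English letter takes one byte and a Chinese character takes two, so the length of pText must not be defined as (multi-byte length / 2 + 2). That leaves the buffer too small, MultiByteToWideChar() returns 0, and GetLastError() reports 122 (ERROR_INSUFFICIENT_BUFFER).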
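  The sizing argument can be checked with a small portable C++ sketch. It assumes GBK/CP936 input in which every character is either one byte or two bytes (lead byte 0x81..0xFE) and converts to exactly one wide character. The function name wideLengthOfGbk is hypothetical and exists only for this illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts the wide characters (excluding the terminator)
// that a GBK/CP936 byte string converts to, assuming one wide character per
// single-byte or two-byte input character.
std::size_t wideLengthOfGbk(const std::string& bytes) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        bool twoByte = b >= 0x81 && b <= 0xFE && i + 1 < bytes.size();
        i += twoByte ? 2 : 1;  // skip the whole character
        ++count;               // it becomes one wide character
    }
    return count;
}
```

  For the 5-byte string "hello" this returns 5, so the buffer needs 6 wide characters including the terminator, while 5 / 2 + 2 is only 4. For pure Chinese text the count is half the byte length, so the result always lies between n / 2 and n for n input bytes, and n + 1 wide characters is always enough. On Windows the exact figure can also be obtained from the API itself: calling MultiByteToWideChar() with the output size set to 0 returns the required number of wide characters without writing anything.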


Origin www.cnblogs.com/SmallAndGreat/p/12194651.html