Qt Chinese garbled-text problem

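// Qt 4 API (setCodecForTr and setCodecForCStrings no longer exist in Qt 5)
#include <QTextCodec>

// at the start of main():
QTextCodec *codec = QTextCodec::codecForName("UTF-8");
QTextCodec::setCodecForTr(codec);                                // tr() strings are UTF-8
QTextCodec::setCodecForLocale(QTextCodec::codecForLocale());    // keep the system locale codec
QTextCodec::setCodecForCStrings(QTextCodec::codecForLocale());  // plain C strings use the locale codec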

Add the code above to the main function.

 

 

Reposted from: http://f.dataguru.cn/thread-866992-1-1.html

 

Garbled Chinese characters in Qt and how to solve the encoding problem (UTF-8 / GBK)


Background: the two encodings most commonly encountered with Qt are UTF-8 and GBK.
★ UTF-8 (Unicode Transformation Format, 8-bit) may contain a BOM, but usually does not. It is a multi-byte encoding designed for international use: English characters take 8 bits (one byte) and common Chinese characters take 24 bits (three bytes). UTF-8 covers the characters needed by every country in the world, so it is universal. Text encoded in UTF-8 displays correctly in any browser that supports UTF-8; for example, a UTF-8 page containing Chinese displays correctly in a foreigner's English-language IE without downloading IE's Chinese language support pack.
★ GBK is an extension of the national standard GB2312 and is compatible with it. GBK represents a Chinese character with two bytes; to tell it apart from English (ASCII) characters, which still take a single byte, the highest bit of the lead byte is set to 1. GBK contains more than 20,000 Chinese characters. It is a national encoding and less universal than UTF-8, but UTF-8 takes more space (in a database, for example) than GBK. Besides being compatible with GB2312, GBK can also represent Traditional Chinese characters and Japanese kana.
★ Converting between GBK/GB2312 and UTF-8 always goes through Unicode:
GBK, GB2312 -> Unicode -> UTF-8
UTF-8 -> Unicode -> GBK, GB2312
★ On Simplified Chinese Windows, "ANSI" encoding means GBK/GB2312: a Chinese character takes 2 bytes whose lead byte lies in the range 0x80 to 0xFF, while a character in the range 0x00 to 0x7F still takes 1 byte. In Unicode (UTF-16), every character of the Basic Multilingual Plane, which includes common Chinese characters, takes 2 bytes; characters outside it take 4 bytes (a surrogate pair).
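The two-step path in the list above (GBK -> Unicode -> UTF-8) can be sketched in plain C++ without Qt. This is an illustration under a loud assumption: `kMiniGbk` is a hypothetical four-entry excerpt of the GBK table that only knows 我是汉字, and `gbkToUnicode`, `unicodeToUtf8` and `gbkToUtf8` are illustrative names, not Qt functions. It also shows the byte rules above: ASCII stays one byte in both encodings, while a Chinese character is two bytes in GBK and three in UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical four-entry excerpt of the GBK table (GBK code -> Unicode code point).
// A real GBK table has over 20,000 entries; this one only covers 我是汉字.
const std::map<std::uint16_t, std::uint32_t> kMiniGbk = {
    {0xCED2, 0x6211},  // 我
    {0xCAC7, 0x662F},  // 是
    {0xBABA, 0x6C49},  // 汉
    {0xD7D6, 0x5B57},  // 字
};

// Step 1: GBK bytes -> Unicode. A byte below 0x80 is ASCII (1 byte);
// otherwise it is the lead byte of a 2-byte character.
std::vector<std::uint32_t> gbkToUnicode(const std::string& bytes) {
    std::vector<std::uint32_t> cps;
    for (std::size_t i = 0; i < bytes.size();) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) {
            cps.push_back(lead);
            i += 1;
            continue;
        }
        if (i + 1 >= bytes.size()) throw std::runtime_error("truncated GBK sequence");
        unsigned char trail = static_cast<unsigned char>(bytes[i + 1]);
        auto it = kMiniGbk.find(static_cast<std::uint16_t>((lead << 8) | trail));
        if (it == kMiniGbk.end()) throw std::runtime_error("character not in mini table");
        cps.push_back(it->second);
        i += 2;
    }
    return cps;
}

// Step 2: Unicode -> UTF-8. ASCII takes 1 byte, U+0080..U+07FF take 2,
// the rest of the BMP (including common Chinese) takes 3, beyond U+FFFF takes 4.
std::string unicodeToUtf8(const std::vector<std::uint32_t>& cps) {
    std::string out;
    for (std::uint32_t cp : cps) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// GBK -> Unicode -> UTF-8, the path described in the list above.
std::string gbkToUtf8(const std::string& gbk) {
    return unicodeToUtf8(gbkToUnicode(gbk));
}
```

Feeding it the eight GBK bytes ce d2 ca c7 ba ba d7 d6 yields the twelve UTF-8 bytes that appear later in this article; a real program would use a complete table, or simply QTextCodec.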
Here is a small Qt program that will probably look familiar. A considerable number of Chinese users have tried writing code like this:

#include <QtGui/QApplication>
#include <QtGui/QLabel>

int main(int argc, char **argv)
{
    QApplication app(argc, argv);
    QString a = "我是汉字";   // "I am Chinese characters"
    QLabel label(a);
    label.show();
    return app.exec();
}

Write the code, save it, compile and run it: everything goes smoothly, but the result is:

Most users see:   ÎÒÊÇºº×Ö
Other users see:  a similar jumble of Latin characters (æ, ¯, ±, å and others)


Surprisingly, the Chinese text does not show up on screen; instead there are characters nobody can read. So people turn to search engines, start posting on forums or complaining, and are finally told that one of the following statements solves the problem:


QTextCodec::setCodecForCStrings(QTextCodec::codecForName("GB2312"));
QTextCodec::setCodecForCStrings(QTextCodec::codecForName("UTF-8"));


Trying the two statements one at a time does solve it (most users need the first, the other users the second). So why is that?
Why does QString produce garbled text? Concept 0: "我是汉字" ("I am Chinese characters") is a C string literal, that is, a narrow char string. The example above can equally be written as


const char *str = "我是汉字";
QString a = str;


or

char str[] = "我是汉字";
QString a = str;


and so on.
Concept 1: a source file has an encoding, but a plain text file does not record which encoding it uses.
This is the root of the problem. As a test, save the source code above as GBK and open it in a hex editor: the part between the quotes is the eight bytes ce d2 ca c7 ba ba d7 d6.
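If no hex editor is at hand, a few lines of C++ can print the bytes of a string. `hexBytes` is a hypothetical helper; the test literal is written with escape sequences so the result does not depend on how the snippet itself is saved.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format each byte of a string as two lowercase hex digits, separated by spaces.
std::string hexBytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

To repeat the experiment from the text, pass the literal exactly as typed ("我是汉字") and compare the output when the source file is saved as GBK and as UTF-8.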
Now copy the file to a Traditional Chinese Windows system and open it with Notepad. What will it look like?

QString a = "Dianyao Luo Jian";   // the same 8 bytes read as BIG5: four unrelated Traditional Chinese characters
QLabel label(a);
label.show();


And on a European or American Windows system, opened in Notepad?

QString a = "ÎÒÊÇºº×Ö";
QLabel label(a);
label.show();


It is the same file, without any change, yet the same eight bytes ce d2 ca c7 ba ba d7 d6 look like completely different characters to mainland users of GBK, to users of BIG5 in Hong Kong, Macao and Taiwan, and to Europeans using Latin-1.
Concept 2: as everyone knows, 'A' and '\x41' are equivalent.
With GBK encoding,


const char *str = "我是汉字";


equivalent to

const char *str = "\xce\xd2\xca\xc7\xba\xba\xd7\xd6";



and with UTF-8 encoding it is equivalent to


const char *str = "\xe6\x88\x91\xe6\x98\xaf\xe6\xb1\x89\xe5\xad\x97";


Note: this statement is not entirely accurate. For example, when the file is saved as UTF-8 with a BOM and compiled with the cl compiler, the characters in the source file are UTF-8, but what is stored in the program is the corresponding GBK encoding.
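Concept 2 can be checked by the compiler itself. The sketch below uses only escape sequences, so it behaves the same however the file is saved: 'A' equals '\x41' (assuming an ASCII-based execution character set), and the GBK and UTF-8 spellings of 我是汉字 occupy 8 and 12 bytes respectively, plus the terminating NUL. The names `kGbk` and `kUtf8` are illustrative.

```cpp
#include <cassert>
#include <cstring>

// 'A' and '\x41' are the same char value on ASCII-based systems.
static_assert('A' == '\x41', "'A' is 0x41");

// The same four characters, spelled as raw bytes in two encodings.
const char kGbk[]  = "\xce\xd2\xca\xc7\xba\xba\xd7\xd6";                  // GBK
const char kUtf8[] = "\xe6\x88\x91\xe6\x98\xaf\xe6\xb1\x89\xe5\xad\x97";  // UTF-8

// sizeof includes the terminating NUL.
static_assert(sizeof kGbk == 8 + 1, "GBK: 2 bytes per character");
static_assert(sizeof kUtf8 == 12 + 1, "UTF-8: 3 bytes per character");
```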
Concept 3: QString uses Unicode internally.
Because QString stores Unicode internally, it can hold at the same time "我是汉字" (the bytes read as GBK), "Dianyao Luo Jian" (read as BIG5) and "ÎÒÊÇºº×Ö" (read as Latin-1).
The question is how the 8 bytes "\xce\xd2\xca\xc7\xba\xba\xd7\xd6" in the source should be converted to Unicode and stored in the QString: as GBK, BIG5, Latin-1, or something else?
If you do not tell it, it chooses Latin-1 by default, so the Unicode code points of the 8 characters "ÎÒÊÇºº×Ö" are stored in the QString. In the end, eight Latin characters appear where you expected to see four Chinese characters: this is the so-called garbled text.
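The default Latin-1 behaviour is easy to mimic, because Latin-1 maps each byte to the code point with the same value. `latin1ToUnicode` below is a hypothetical stand-in for what Qt 4 does when no C-string codec is set, not Qt's actual code: 8 GBK bytes become 8 code points, U+00CE (Î), U+00D2 (Ò), U+00CA (Ê), U+00C7 (Ç), U+00BA (º) twice, U+00D7 (×) and U+00D6 (Ö).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Latin-1 decoding: byte value 0xNN becomes code point U+00NN, one character per byte.
std::vector<std::uint32_t> latin1ToUnicode(const std::string& bytes) {
    std::vector<std::uint32_t> cps;
    for (unsigned char b : bytes) cps.push_back(b);  // unsigned char avoids sign extension
    return cps;
}
```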
Making QString decode correctly

const char *str = "我是汉字";
QString a = str;


The question is actually very simple: when converting a narrow char * string into a Unicode QString, you have to tell QString which encoding the char * string is in: GBK, BIG5, Latin-1?
Ideally, whenever you hand a char * to QString, you also tell it the encoding. The following QString functions do exactly that; each one knows which encoding to assume for the C string:


QString QString::fromAscii(const char *str, int size = -1)     // Qt 4 (deprecated in Qt 5)
QString QString::fromLatin1(const char *str, int size = -1)
QString QString::fromLocal8Bit(const char *str, int size = -1)
QString QString::fromUtf8(const char *str, int size = -1)
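For comparison, here is what the explicit route amounts to when the caller says "these bytes are UTF-8". `decodeUtf8` is a hypothetical, simplified decoder in the spirit of `QString::fromUtf8` (it assumes well-formed input and skips the overlong and surrogate checks a real decoder needs); it turns the 12 UTF-8 bytes of 我是汉字 back into 4 code points.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified UTF-8 decoder: assumes well-formed input (no overlong/surrogate checks).
std::vector<std::uint32_t> decodeUtf8(const std::string& bytes) {
    std::vector<std::uint32_t> cps;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;  // number of continuation bytes after the lead byte
        std::uint32_t cp;
        if (b < 0x80)       { cp = b;        extra = 0; }  // 0xxxxxxx
        else if (b >= 0xF0) { cp = b & 0x07; extra = 3; }  // 11110xxx
        else if (b >= 0xE0) { cp = b & 0x0F; extra = 2; }  // 1110xxxx
        else if (b >= 0xC0) { cp = b & 0x1F; extra = 1; }  // 110xxxxx
        else throw std::runtime_error("unexpected continuation byte");
        if (i + extra >= bytes.size()) throw std::runtime_error("truncated sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) throw std::runtime_error("bad continuation byte");
            cp = (cp << 6) | (c & 0x3Fu);  // append 6 payload bits
        }
        cps.push_back(cp);
        i += 1 + extra;
    }
    return cps;
}
```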


QString provides only these few functions, which falls far short of everyone's needs. For example, on Simplified Chinese Windows local8Bit means GBK, but what if you have a char string in BIG5 or Latin-2?
Then use the powerful QTextCodec. A QTextCodec knows exactly which encoding it is responsible for; give it a char string and it converts it into the correct Unicode:

QString QTextCodec::toUnicode(const char *chars) const
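The idea behind QTextCodec, one object per encoding that knows how to turn its bytes into Unicode, can be sketched with a small interface. `Codec`, `Latin1Codec` and `MiniGbkCodec` are hypothetical names, and `MiniGbkCodec` only knows the four characters of this article (anything else becomes U+FFFD). With Qt 4 itself one would instead write something like `QTextCodec::codecForName("GBK")->toUnicode(str)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using CodePoints = std::vector<std::uint32_t>;

// Each codec knows exactly one encoding and converts its bytes to Unicode.
struct Codec {
    virtual ~Codec() = default;
    virtual CodePoints toUnicode(const std::string& bytes) const = 0;
};

struct Latin1Codec : Codec {
    CodePoints toUnicode(const std::string& bytes) const override {
        CodePoints cps;
        for (unsigned char b : bytes) cps.push_back(b);  // one byte, one character
        return cps;
    }
};

// Hypothetical: knows only 我是汉字; unknown pairs and dangling lead bytes become U+FFFD.
struct MiniGbkCodec : Codec {
    CodePoints toUnicode(const std::string& bytes) const override {
        static const std::map<std::uint16_t, std::uint32_t> table = {
            {0xCED2, 0x6211}, {0xCAC7, 0x662F}, {0xBABA, 0x6C49}, {0xD7D6, 0x5B57}};
        CodePoints cps;
        for (std::size_t i = 0; i < bytes.size();) {
            unsigned char b = static_cast<unsigned char>(bytes[i]);
            if (b < 0x80 || i + 1 == bytes.size()) {  // ASCII, or lead byte with no trail
                cps.push_back(b < 0x80 ? b : 0xFFFD);
                i += 1;
                continue;
            }
            auto it = table.find(static_cast<std::uint16_t>(
                (b << 8) | static_cast<unsigned char>(bytes[i + 1])));
            cps.push_back(it != table.end() ? it->second : 0xFFFD);
            i += 2;
        }
        return cps;
    }
};
```

Handing the same eight bytes to the two codecs gives 4 characters from one and 8 from the other, which is why the caller must pick the codec that matches the data.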


But calling this every time is too much trouble; I want to write simply


QString a = str;


or

QString a(str);


So what can be done?
Written this way, there is certainly no way to tell QString what encoding str uses, so we have to go another way. This is what the statements mentioned at the beginning do:

QTextCodec::setCodecForCStrings(QTextCodec::codecForName("GBK"));
QTextCodec::setCodecForCStrings(QTextCodec::codecForName("UTF-8"));

They set the encoding QString uses by default for C strings. Which one to use depends on the source file: if the source code is saved as GBK, use GBK; if it is saved as UTF-8, use UTF-8. There is one exception: if you save it as UTF-8 with a BOM and compile with Microsoft's cl compiler, you still need GBK.
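What `setCodecForCStrings` changes is a process-wide default that the implicit `const char *` conversion consults. The hypothetical `Text` class below is not Qt code (`setDefaultCStringDecoder`, `decodeLatin1` and `decodeMiniGbk` are likewise made-up names, and the GBK decoder only knows this article's four characters), but it shows the effect: with no default it falls back to Latin-1, and once a GBK default is installed, the same literal decodes to four characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Decoder = std::vector<std::uint32_t> (*)(const std::string&);

std::vector<std::uint32_t> decodeLatin1(const std::string& bytes) {
    std::vector<std::uint32_t> cps;
    for (unsigned char b : bytes) cps.push_back(b);
    return cps;
}

// Hypothetical 4-entry GBK decoder (ASCII passes through; unknown pairs become U+FFFD).
std::vector<std::uint32_t> decodeMiniGbk(const std::string& bytes) {
    static const std::map<std::uint16_t, std::uint32_t> table = {
        {0xCED2, 0x6211}, {0xCAC7, 0x662F}, {0xBABA, 0x6C49}, {0xD7D6, 0x5B57}};
    std::vector<std::uint32_t> cps;
    for (std::size_t i = 0; i < bytes.size();) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b < 0x80 || i + 1 == bytes.size()) {
            cps.push_back(b < 0x80 ? b : 0xFFFD);
            i += 1;
            continue;
        }
        auto it = table.find(static_cast<std::uint16_t>(
            (b << 8) | static_cast<unsigned char>(bytes[i + 1])));
        cps.push_back(it != table.end() ? it->second : 0xFFFD);
        i += 2;
    }
    return cps;
}

// Process-wide default, playing the role of setCodecForCStrings(); nullptr means Latin-1.
Decoder g_cstringDecoder = nullptr;
void setDefaultCStringDecoder(Decoder d) { g_cstringDecoder = d; }

// String class with an implicit conversion from const char*, like QString in Qt 4.
class Text {
public:
    Text(const char* s)  // implicit on purpose, so "Text a = str;" compiles
        : cps_((g_cstringDecoder ? g_cstringDecoder : decodeLatin1)(s)) {}
    std::size_t length() const { return cps_.size(); }
    std::uint32_t at(std::size_t i) const { return cps_.at(i); }

private:
    std::vector<std::uint32_t> cps_;
};
```

Because `Text a = str;` silently picks whatever decoder is installed, the default has to match how the source file was actually saved, which is the rule stated above.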
To summarize, the root cause of the garbled text is this: QString uses Unicode internally, so it can hold the GBK reading "我是汉字", the BIG5 reading "Dianyao Luo Jian" and the Latin-1 reading "ÎÒÊÇºº×Ö" alike. When converting a narrow char * string into a Unicode QString, you need to tell QString which encoding the char * string uses: GBK, BIG5, Latin-1? If you do not, it defaults to Latin-1 and stores the Unicode code points of the 8 characters "ÎÒÊÇºº×Ö". Eight Latin characters then appear where you expected four Chinese characters: that is the garbled text.
Many articles online recommend putting the following directly in main.cpp:

QTextCodec *codec = QTextCodec::codecForName("UTF-8");
QTextCodec::setCodecForTr(codec);
QTextCodec::setCodecForLocale(codec);
QTextCodec::setCodecForCStrings(codec);


In some cases this actually causes problems, because the program may read Chinese paths from the system or launch an external program located under a Chinese path, and if the system encoding is GB2312, things go wrong. The Chinese path is converted using UTF-8 when it is stored in a QString, but the system interprets paths as GB2312, so an external program under a Chinese path cannot be started.
The following setup solves the problem:

QTextCodec *codec = QTextCodec::codecForName("UTF-8");
QTextCodec::setCodecForTr(codec);
QTextCodec::setCodecForLocale(QTextCodec::codecForLocale());
QTextCodec::setCodecForCStrings(QTextCodec::codecForLocale());

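That is, all strings exchanged with the outside world (and plain C strings) are encoded and decoded with the local system encoding, while only tr() strings are treated as UTF-8.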

 


Original post: www.cnblogs.com/warmlight/p/11571510.html