Qt Basics: QTextCodec Theory Summary

1. Overview

QTextCodec is the Qt class that converts text between encodings. It converts in both directions between Unicode and other encodings, which is useful when reading files or converting data from one encoding to another. Qt stores, draws and manipulates strings as Unicode, but the data we need to process often uses a different encoding. For example, many Japanese documents are still stored in Shift-JIS or ISO 2022-JP, while documents for Russian users are often stored in KOI8-R or Windows-1251.

Qt provides a set of QTextCodec classes that convert non-Unicode formats to and from Unicode. You can also create your own codec classes.

2. Encoding support

Supported encodings are:

  • Big5
  • Big5-HKSCS
  • CP949
  • EUC-JP
  • EUC-KR
  • GB18030
  • HP-ROMAN8
  • IBM 850
  • IBM 866
  • IBM 874
  • ISO 2022-JP
  • ISO 8859-1 to 10
  • ISO 8859-13 to 16
  • Iscii-Bng, Dev, Gjr, Knd, Mlm, Ori, Pnj, Tlg, and Tml
  • KOI8-R
  • KOI8-U
  • macintosh
  • Shift-JIS
  • TIS-620
  • TSCII
  • UTF-8
  • UTF-16
  • UTF-16BE
  • UTF-16LE
  • UTF-32
  • UTF-32BE
  • UTF-32LE
  • Windows-1250 to 1258

If Qt is compiled with ICU support enabled, most of the codecs supported by ICU are also available to the application.

3. Use

A QTextCodec can convert a string from a local encoding to Unicode as follows. Suppose we have a string in the Russian KOI8-R encoding and want to convert it to Unicode. A simple way to do this is:

  QByteArray encodedString = "...";
  QTextCodec *codec = QTextCodec::codecForName("KOI8-R");
  QString string = codec->toUnicode(encodedString);

After this, string holds the text converted to Unicode. Converting a string from Unicode to a local encoding is also simple:

  QString string = "...";
  QTextCodec *codec = QTextCodec::codecForName("KOI8-R");
  QByteArray encodedString = codec->fromUnicode(string);

To read or write files in various encodings, use QTextStream and its setCodec() function. It is best to set the encoding explicitly rather than rely on the default.

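  QFile data("output.txt");  // hypothetical file name, for illustration only
  if (data.open(QFile::WriteOnly | QFile::Truncate)) {
      QTextStream out(&data);  // the stream must wrap the file that was opened
      out.setCodec("UTF-8");
      out << "Result: " << qSetFieldWidth(10) << left << 3.14 << 2.7;
      // writes "Result: 3.14      2.7       "
  }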

Take extra care when converting data that arrives in chunks, for example over a network. A multibyte character may be split across two chunks. In the best case this loses a character; in the worst case the entire conversion fails. The solution is to create a QTextDecoder object for the codec and use that same decoder for the whole decoding process, as follows:

  QTextCodec *codec = QTextCodec::codecForName("Shift-JIS");
  QTextDecoder *decoder = codec->makeDecoder();

  QString string;
  while (new_data_available()) {
      QByteArray chunk = get_new_data();
      string += decoder->toUnicode(chunk);
  }
  delete decoder;

A QTextDecoder object maintains state between chunks, so a multibyte character that is split across two chunks is still decoded correctly.
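
To make the idea of "state between chunks" concrete, here is a minimal sketch in plain C++ that does not use Qt. The class Utf8ChunkDecoder and its members are hypothetical names invented for this illustration; it is not how QTextDecoder is implemented, only a simplified UTF-8 decoder that remembers an unfinished multibyte sequence from one call to the next.

```cpp
#include <string>

// U+FFFD REPLACEMENT CHARACTER, emitted for malformed input.
constexpr char32_t kReplacement = 0xFFFD;

// Hypothetical illustration, not Qt's QTextDecoder: a UTF-8 to UTF-32
// decoder that remembers an unfinished multibyte sequence between chunks.
// Simplified: it does not reject overlong forms or surrogate code points.
class Utf8ChunkDecoder {
public:
    // Decodes one chunk and returns the characters it completes.
    std::u32string feed(const std::string &chunk) {
        std::u32string out;
        for (unsigned char byte : chunk) {
            if (remaining_ > 0) {
                if ((byte & 0xC0) == 0x80) {
                    // Continuation byte: append its six payload bits.
                    codePoint_ = (codePoint_ << 6) | (byte & 0x3F);
                    if (--remaining_ == 0)
                        out.push_back(codePoint_);
                    continue;
                }
                // The sequence was cut short: emit U+FFFD, then treat
                // this byte as the start of a new character.
                out.push_back(kReplacement);
                remaining_ = 0;
            }
            startCharacter(byte, out);
        }
        return out;
    }

    // True while a multibyte character is still waiting for more bytes.
    bool hasPending() const { return remaining_ > 0; }

private:
    void startCharacter(unsigned char byte, std::u32string &out) {
        if (byte < 0x80) {
            out.push_back(byte);                       // single byte (ASCII)
        } else if ((byte & 0xE0) == 0xC0) {
            codePoint_ = byte & 0x1F; remaining_ = 1;  // lead of 2-byte form
        } else if ((byte & 0xF0) == 0xE0) {
            codePoint_ = byte & 0x0F; remaining_ = 2;  // lead of 3-byte form
        } else if ((byte & 0xF8) == 0xF0) {
            codePoint_ = byte & 0x07; remaining_ = 3;  // lead of 4-byte form
        } else {
            out.push_back(kReplacement);  // stray continuation or invalid lead
        }
    }

    char32_t codePoint_ = 0;  // bits collected so far for the pending character
    int remaining_ = 0;       // continuation bytes still expected
};
```

If the three bytes of one character (E6 97 A5) arrive as E6 97 in one chunk and A5 in the next, a single decoder instance returns nothing for the first chunk and the complete character for the second. A fresh decoder created for each chunk would see A5 as a stray byte and produce U+FFFD instead, which is the failure described above.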

4. Create your own codec class

Qt can support a new text encoding through a QTextCodec subclass. The subclass's pure virtual functions describe the codec to the system, which then uses it as needed in the text file formats supported by QTextStream and, under X11, for locale-specific character input and output.

To add support for another encoding to Qt, create a subclass of QTextCodec and implement the functions listed in the following table.

Function              Description
name()                Returns the official name of the encoding. If the encoding is listed in the IANA character-sets encoding file, this should be the preferred MIME name for the encoding.
aliases()             Returns a list of alternative names for the encoding. QTextCodec provides a default implementation that returns an empty list. For example, "ISO-8859-1" has "latin1", "CP819", "IBM819", and "iso-ir-100" as aliases.
mibEnum()             Returns the MIB enum for the encoding if it is listed in the IANA character-sets encoding file.
convertToUnicode()    Converts an 8-bit string to Unicode.
convertFromUnicode()  Converts a Unicode string to an 8-bit string.
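
The shape of such a subclass can be sketched without Qt. In the example below, MiniCodec is a hypothetical stand-in for QTextCodec, written with std:: types so that it compiles on its own. The real Qt virtual functions use QByteArray, QString and QChar, and the two conversion functions take an additional ConverterState pointer, so treat this only as an outline of which function is responsible for what. ISO-8859-1 is used because its byte values map directly to the first 256 Unicode code points; replacing unrepresentable characters with '?' is an assumption made for this sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for QTextCodec so the sketch compiles without Qt.
// It mirrors the five functions from the table using std:: types.
class MiniCodec {
public:
    virtual ~MiniCodec() = default;
    virtual std::string name() const = 0;
    // Default implementation returns an empty list, as described above.
    virtual std::vector<std::string> aliases() const { return {}; }
    virtual int mibEnum() const = 0;
    virtual std::u32string convertToUnicode(const std::string &bytes) const = 0;
    virtual std::string convertFromUnicode(const std::u32string &text) const = 0;
};

class Latin1Codec : public MiniCodec {
public:
    std::string name() const override { return "ISO-8859-1"; }

    std::vector<std::string> aliases() const override {
        return {"latin1", "CP819", "IBM819", "iso-ir-100"};
    }

    // 4 is the IANA MIB enum registered for ISO-8859-1.
    int mibEnum() const override { return 4; }

    std::u32string convertToUnicode(const std::string &bytes) const override {
        std::u32string text;
        for (unsigned char byte : bytes)
            text.push_back(byte);  // each byte value is the code point
        return text;
    }

    std::string convertFromUnicode(const std::u32string &text) const override {
        std::string bytes;
        for (char32_t ch : text) {
            // Code points above U+00FF have no ISO-8859-1 byte; this
            // sketch substitutes '?' for them.
            bytes.push_back(ch <= 0xFF ? static_cast<char>(ch) : '?');
        }
        return bytes;
    }
};
```

With this sketch, converting the bytes "caf\xE9" to Unicode yields four code points ending in U+00E9, and converting them back returns the original bytes. A character such as U+65E5 comes back as '?' because ISO-8859-1 cannot represent it.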

Origin blog.csdn.net/qq_43680827/article/details/133942447