Qt: ANSI-encoded txt files read with QFile show garbled characters in the QTextEdit control

Foreword

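When you use the QFile class in Qt to read an ANSI-encoded text file and show it in a QTextEdit control, the text may come out garbled. The cause is an encoding mismatch. QFile itself only reads raw bytes; the bytes are decoded when they are converted to a QString, for example by QTextStream, which in Qt 5 uses the system locale codec unless told otherwise. "ANSI" is also not a single encoding: on Windows it means the system's active code page (for example Windows-1252 on Western European systems, or GBK on Simplified Chinese systems), so the encoding of the file can differ from the one used to decode it.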
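To make the mismatch concrete, here is a minimal sketch in plain C++ (no Qt) that decodes bytes as ISO-8859-1 and re-encodes them as UTF-8. ISO-8859-1 agrees with Windows-1252 for bytes 0xA0 to 0xFF, so it stands in for a Western "ANSI" decoder here. The helper name latin1ToUtf8 is made up for this illustration.

```cpp
#include <string>

// Hypothetical helper: interpret every byte as one ISO-8859-1 character
// and return the same text encoded as UTF-8.
std::string latin1ToUtf8(const std::string& bytes)
{
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            // ASCII bytes are identical in both encodings
            out.push_back(static_cast<char>(c));
        } else {
            // U+0080..U+00FF need two bytes in UTF-8
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

Feeding it the GBK bytes of 中文 (D6 D0 CE C4) yields ÖÐÎÄ, which is the kind of garbled text a QTextEdit shows when the wrong codec is used.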

To read an ANSI-encoded text file correctly and display it in a QTextEdit control, you can use the QTextCodec class to specify the encoding explicitly. The following sample reads the file through a QTextStream with an explicit codec and shows the result in a QTextEdit control:

#include <QApplication>
#include <QFile>
#include <QTextStream>
#include <QTextCodec>
#include <QTextEdit>

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);

    // Create the QTextEdit control
    QTextEdit textEdit;

    // Read the ANSI-encoded text file
    QFile file("path_to_your_file.txt");
    if (file.open(QIODevice::ReadOnly | QIODevice::Text))
    {
        // Create a QTextCodec object for the ANSI encoding
        QTextCodec *codec = QTextCodec::codecForName("Windows-1252");

        // Create a QTextStream that uses the specified codec
        // (QTextStream::setCodec() is a Qt 5 API)
        QTextStream stream(&file);
        if (codec)
            stream.setCodec(codec);

        // Read the content of the text file
        QString content = stream.readAll();

        // Show the text in the QTextEdit control
        textEdit.setPlainText(content);

        // Close the file
        file.close();
    }

    // Show the window
    textEdit.show();

    return app.exec();
}

1. The garbled text is still not fixed

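The code above creates a QTextCodec for Windows-1252 and applies it to the QTextStream to decode the file content, then sets the decoded text as the content of the QTextEdit control. Windows-1252, however, is only the "ANSI" code page of Western European Windows. A file with Chinese text saved as ANSI on Simplified Chinese Windows is usually GBK (code page 936), so the output is still garbled. Switching the codec to UTF-8 does not solve the problem either: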

void ProjectWin::readParaFile(QString filePath)
{
    // Check the pointer before using it
    if (!m_paraText) {
        qDebug() << "m_paraText is null!";
        return;
    }
    m_paraText->clear();

    // Sibling .txt path derived from filePath (not used below)
    QString txtFile = filePath.left(filePath.size() - 3);
    txtFile += "txt";
    QFile file(filePath);
    if (!file.open(QIODevice::ReadOnly)) {
        qDebug() << file.errorString();
        return;
    }

//    QByteArray fileData = file.readAll(); // read the whole file at once
//    QString decodedText = QTextCodec::codecForName("UTF-8")->toUnicode(fileData); // decode as UTF-8
//    m_paraText->setPlainText(decodedText); // set the text content

    QTextStream in(&file);
    in.setCodec("UTF-8");  // set the codec to UTF-8
//    in.setCodec("GBK");  // set the codec to GBK

    while (!in.atEnd()) {
        QString line = in.readLine();

        // Extract the task number that follows the "任务代号:" (task code) label
        if (line.contains(u8"任务代号:", Qt::CaseSensitive))
        {
            int pos = line.lastIndexOf(":");
            QString taskNum = line.right(line.size() - pos - 2);
            taskNum = taskNum.trimmed();
            m_taskNumSet.insert(taskNum);
        }

        m_paraText->append(line); // append the line to the QTextEdit control
    }

    file.close();
}

2. The solution

2.1 Method 1: Use the fromLocal8Bit() function of QString

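void ProjectWin::readParaFile(QString filePath)
{
    // Check the pointer before using it
    if (!m_paraText) {
        qDebug() << "m_paraText is null!";
        return;
    }
    m_paraText->clear();

    // Sibling .txt path derived from filePath (not used below)
    QString txtFile = filePath.left(filePath.size() - 3);
    txtFile += "txt";
//    filePath = "E:/work/ImageManageSys/utf8/0000_051623_162252_05_004_00001_00008_00.txt"; // hard-coded test path
    QFile file(filePath);
    if (!file.open(QIODevice::ReadOnly)) {
        qDebug() << file.errorString();
        return;
    }

    // Declare the codec used for Chinese text. Note that this changes the
    // locale codec for the whole process.
    QTextCodec::setCodecForLocale(QTextCodec::codecForName("gb2312"));

    // Read the raw bytes once and decode them with the locale codec
    QByteArray arr = file.readAll();
    file.close();
    arr.replace('\x0B', '\x0D'); // replace vertical tabs with carriage returns
    const QString text = QString::fromLocal8Bit(arr); // QByteArray to QString on Windows
    m_paraText->append(text);

    // Read the task number. The file is already at its end after readAll(),
    // so the lines are taken from the decoded text instead of the file.
    const QStringList lines = text.split(QLatin1Char('\n'));
    for (const QString &line : lines) {
        if (line.contains(u8"任务代号:", Qt::CaseSensitive))
        {
            int pos = line.lastIndexOf(":");
            QString taskNum = line.right(line.size() - pos - 2);
            taskNum = taskNum.trimmed();
            m_taskNumSet.insert(taskNum);
            break;
        }
    }
}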

This way, a txt file in ANSI encoding displays correctly. But if you read a UTF-8 txt file this way, it comes out garbled instead. Remember! Remember! Remember! Important things are worth saying three times.
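Since the local 8-bit path garbles UTF-8 files and the UTF-8 path garbles ANSI files, one possible way to pick between them is to check whether the raw bytes are well-formed UTF-8. The following is a sketch in plain C++ under that assumption; looksLikeUtf8 is a hypothetical helper, not a Qt function. Pure ASCII passes the check, and a legacy-encoded file can occasionally pass by coincidence, so treat the result as a hint rather than proof.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
bool looksLikeUtf8(const std::string& bytes)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes
    static const unsigned long minForLength[] = {0x0, 0x80, 0x800, 0x10000};

    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // continuation bytes expected after the lead byte
        unsigned long cp = 0;    // code point being decoded

        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else return false;       // stray continuation byte or invalid lead byte

        if (extra > n - i - 1) return false;   // sequence is cut off

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80) return false;   // not a 10xxxxxx byte
            cp = (cp << 6) | (cont & 0x3F);
        }

        if (cp < minForLength[extra]) return false;        // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;    // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                   // outside Unicode

        i += extra + 1;
    }
    return true;
}
```

In readParaFile() this could be called as looksLikeUtf8(std::string(arr.constData(), arr.size())) to choose between QString::fromUtf8(arr) and QString::fromLocal8Bit(arr).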

2.2 Method 2: Read the file as UTF-8 when it is UTF-8 encoded

void ProjectWin::readParaFile(QString filePath)
{
    // Check the pointer before using it
    if (!m_paraText) {
        qDebug() << "m_paraText is null!";
        return;
    }
    m_paraText->clear();

    // Sibling .txt path derived from filePath (not used below)
    QString txtFile = filePath.left(filePath.size() - 3);
    txtFile += "txt";
    QFile file(filePath);
    if (!file.open(QIODevice::ReadOnly)) {
        qDebug() << file.errorString();
        return;
    }

//    QByteArray fileData = file.readAll(); // read the whole file at once
//    QString decodedText = QTextCodec::codecForName("UTF-8")->toUnicode(fileData); // decode as UTF-8
//    m_paraText->setPlainText(decodedText); // set the text content

    QTextStream in(&file);
    in.setCodec("UTF-8");  // set the codec to UTF-8
//    in.setCodec("GBK");  // set the codec to GBK

    while (!in.atEnd()) {
        QString line = in.readLine();

        // Extract the task number that follows the "任务代号:" (task code) label
        if (line.contains(u8"任务代号:", Qt::CaseSensitive))
        {
            int pos = line.lastIndexOf(":");
            QString taskNum = line.right(line.size() - pos - 2);
            taskNum = taskNum.trimmed();
            m_taskNumSet.insert(taskNum);
        }

        m_paraText->append(line); // append the line to the QTextEdit control
    }

    file.close();
}

Summary

In Qt, the QFile class itself does not provide a way to obtain a file's encoding directly. The encoding is a property of the file's content, and QFile only provides functions for reading and writing files. To find out a file's encoding, you have to use other libraries or methods that analyze the content and infer the encoding.
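One inexpensive way to analyze the content is to look for a Unicode byte order mark (BOM) at the start of the file before trying anything heavier. Below is a minimal sketch in plain C++; the function name encodingFromBom and the returned labels are choices made for this example. Many UTF-8 files and all ANSI files carry no BOM, in which case the function returns an empty string and another method is needed.

```cpp
#include <string>

// Hypothetical helper: returns the encoding implied by a leading BOM,
// or an empty string if the data does not start with one.
std::string encodingFromBom(const std::string& bytes)
{
    auto startsWith = [&bytes](const std::string& prefix) {
        return bytes.compare(0, prefix.size(), prefix) == 0;
    };

    // The UTF-32 checks come first because the UTF-32LE BOM
    // begins with the same two bytes as the UTF-16LE BOM.
    if (startsWith(std::string("\xFF\xFE\x00\x00", 4))) return "UTF-32LE";
    if (startsWith(std::string("\x00\x00\xFE\xFF", 4))) return "UTF-32BE";
    if (startsWith("\xEF\xBB\xBF")) return "UTF-8";
    if (startsWith("\xFF\xFE")) return "UTF-16LE";
    if (startsWith("\xFE\xFF")) return "UTF-16BE";
    return "";
}
```

With Qt, the first few bytes from QFile::read() or QFile::peek() can be passed in as std::string(data.constData(), data.size()).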

A common method is to use a third-party library such as uchardet or libmagic to detect the file's encoding. These libraries can analyze the characteristics of the file content to guess its possible encoding format. You can read file contents into memory and use these libraries for encoding detection.

Here is an example that uses the uchardet library to detect a file's encoding. The steps are:

(1) Integrate the uchardet library into the Qt project, either through CMake or by compiling the library manually.
(2) Include the uchardet header and link against the library in the Qt project.
(3) Use QFile to read the file content and pass it to uchardet to detect the encoding.

#include <QDebug>
#include <QFile>
#include <QString>
#include <uchardet/uchardet.h>

QString detectFileEncoding(const QString& filePath)
{
    QFile file(filePath);
    if (!file.open(QIODevice::ReadOnly)) {
        qDebug() << "Failed to open file:" << file.errorString();
        return QString();
    }

    const QByteArray data = file.readAll();
    file.close();

    // Feed the raw bytes to uchardet and ask for its best guess
    uchardet_t ud = uchardet_new();
    uchardet_handle_data(ud, data.constData(), data.size());
    uchardet_data_end(ud);
    // Copy the name before uchardet_delete() releases the detector
    const char* encoding = uchardet_get_charset(ud);
    QString detectedEncoding = QString::fromLatin1(encoding);
    uchardet_delete(ud);

    return detectedEncoding;
}

void ProjectWin::readFileAndDetectEncoding(const QString& filePath)
{
    QString detectedEncoding = detectFileEncoding(filePath);
    qDebug() << "Detected Encoding:" << detectedEncoding;

    QFile file(filePath);
    if (!file.open(QIODevice::ReadOnly)) {
        qDebug() << "Failed to open file:" << file.errorString();
        return;
    }

    QTextStream in(&file);
    // Apply the detected encoding; if nothing was detected,
    // keep the stream's default codec
    if (!detectedEncoding.isEmpty())
        in.setCodec(detectedEncoding.toUtf8().constData());

    QString content = in.readAll();

    file.close();

    // Process the content that was read...
}

The detectFileEncoding() function above uses the uchardet library to detect the file's encoding and returns the detected encoding name as a string. The readFileAndDetectEncoding() function then applies that encoding to the QTextStream so that the file content is decoded correctly.

Please note that uchardet is an independent third-party library and does not come with Qt. You need to import and properly configure building and linking of this library in your project.

Also note that automatic encoding detection is not always accurate; files with unusual or mixed encodings in particular may be misidentified. It is therefore best to know the file's exact encoding in advance, or to agree on it beforehand.

Origin blog.csdn.net/aoxuestudy/article/details/131109062