26- OCR LCD screen reading recognition based on PP-OCRv3

1 Introduction

This project builds on the open-source PaddleOCR suite. It starts from the PP-OCRv3 detection and recognition models and optimizes them for reading LCD screens, mainly the displays of various measuring instruments.

2 Installation environment

Install Git (see a detailed Git installation tutorial if needed), then clone PaddleOCR and install its dependencies:

# First clone the official PaddleOCR project and install the required dependencies
# Uncomment the next line on the first run
# git clone https://gitee.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR
pip install -r requirements.txt

3 Text detection: introduction to the PP-OCRv3 detection algorithm

The PP-OCRv3 detection model upgrades the CML (Collaborative Mutual Learning) text detection distillation strategy of PP-OCRv2. As shown in the figure below, the core idea of CML is to combine ① standard distillation, in which a teacher guides the students, and ② DML (Deep Mutual Learning) between the student networks, so that the student networks learn from each other while the teacher network provides guidance. PP-OCRv3 further optimizes the teacher model and the student model separately. For the teacher model, it proposes LK-PAN, a PAN structure with a large receptive field, and introduces the DML distillation strategy; for the student model, it proposes RSE-FPN, an FPN structure with a residual attention mechanism.
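The way the CML terms combine can be sketched with a toy example. This is an illustration only, not PaddleOCR's implementation: the loss forms (binary cross-entropy, symmetric KL, mean squared error), the weights, and all function names are assumptions chosen for readability.

```python
import math

def _clamp(p, eps=1e-8):
    return min(max(p, eps), 1 - eps)

def bce(pred, label):
    """Supervised term: mean binary cross-entropy against the ground truth."""
    total = 0.0
    for p, y in zip(pred, label):
        p = _clamp(p)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(pred)

def kl_divergence(p, q):
    """Mean binary KL divergence between two probability maps (flat lists)."""
    total = 0.0
    for pi, qi in zip(p, q):
        pi, qi = _clamp(pi), _clamp(qi)
        total += pi * math.log(pi / qi) + (1 - pi) * math.log((1 - pi) / (1 - qi))
    return total / len(p)

def dml_loss(student_a, student_b):
    """Mutual-learning term: each student is pulled toward the other."""
    return 0.5 * (kl_divergence(student_a, student_b) + kl_divergence(student_b, student_a))

def distill_loss(student, teacher):
    """Teacher-guided term: mean squared error against the teacher output."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)

def cml_loss(student_a, student_b, teacher, label, w_gt=1.0, w_dml=1.0, w_distill=1.0):
    """Weighted sum of the supervised, mutual-learning and teacher-guided terms."""
    supervised = bce(student_a, label) + bce(student_b, label)
    mutual = dml_loss(student_a, student_b)
    guided = distill_loss(student_a, teacher) + distill_loss(student_b, teacher)
    return w_gt * supervised + w_dml * mutual + w_distill * guided

# Toy probability maps with three pixels each (hypothetical values)
teacher = [0.9, 0.1, 0.8]
student_a = [0.7, 0.2, 0.6]
student_b = [0.8, 0.3, 0.5]
label = [1, 0, 1]
print(round(cml_loss(student_a, student_b, teacher, label), 4))
```

The teacher only appears as a target here; in this sketch it is never updated, which mirrors the idea that the teacher gives guidance while the two students also learn from each other.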

3.1 Data preparation

The measurement-equipment screen character detection dataset comes from the digital displays of various measuring devices in real projects, plus some other digital display screens collected from the Internet. It contains 755 training images and 355 test images.
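Before training it can help to inspect the annotations. The sketch below assumes the usual PaddleOCR detection label convention, where each line holds an image path, a tab, and a JSON list of boxes with "transcription" and "points" fields; the sample line and the function name are hypothetical.

```python
import json

def parse_det_label_line(line):
    """Split one label line into the image path and a list of (text, polygon) pairs."""
    image_path, annotation_json = line.rstrip("\n").split("\t", 1)
    boxes = json.loads(annotation_json)
    return image_path, [(box["transcription"], box["points"]) for box in boxes]

# Hypothetical label line for illustration
sample = ('test/1.jpg\t[{"transcription": "100.07kPa", '
          '"points": [[10, 20], [110, 20], [110, 50], [10, 50]]}]')
path, annotations = parse_det_label_line(sample)
print(path, len(annotations), annotations[0][0])
```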

# Create a new folder train_data under PaddleOCR
mkdir train_data
# Download the dataset and unzip it to the specified path
unzip icdar2015.zip  -d train_data
# Randomly view an image from the text detection dataset
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import os


train = './train_data/icdar2015/text_localization/test'
# Pick one image at random from the given directory and display it
def get_one_image(train):
    plt.figure()
    files = os.listdir(train)
    n = len(files)
    ind = np.random.randint(0, n)
    img_dir = os.path.join(train, files[ind])
    image = Image.open(img_dir)
    plt.imshow(image)
    plt.show()

get_one_image(train)

3.2 Model training

3.2.1 Direct evaluation of pre-trained models

Download the PP-OCRv3 detection pre-trained model. Other text detection models can be chosen as needed.

# Download the required pre-trained model with this command
wget -P ./pretrained_models/ https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
# Extract the pre-trained model archive
tar -xf ./pretrained_models/ch_PP-OCRv3_det_distill_train.tar -C pretrained_models

Before training, we can directly use the following command to evaluate the effect of the pre-trained model:

# Evaluate the pre-trained model
python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model="./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy"

Do not change the other parameters arbitrarily, or the evaluation may fail. The result is as follows:

3.2.2 Fine-tuning the pre-trained model directly

Modify the configuration file

We use configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml. The main changes are the number of training epochs and the learning rate, the pre-trained model path, and the dataset path. batch_size can also be adjusted to fit the GPU memory of your machine. The specific changes are as follows:

epoch_num: 100
save_epoch_step: 10
eval_batch_step: [0, 50]
save_model_dir: ./output/ch_PP-OCR_v3_det/
pretrained_model: ./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy
learning_rate: 0.00025
num_workers: 0 # For single-GPU training, set num_workers to 0 in both the Train and Eval loader sections; otherwise a `/dev/shm insufficient` error may occur
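Instead of editing the YAML file, scalar settings can also be passed on the command line through the same -o option used in the commands of this article. The helper below is a hypothetical convenience, and the dotted key paths assume the usual layout of a PaddleOCR config (Global, Optimizer.lr, Train.loader, Eval.loader); check them against your copy of the file.

```python
def build_override_args(overrides):
    """Turn {"Global.epoch_num": 100} into ["-o", "Global.epoch_num=100"]."""
    if not overrides:
        return []
    return ["-o"] + ["{}={}".format(key, value) for key, value in overrides.items()]

# Assumed key paths; verify them in the YAML file before use
overrides = {
    "Global.epoch_num": 100,
    "Global.save_epoch_step": 10,
    "Optimizer.lr.learning_rate": 0.00025,
    "Train.loader.num_workers": 0,
    "Eval.loader.num_workers": 0,
}
command = ["python", "tools/train.py", "-c",
           "configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml"] + build_override_args(overrides)
print(" ".join(command))
```

All overrides follow a single -o flag, which keeps them in one argument list.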

Start training

Using the configuration file configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml modified above, the training command is as follows:

# Start training the model
python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy

Evaluate the trained model:

# Evaluate the trained model
python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_det/best_accuracy"
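Detection quality is commonly summarized by precision, recall and their harmonic mean (hmean). The sketch below shows how the three relate, assuming the evaluation reports these quantities; the matching of predicted boxes to ground-truth boxes is reduced to plain counts, and the function names are illustrative.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def detection_metrics(num_matched, num_predicted, num_ground_truth):
    """Precision, recall and hmean from box counts."""
    precision = num_matched / num_predicted if num_predicted else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    return {"precision": precision, "recall": recall, "hmean": hmean(precision, recall)}

# Hypothetical counts: 8 correct boxes out of 10 predictions, 16 boxes in the labels
print(detection_metrics(8, 10, 16))
```

Comparing hmean before and after fine-tuning is a simple way to judge whether each training stage helped.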

3.2.3 Fine-tuning the student model from the pre-trained model

We use configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml. The main changes are the number of training epochs and the learning rate, the pre-trained model path, and the dataset path. batch_size can also be adjusted to fit the GPU memory of your machine. The specific changes are as follows:

epoch_num: 100
save_epoch_step: 10
eval_batch_step: [0, 50]
save_model_dir: ./output/ch_PP-OCR_v3_det_student/
pretrained_model: ./pretrained_models/ch_PP-OCRv3_det_distill_train/student
learning_rate: 0.00025
num_workers: 0 # For single-GPU training, set num_workers to 0 in both the Train and Eval loader sections; otherwise a `/dev/shm insufficient` error may occur

The training command is as follows:

python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/student

Evaluate the trained model:

# Evaluate the trained model
python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_det_student/best_accuracy"

3.2.4 Fine-tuning the teacher model from the pre-trained model

First, extract the teacher parameters from the provided pre-trained model best_accuracy.pdparams and assemble them into an initialization model suitable for DML training. The extraction code is as follows:

cd ./pretrained_models/
# Convert the teacher params in best_accuracy.pdparams into teacher_dml.pdparams
import paddle

# load pretrained model
all_params = paddle.load("ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# print(all_params.keys())

# keep only the teacher params and strip the "Teacher." prefix
t_params = {key[len("Teacher."):]: all_params[key] for key in all_params if key.startswith("Teacher.")}

# print(t_params.keys())

s_params = {"Student." + key: t_params[key] for key in t_params}
s2_params = {"Student2." + key: t_params[key] for key in t_params}
s_params = {**s_params, **s2_params}
# print(s_params.keys())

paddle.save(s_params, "ch_PP-OCRv3_det_distill_train/teacher_dml.pdparams")
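The effect of this key renaming is easier to see on a plain dictionary. The sketch below repeats the same transformation without Paddle; the parameter names and values are made up, and the function name is hypothetical.

```python
def teacher_to_dml(all_params):
    """Copy every "Teacher.*" entry to both "Student.*" and "Student2.*"."""
    teacher = {key[len("Teacher."):]: value
               for key, value in all_params.items() if key.startswith("Teacher.")}
    merged = {}
    for name, value in teacher.items():
        merged["Student." + name] = value
        merged["Student2." + name] = value
    return merged

# Toy checkpoint with one weight per sub-model (hypothetical names and values)
toy_checkpoint = {
    "Teacher.backbone.conv.weight": [1, 2],
    "Student.backbone.conv.weight": [3, 4],
    "Student2.backbone.conv.weight": [5, 6],
}
for key, value in sorted(teacher_to_dml(toy_checkpoint).items()):
    print(key, value)
```

Both DML branches start from the same teacher weights; the original student weights are dropped on purpose.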

We use configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml. The main changes are the number of training epochs and the learning rate, the pre-trained model path, and the dataset path. batch_size can also be adjusted to fit the GPU memory of your machine. The specific changes are as follows:

epoch_num: 100
save_epoch_step: 10
eval_batch_step: [0, 50]
save_model_dir: ./output/ch_PP-OCR_v3_det_teacher/
pretrained_model: ./pretrained_models/ch_PP-OCRv3_det_distill_train/teacher_dml
learning_rate: 0.00025
num_workers: 0 # For single-GPU training, set num_workers to 0 in both the Train and Eval loader sections; otherwise a `/dev/shm insufficient` error may occur

The training command is as follows:

python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml -o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/teacher_dml

Evaluate the trained model:

# Evaluate the trained model
python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_det_teacher/best_accuracy"

3.2.5 Using CML distillation to further improve the accuracy of the student model

Extract the student and teacher parameters from the best_accuracy.pdparams files trained in 3.2.3 and 3.2.4, and combine them into an initialization model suitable for CML training. The extraction code is as follows:

# transform teacher params and student parameters into cml model
import paddle

all_params = paddle.load("./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# print(all_params.keys())

t_params = paddle.load("./output/ch_PP-OCR_v3_det_teacher/best_accuracy.pdparams")
# print(t_params.keys())

s_params = paddle.load("./output/ch_PP-OCR_v3_det_student/best_accuracy.pdparams")
# print(s_params.keys())

for key in all_params:
    # The CML teacher takes the DML-trained teacher weights, stored there under "Student."
    if "Teacher." in key:
        new_key = key.replace("Teacher", "Student")
        # print("{} >> {}\n".format(key, new_key))
        assert all_params[key].shape == t_params[new_key].shape
        all_params[key] = t_params[new_key]

    # Both CML students are initialized from the fine-tuned student model
    if "Student." in key:
        new_key = key.replace("Student.", "")
        # print("{} >> {}\n".format(key, new_key))
        assert all_params[key].shape == s_params[new_key].shape
        all_params[key] = s_params[new_key]

    if "Student2." in key:
        new_key = key.replace("Student2.", "")
        # print("{} >> {}\n".format(key, new_key))
        assert all_params[key].shape == s_params[new_key].shape
        all_params[key] = s_params[new_key]

paddle.save(all_params, "./pretrained_models/ch_PP-OCRv3_det_distill_train/teacher_cml_student.pdparams")
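The mapping used by the merge can be written as a small lookup that states, for each CML parameter key, which checkpoint it is read from and under which name. This is a sketch for checking the logic on key names only; the function name and the "teacher"/"student" labels are hypothetical.

```python
def source_for(cml_key):
    """Return (checkpoint, key_in_checkpoint) for a CML parameter key, or None."""
    if cml_key.startswith("Teacher."):
        # The DML-trained teacher checkpoint stores its weights under "Student."
        return "teacher", "Student." + cml_key[len("Teacher."):]
    if cml_key.startswith("Student2."):
        return "student", cml_key[len("Student2."):]
    if cml_key.startswith("Student."):
        return "student", cml_key[len("Student."):]
    return None

for key in ["Teacher.head.weight", "Student.head.weight", "Student2.head.weight"]:
    print(key, "<-", source_for(key))
```

Running such a lookup over all keys before copying tensors makes it easy to spot a key that would be left at its pre-trained value.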

The training command is as follows:

python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/teacher_cml_student Global.save_model_dir=./output/ch_PP-OCR_v3_det_finetune/

Evaluate the trained model:

# 评估训练好的模型
python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_det_finetune/best_accuracy"

3.2.6 Model export and inference

After training, the trained model can be converted into an inference model. The inference model additionally stores the model's structure information; it performs better for deployment and accelerated inference, is flexible and convenient, and is well suited to integration into real systems.

3.2.6.1 Model export

The export command is as follows:

# Convert to an inference model (all overrides follow a single -o so that none is dropped)
python tools/export_model.py \
-c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Global.pretrained_model=./output/ch_PP-OCR_v3_det_finetune/best_accuracy \
Global.save_inference_dir="./inference/det_ppocrv3"
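A quick check after export is to confirm that the expected files exist. The sketch below assumes the export writes inference.pdmodel and inference.pdiparams, the two file names used by the Serving conversion commands later in this article; other Paddle versions may write different files, and the function name is hypothetical.

```python
import os

# Assumed file names, taken from the Serving conversion commands in this article
EXPECTED_FILES = ("inference.pdmodel", "inference.pdiparams")

def missing_inference_files(model_dir):
    """Return the expected inference files that are not present in model_dir."""
    return [name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(model_dir, name))]

# For a distillation config, the student model is exported to a Student subfolder
print(missing_inference_files("./inference/det_ppocrv3/Student"))
```

An empty list means both files were found.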

3.2.6.2 Model inference

After exporting the model, you can use the following commands to make inference predictions:

# Run inference
python tools/infer/predict_det.py --image_dir="train_data/icdar2015/text_localization/test/1.jpg" --det_model_dir="./inference/det_ppocrv3/Student"

4 Text recognition

The task of text recognition is to recognize the text content in an image. The input is usually the text region cropped from the image using the boxes produced by text detection. Depending on the shape of the text, recognition is generally divided into two categories: regular and irregular text recognition. Regular text mainly refers to printed fonts, scanned text and similar material, where the text is roughly horizontal. Irregular text is often not horizontal and suffers from bending, occlusion, and blurring. Irregular text scenes are very challenging and are currently the main research direction in text recognition. This project is optimized based on the PP-OCRv3 algorithm.

4.1 Introduction to PP-OCRv3 recognition algorithm

The recognition module of PP-OCRv3 is optimized from the text recognition algorithm SVTR. SVTR drops the RNN structure and introduces a Transformer structure to mine the context information of text-line images more effectively, which improves recognition ability. As shown in the figure below, PP-OCRv3 uses 6 optimization strategies.

The optimization strategies are summarized as follows:

  • SVTR_LCNet: a lightweight text recognition network
  • GTC: a training strategy in which attention guides CTC
  • TextConAug: a data augmentation strategy for mining text context information
  • TextRotNet: a self-supervised pre-trained model
  • UDML: a unified mutual learning strategy
  • UIM: an unlabeled data mining scheme
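Several of these strategies revolve around a CTC head. As background, the sketch below shows greedy CTC decoding, which turns per-step class indices into text by merging repeats and dropping blanks. It is a generic illustration, not PaddleOCR code; the blank index 0 and the tiny character set are assumptions.

```python
def ctc_greedy_decode(step_indices, charset, blank=0):
    """Collapse repeated indices, then drop blanks, and map the rest to characters."""
    chars = []
    previous = None
    for index in step_indices:
        if index != blank and index != previous:
            # Index 0 is reserved for blank, so class i maps to charset[i - 1]
            chars.append(charset[index - 1])
        previous = index
    return "".join(chars)

# Hypothetical character set for numeric readings
charset = "0123456789."
print(ctc_greedy_decode([2, 2, 0, 2, 11, 0, 6], charset))  # prints 11.5
```

The blank between the two runs of index 2 is what allows the repeated digit to survive decoding.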

4.2 Data preparation

The measurement-equipment screen character recognition dataset comes from the digital displays of various measuring devices in real projects, plus some other digital display screens collected from the Internet. It contains 19912 training images and 4099 test images.
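A useful sanity check for a recognition dataset is whether every character in the labels exists in the character dictionary used for training. The sketch below assumes the usual PaddleOCR recognition label format of an image path, a tab, and the transcription; the sample lines, the dictionary, and the function names are hypothetical.

```python
def parse_rec_label_line(line):
    """Split one label line into the image path and its transcription."""
    image_path, text = line.rstrip("\n").split("\t", 1)
    return image_path, text

def chars_missing_from_dict(label_lines, dict_chars):
    """Return the sorted label characters (spaces ignored) not covered by dict_chars."""
    missing = set()
    for line in label_lines:
        _, text = parse_rec_label_line(line)
        missing.update(ch for ch in text if ch != " " and ch not in dict_chars)
    return sorted(missing)

# Hypothetical label lines and a deliberately small dictionary
lines = ["train/1_crop_0.jpg\t100.07kPa", "train/2_crop_0.jpg\t25.3℃"]
print(chars_missing_from_dict(lines, set("0123456789.kPa")))
```

Characters reported here can never be predicted correctly, so they point to a dictionary that needs extending or to labels that need cleaning.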

# Unzip the downloaded dataset to the specified path
unzip ic15_data.zip -d train_data
# Randomly view an image from the text recognition dataset
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import os

train = './train_data/ic15_data/train'
# Pick one image at random from the given directory and display it
def get_one_image(train):
    plt.figure()
    files = os.listdir(train)
    n = len(files)
    ind = np.random.randint(0, n)
    img_dir = os.path.join(train, files[ind])
    image = Image.open(img_dir)
    plt.imshow(image)
    plt.show()

get_one_image(train)

4.3 Model training

Download the pretrained model

Download the PP-OCRv3 recognition pre-trained model. Other text recognition models can be chosen as needed.

# Download the required pre-trained model with this command
wget -P ./pretrained_models/ https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
# Extract the pre-trained model archive
tar -xf ./pretrained_models/ch_PP-OCRv3_rec_train.tar -C pretrained_models

Modify the configuration file

We use configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml. The main changes are the number of training epochs and the learning rate, the pre-trained model path, and the dataset path. batch_size can also be adjusted to fit the GPU memory of your machine. The specific changes are as follows:

  epoch_num: 100 # number of training epochs
  save_model_dir: ./output/ch_PP-OCR_v3_rec
  save_epoch_step: 10
  eval_batch_step: [0, 100] # evaluation interval: evaluate every 100 steps
  cal_metric_during_train: true
  pretrained_model: ./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy  # path to the pre-trained model
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  use_space_char: true  # recognize the space character

  lr:
    name: Cosine # use the Cosine learning-rate decay strategy
    learning_rate: 0.0002 # learning rate for fine-tuning
    warmup_epoch: 2 # number of warmup epochs

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ic15_data/ # path to the training images
    ext_op_transform_idx: 1
    label_file_list:
    - ./train_data/ic15_data/rec_gt_train.txt # training labels
    ratio_list:
    - 1.0
  loader:
    shuffle: true
    batch_size_per_card: 64
    drop_last: true
    num_workers: 4
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ic15_data/ # path to the test images
    label_file_list:
    - ./train_data/ic15_data/rec_gt_test.txt # test labels
    ratio_list:
    - 1.0
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 64
    num_workers: 4

Before training, we can directly use the following command to evaluate the effect of the pre-trained model:

# Evaluate the pre-trained model
python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy"

Start training

We use the configuration file configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml modified above. Once the pre-trained model, dataset path, learning rate, number of training epochs, and so on have been set, start training with the following command.

# Start training the recognition model
python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml

After training, evaluate the best saved model with the following command:

# Evaluate the fine-tuned model
python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.checkpoints="./output/ch_PP-OCR_v3_rec/best_accuracy"
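Recognition quality is commonly reported as exact-match accuracy together with a normalized edit distance score. The sketch below computes both on a toy batch, assuming definitions in that spirit; it is not PaddleOCR's metric code, and the function names and sample strings are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming, one row at a time)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,                        # deletion
                               current[j - 1] + 1,                     # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]

def rec_metrics(predictions, labels):
    """Exact-match accuracy and 1 minus the mean normalized edit distance."""
    n = len(labels)
    correct = sum(p == t for p, t in zip(predictions, labels))
    normalized = sum(edit_distance(p, t) / max(len(p), len(t), 1)
                     for p, t in zip(predictions, labels))
    return {"acc": correct / n, "norm_edit_dis": 1 - normalized / n}

# Hypothetical predictions against ground truth
print(rec_metrics(["100.07kPa", "00.0L/mln"], ["100.07kPa", "00.0L/min"]))
```

The edit distance score rewards near misses, which is informative for readings where a single wrong digit or unit character is the typical failure.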

4.4 Model export and inference

After training, the trained model can be converted into an inference model. The inference model additionally stores the model's structure information; it performs better for deployment and accelerated inference, is flexible and convenient, and is well suited to integration into real systems.

Model export

The export command is as follows:

# Convert to an inference model
python tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_rec/best_accuracy" Global.save_inference_dir="./inference/rec_ppocrv3/"

Model inference

After exporting the model, you can use the following command to run inference:

# Run inference
python tools/infer/predict_rec.py --image_dir="train_data/ic15_data/test/1_crop_0.jpg" --rec_model_dir="./inference/rec_ppocrv3/Student"

5 Running the system in series

We now test the detection and recognition models trained above chained together as one system. The command is as follows:

# Series test
python3 tools/infer/predict_system.py --image_dir="./train_data/icdar2015/text_localization/test/142.jpg" --det_model_dir="./inference/det_ppocrv3/Student"  --rec_model_dir="./inference/rec_ppocrv3/Student"

The test results are saved in the ./inference_results/ directory and can be visualized with the following code:

%cd /home/aistudio/PaddleOCR
# Display the result
import matplotlib.pyplot as plt
from PIL import Image
img_path= "./inference_results/142.jpg"
img = Image.open(img_path)
plt.figure("test_img", figsize=(30,30))
plt.imshow(img)
plt.show()
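When detection and recognition run in series, the order of the recognized fields matters for any rule that pairs a field with the one that follows it. The sketch below shows one simplified way to put boxes into reading order (top to bottom, then left to right within a line). It is an illustration under assumptions, not the code used by predict_system.py; the 10-pixel line tolerance and the function name are hypothetical.

```python
def sort_boxes_reading_order(boxes, same_line_tol=10):
    """Order boxes by the y then x of their first point, fixing neighbours on one line."""
    ordered = sorted(boxes, key=lambda box: (box[0][1], box[0][0]))
    for i in range(len(ordered) - 1):
        for j in range(i, -1, -1):
            same_line = abs(ordered[j + 1][0][1] - ordered[j][0][1]) < same_line_tol
            if same_line and ordered[j + 1][0][0] < ordered[j][0][0]:
                # Same text line but out of left-to-right order: swap the pair
                ordered[j], ordered[j + 1] = ordered[j + 1], ordered[j]
            else:
                break
    return ordered

# Each box is a list of four [x, y] corners, starting at the top-left (toy values)
value_box = [[100, 12], [180, 12], [180, 40], [100, 40]]
key_box = [[10, 15], [90, 15], [90, 42], [10, 42]]
next_line = [[10, 60], [90, 60], [90, 88], [10, 88]]
for box in sort_boxes_reading_order([value_box, key_box, next_line]):
    print(box[0])
```

Here the key box sits slightly lower than its value box, yet it is still placed first because both lie on the same text line.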

5.1 Post-processing

If you need key-value information, you can match the recognition results against a keyword library using heuristic rules: when a recognized field matches a keyword, take that field as the key and the next field as the value.

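def postprocess(rec_res):
    # Keyword library: field names expected on the instrument screens
    keys = ["型号", "厂家", "版本号", "检定校准分类", "计量器具编号", "烟尘流量",
            "累积体积", "烟气温度", "动压", "静压", "时间", "试验台编号", "预测流速",
            "全压", "烟温", "流速", "工况流量", "标杆流量", "烟尘直读嘴", "烟尘采样嘴",
            "大气压", "计前温度", "计前压力", "干球温度", "湿球温度", "流量", "含湿量"]
    key_value = []
    if len(rec_res) > 1:
        for i in range(len(rec_res) - 1):
            rec_str, _ = rec_res[i]
            for key in keys:
                # A field counts as a key if it equals a keyword or is a fragment of one
                if rec_str in key:
                    # Pair it with the text of the next recognized field
                    key_value.append([rec_str, rec_res[i + 1][0]])
                    break
    return key_value
# filter_rec_res is assumed to be the list of (text, score) recognition results from the previous step
key_value = postprocess(filter_rec_res)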
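To make the matching rule concrete, the sketch below applies the same logic to a toy list of recognition results. The keyword list is shortened and the (text, score) pairs are made up for illustration; the function name is hypothetical.

```python
def match_key_values(rec_res, keys):
    """Pair each field that matches a keyword with the text of the next field."""
    key_value = []
    for i in range(len(rec_res) - 1):
        rec_str, _ = rec_res[i]
        for key in keys:
            if rec_str in key:
                key_value.append([rec_str, rec_res[i + 1][0]])
                break
    return key_value

# Shortened keyword list and hypothetical recognition results in reading order
keys = ["大气压", "流量", "静压"]
rec_res = [("大气压", 0.99), ("100.07kPa", 0.98), ("流量", 0.97), ("00.0L/min", 0.95)]
print(match_key_values(rec_res, keys))
# [['大气压', '100.07kPa'], ['流量', '00.0L/min']]
```

Because the test is "field is contained in keyword", a partially recognized field name can still match, but so can very short strings, so filtering out empty or low-confidence results beforehand is advisable.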

6 Paddle Serving Deployment

First, install the packages required for Paddle Serving deployment:

python -m pip install paddle-serving-server-gpu
python -m pip install paddle_serving_client
python -m pip install paddle-serving-app

6.1 Convert the detection model

cd deploy/pdserving/
python -m paddle_serving_client.convert --dirname ../../inference/det_ppocrv3/Student/  \
                                         --model_filename inference.pdmodel          \
                                         --params_filename inference.pdiparams       \
                                         --serving_server ./ppocr_det_v3_serving/ \
                                         --serving_client ./ppocr_det_v3_client/

6.2 Convert the recognition model

python -m paddle_serving_client.convert --dirname ../../inference/rec_ppocrv3/Student \
                                         --model_filename inference.pdmodel          \
                                         --params_filename inference.pdiparams       \
                                         --serving_server ./ppocr_rec_v3_serving/ \
                                         --serving_client ./ppocr_rec_v3_client/

6.3 Start the service

First, add the post-processing code to web_service.py. The specific modifications are as follows:

# Add the following code after line 153
def _postprocess(rec_res):
    keys = ["型号", "厂家", "版本号", "检定校准分类", "计量器具编号", "烟尘流量",
            "累积体积", "烟气温度", "动压", "静压", "时间", "试验台编号", "预测流速",
            "全压", "烟温", "流速", "工况流量", "标杆流量", "烟尘直读嘴", "烟尘采样嘴",
            "大气压", "计前温度", "计前压力", "干球温度", "湿球温度", "流量", "含湿量"]
    key_value = []
    if len(rec_res) > 1:
        for i in range(len(rec_res) - 1):
            rec_str, _ = rec_res[i]
            for key in keys:
                if rec_str in key:
                    key_value.append([rec_str, rec_res[i + 1][0]])
                    break
    return key_value
key_value = _postprocess(rec_list)
res = {"result": str(key_value)}
# res = {"result": str(result_list)}

Start the server

python web_service.py > log.txt 2>&1

6.4 Send request

Then open a new terminal and run the following client code

python pipeline_http_client.py --image_dir ../../train_data/icdar2015/text_localization/test/142.jpg
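For readers who want to call the service from their own code, the sketch below builds a request body of the kind such a client typically sends: the image is base64-encoded and wrapped in a key/value JSON object. The payload shape and the URL are assumptions about the pipeline service and should be checked against pipeline_http_client.py and config.yml; nothing is sent over the network here.

```python
import base64
import json

# Assumed endpoint; check the port and path in your config.yml before relying on it
ASSUMED_URL = "http://127.0.0.1:9998/ocr/prediction"

def build_ocr_request(image_bytes):
    """Return the JSON body for one image (assumed key/value payload shape)."""
    image_b64 = base64.b64encode(image_bytes).decode("utf8")
    return json.dumps({"key": ["image"], "value": [image_b64]})

# Placeholder bytes stand in for the contents of a real image file
payload = build_ocr_request(b"fake image bytes for illustration")
print(ASSUMED_URL)
print(payload[:40] + "...")
```

The body can then be posted with any HTTP client.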

The final key-value result can be obtained:

大气压, 100.07kPa
干球温度, 0000℃
计前温度, 0000℃
湿球温度, 0000℃
计前压力, -0000kPa
流量, 00.0L/min
静压, 00000kPa
含湿量, 00.0 %
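Because the modified service returns str(key_value), the client receives the pairs as a string. A minimal sketch for turning that string back into a dictionary is shown below; it assumes the string is a Python-literal list of [key, value] pairs, and the function name is hypothetical.

```python
import ast

def parse_key_values(result_field):
    """Parse the string form of [[key, value], ...] into a dict."""
    pairs = ast.literal_eval(result_field)
    return {key: value for key, value in pairs}

# String as it would look after str() on the server side (toy values)
result_field = str([["大气压", "100.07kPa"], ["流量", "00.0L/min"]])
print(parse_key_values(result_field))
```

ast.literal_eval only accepts literals, which makes it a safer choice than eval for text received over the network.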


Origin blog.csdn.net/March_A/article/details/130374103