PaddleOCR model training process record

1. Text detection module

1.0.1 Evaluation using the original pre-trained model

python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model="./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy"

The command above evaluates a model with PaddleOCR. Its parts mean the following:

python tools/eval.py: Runs the eval.py script in PaddleOCR's tools directory.

-c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml: Uses the settings in the ch_PP-OCRv3_det_cml.yml configuration file for the evaluation.

-o Global.pretrained_model="./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy": Specifies the pre-trained weights to evaluate. Global.pretrained_model is the configuration key that holds the model path, and "./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy" is where the weights are stored.

In short, this command evaluates PaddleOCR's text detection model with the specified configuration file and pre-trained weights, and reports its performance metrics on the test data set.
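
The -o option overrides individual configuration entries using dotted keys such as Global.pretrained_model. The following is a minimal sketch of that idea, not PaddleOCR's actual implementation; the helper name apply_overrides and the sample configuration values are hypothetical.

```python
def apply_overrides(config, overrides):
    """Apply 'Section.key=value' overrides to a nested dict (illustrative only)."""
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        parts = dotted_key.split(".")
        node = config
        # Walk down to the parent of the final key, creating levels as needed.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value.strip('"')
    return config


# Hypothetical stand-in for the values loaded from the YAML file.
config = {"Global": {"pretrained_model": None, "epoch_num": 500}}
apply_overrides(
    config,
    ['Global.pretrained_model="./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy"'],
)
print(config["Global"]["pretrained_model"])
```

Only the overridden key changes; every other entry keeps the value from the configuration file.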

1.1 CML optimization based on the PP-OCRv3 detection pre-trained model

python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy
  • python: Runs the Python interpreter.
  • tools/train.py: The model training script provided by PaddleOCR.
  • -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml: Specifies the configuration file to use. configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml is the training configuration for the Chinese detection task; it defines the model, the optimizer settings, the paths of the training data set, and so on.
  • -o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy: An optional parameter that sets the path of the pre-trained model, here ./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy.

In summary, this command reads the training configuration of the Chinese detection task from configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml and, when training starts, initializes the network weights from the pre-trained weights at ./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy.

1.1.1 Evaluate the effect of the previous optimization

python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_det/best_accuracy"
  • This first step was trained for relatively few rounds, so the result is only average. The officially reported improvement is roughly 47.5% -> 65.2%.

1.2 Fine-tune optimization based on the PP-OCRv3 detection student model

python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/student

python tools/train.py: Runs the train.py script in PaddleOCR's tools directory.

-c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml: Trains with the settings in the ch_PP-OCRv3_det_student.yml configuration file. This configuration uses depthwise separable convolutions instead of regular convolutions, which reduces the model size and speeds up training and inference.

-o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/student: Specifies the pre-trained weights used as the starting point. Global.pretrained_model is the configuration key that holds the model path, and "./pretrained_models/ch_PP-OCRv3_det_distill_train/student" is where the weights are stored.

In short, this command fine-tunes the PaddleOCR text detection student model from the specified configuration and pre-trained weights. The student network keeps its lightweight depthwise separable design and is trained on its own in this step; knowledge distillation between models is applied later, in the CML step (1.4).
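
To make the size reduction from depthwise separable convolution concrete, the weight counts of the two layer types can be compared directly. This is a generic worked example; the channel counts and kernel size are hypothetical and are not taken from the PP-OCRv3 configuration.

```python
def conv_params(c_in, c_out, k):
    """Weights in a regular k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes the channels
    return depthwise + pointwise


# Hypothetical layer size, chosen only for illustration.
regular = conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)
print(regular, separable, round(regular / separable, 2))
```

For this layer the separable version needs 8768 weights instead of 73728, roughly an 8.4x reduction.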

1.2.1 Training Effect Evaluation

python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_det_student/best_accuracy"
  • On a 3080 Ti, about 800 training rounds took one day; the result is quite good: 58% -> 87%.

1.3 Fine-tune optimization based on the PP-OCRv3 detection teacher model

First, extract the teacher parameters from the provided pre-trained model best_accuracy.pdparams and assemble them into an initialization model suitable for DML training. The extraction code is as follows:

%cd /home/aistudio/PaddleOCR/pretrained_models/
# transform teacher params in best_accuracy.pdparams into teacher_dml.pdparams
import paddle

# load pretrained model
all_params = paddle.load("ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# print(all_params.keys())

# keep teacher params
# match on the prefix only, because the prefix is sliced off the start of the key
t_params = {key[len("Teacher."):]: all_params[key] for key in all_params if key.startswith("Teacher.")}

# print(t_params.keys())

s_params = {"Student." + key: t_params[key] for key in t_params}
s2_params = {"Student2." + key: t_params[key] for key in t_params}
s_params = {**s_params, **s2_params}
# print(s_params.keys())

paddle.save(s_params, "ch_PP-OCRv3_det_distill_train/teacher_dml.pdparams")

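Knowledge distillation transfers the knowledge of a larger, higher-accuracy model (the "Teacher") to smaller, slightly less accurate models (here "Student" and "Student2"). The code above does not perform the distillation itself. It prepares the initialization for it: the Teacher parameters are extracted from the checkpoint and stored again under both the "Student." and "Student2." prefixes, so that both networks in the DML configuration start from the same teacher weights.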
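
The key renaming can be tried without Paddle on a plain dictionary. This is a minimal sketch; the parameter names and string values are hypothetical placeholders for the real tensors.

```python
# Hypothetical miniature state dict; real checkpoints map names to tensors.
all_params = {
    "Teacher.backbone.conv1.weight": "t_w1",
    "Teacher.head.conv.weight": "t_w2",
    "Student.backbone.conv1.weight": "s_w1",
    "Student2.backbone.conv1.weight": "s2_w1",
}

prefix = "Teacher."
# Keep only the teacher entries and strip the prefix from their names.
t_params = {k[len(prefix):]: v for k, v in all_params.items() if k.startswith(prefix)}

# Store every teacher entry twice, once per network of the DML setup.
dml_params = {}
for name, value in t_params.items():
    dml_params["Student." + name] = value
    dml_params["Student2." + name] = value

for name in sorted(dml_params):
    print(name, "->", dml_params[name])
```

The original Student and Student2 entries are dropped; every key in the result carries a teacher value.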

1.3.1 Executing training

python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml -o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/teacher_dml
  • -c: Specifies the configuration file path, configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml, which trains the PP-OCRv3 detection model with the DML (Deep Mutual Learning) method.
  • -o: Overrides configuration options. Here Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/teacher_dml uses the teacher_dml weights in the pretrained_models/ch_PP-OCRv3_det_distill_train directory as the pre-trained model.

Specifically, the script first reads the specified configuration file and then starts training with the given options. During training it adjusts the learning rate, saves checkpoints, and prints logs automatically. The result is a model that can be used for text detection.

1.3.2 Model Evaluation

python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_det_teacher/best_accuracy"
  • About 200 training rounds in one day, reaching 85.9%.

1.4 CML optimization based on the fine-tuned student and teacher models

The parameters of the student and the teacher must be extracted from the best_accuracy.pdparams files trained in 1.2 and 1.3 and combined into an initialization model suitable for CML training. The extraction code is as follows:

%cd /home/aistudio/PaddleOCR/
# transform teacher params and student parameters into cml model
import paddle

all_params = paddle.load("./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# print(all_params.keys())

t_params = paddle.load("./output/ch_PP-OCR_v3_det_teacher/best_accuracy.pdparams")
# print(t_params.keys())

s_params = paddle.load("./output/ch_PP-OCR_v3_det_student/best_accuracy.pdparams")
# print(s_params.keys())

for key in all_params:
    # Teacher.* comes from the DML-trained teacher, whose checkpoint
    # stores its parameters under the "Student." prefix.
    if "Teacher." in key:
        new_key = key.replace("Teacher", "Student")
        # print("{} >> {}\n".format(key, new_key))
        assert all_params[key].shape == t_params[new_key].shape
        all_params[key] = t_params[new_key]

    # Student.* and Student2.* both start from the fine-tuned student,
    # whose checkpoint stores parameter names without a prefix.
    if "Student." in key:
        new_key = key.replace("Student.", "")
        # print("{} >> {}\n".format(key, new_key))
        assert all_params[key].shape == s_params[new_key].shape
        all_params[key] = s_params[new_key]

    if "Student2." in key:
        new_key = key.replace("Student2.", "")
        # print("{} >> {}\n".format(key, new_key))
        assert all_params[key].shape == s_params[new_key].shape
        all_params[key] = s_params[new_key]
        
paddle.save(all_params, "./pretrained_models/ch_PP-OCRv3_det_distill_train/teacher_cml_student.pdparams")
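
Before starting CML training, it can help to confirm that the merged checkpoint contains all three groups of parameters. The sketch below counts parameter names by their prefix. The helper count_by_prefix and the sample key names are hypothetical; in practice the keys of all_params from the script above would be passed in.

```python
from collections import Counter


def count_by_prefix(keys):
    """Count parameter names by the text before the first dot."""
    return Counter(key.split(".", 1)[0] for key in keys)


# Hypothetical key names; in practice pass all_params.keys().
keys = [
    "Teacher.backbone.conv1.weight",
    "Student.backbone.conv1.weight",
    "Student.head.conv.weight",
    "Student2.backbone.conv1.weight",
    "Student2.head.conv.weight",
]
counts = count_by_prefix(keys)
print(dict(counts))

# Student and Student2 are filled from the same checkpoint,
# so their counts are expected to match.
assert counts["Student"] == counts["Student2"]
```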

1.4.1 Executing training

python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/teacher_cml_student Global.save_model_dir=./output/ch_PP-OCR_v3_det_finetune/

1.4.2 Model evaluation

python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_det_finetune/best_accuracy"

1.5 Model inference

After training is complete, the trained model can be converted into an inference model. The inference model additionally stores the model's structure information, which gives it better performance for deployment and accelerated inference; it is flexible, convenient, and suitable for integration into real systems.

1.5.1 Export model

%cd /home/aistudio/PaddleOCR
# Convert to an inference model (all overrides follow a single -o, as in the training command)
!python tools/export_model.py \
-c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Global.pretrained_model=./output/ch_PP-OCR_v3_det_finetune/best_accuracy \
Global.save_inference_dir="./inference/det_ppocrv3"

1.5.2 Inference prediction

%cd /home/aistudio/PaddleOCR
# Run inference
!python tools/infer/predict_det.py --image_dir="train_data/icdar2015/text_localization/test/1.jpg" --det_model_dir="./inference/det_ppocrv3/Student"

This command runs text detection inference, where --image_dir specifies the path of the input image and --det_model_dir specifies the path of the exported text detection model.

Specifically, --image_dir can point to a single image or to a folder, in which case every image inside it is processed. The inference process uses the model given by --det_model_dir to detect text in the image and outputs the detection result.
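
The "single image or folder" behavior of --image_dir can be pictured with a small helper. This is only a sketch of the idea, not the function PaddleOCR uses internally; the name list_images and the set of accepted extensions are assumptions.

```python
import os
import tempfile

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of extensions


def list_images(path):
    """Return image paths for a single file or for every image in a folder."""
    if os.path.isfile(path):
        return [path]
    if os.path.isdir(path):
        return sorted(
            os.path.join(path, name)
            for name in os.listdir(path)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTS
        )
    raise FileNotFoundError(f"no such file or directory: {path}")


# Small demonstration on a temporary folder with two images and one text file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.jpg", "a.png", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in list_images(tmp)]
    print(found)
```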

2. Optimization for text recognition

2.0.1 Download the pre-trained model

Download the required PP-OCRv3 recognition pre-trained model. For more options, other text recognition models can be downloaded as needed.

%cd /home/aistudio/PaddleOCR
# Download the required pre-trained model
!wget -P ./pretrained_models/ https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
# Extract the pre-trained model archive
!tar -xf ./pretrained_models/ch_PP-OCRv3_rec_train.tar -C pretrained_models

2.0.2 Evaluation of the original model

python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy"
  • python: Starts the Python interpreter.
  • tools/eval.py: Runs PaddleOCR's eval.py script, which evaluates the performance of a model.
  • -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml: Specifies the path of the model configuration file, a knowledge distillation setup based on the PP-OCRv3 model structure. In the path, rec indicates a text recognition model, PP-OCRv3 indicates that the base model is PP-OCRv3, and ch_PP-OCRv3_rec_distillation.yml is the name of the configuration file.
  • -o Global.pretrained_model="./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy": Specifies the pre-trained weights to use. Global.pretrained_model is the pretrained_model entry of the Global section, and "./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy" is the path of the weight file.
  • After the command is executed, the program loads the model configuration and weights, runs recognition on the specified test set, and outputs the model's performance metrics.

2.1 Recognition model optimization

2.1.1 Modify the parameters

Global:
  epoch_num: 100 # number of training epochs
  save_model_dir: ./output/ch_PP-OCR_v3_rec
  save_epoch_step: 10
  eval_batch_step: [0, 100] # evaluation interval: evaluate every 100 steps
  cal_metric_during_train: true
  pretrained_model: ./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy  # path of the pre-trained model
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  use_space_char: true  # recognize the space character

Optimizer:
  lr:
    name: Cosine # use Cosine learning-rate decay
    learning_rate: 0.0002 # learning rate for fine-tuning
    warmup_epoch: 2 # number of warmup epochs

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ic15_data/ # path of the training images
    ext_op_transform_idx: 1
    label_file_list:
    - ./train_data/ic15_data/rec_gt_train.txt # training labels
    ratio_list:
    - 1.0
  loader:
    shuffle: true
    batch_size_per_card: 64
    drop_last: true
    num_workers: 4
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ic15_data/ # path of the test images
    label_file_list:
    - ./train_data/ic15_data/rec_gt_test.txt # test labels
    ratio_list:
    - 1.0
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 64
    num_workers: 4
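
To see what the Cosine schedule with learning_rate: 0.0002 and warmup_epoch: 2 roughly does over 100 epochs, the learning rate can be sketched per epoch. This is a simplified illustration that assumes linear warmup followed by cosine decay to zero; the real scheduler is stepped per iteration and may differ in details such as the warmup start value. The function name lr_at_epoch is hypothetical.

```python
import math


def lr_at_epoch(epoch, total_epochs=100, base_lr=0.0002, warmup_epochs=2):
    """Simplified per-epoch schedule: linear warmup, then cosine decay to zero."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))


for epoch in (0, 1, 2, 50, 99):
    print(epoch, f"{lr_at_epoch(epoch):.6f}")
```

The rate ramps up during the first two epochs, peaks at the configured 0.0002, and then falls smoothly toward zero by the final epoch.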

2.1.2 Executing training

python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml

Specifically, train.py is a command-line tool for training OCR models on PaddlePaddle. The script is written in Python and mainly calls the PaddlePaddle API to carry out the training process. Its input parameters include the configuration file path (-c) and optional overrides (-o). In this example, the configuration file ch_PP-OCRv3_rec_distillation.yml specifies the training hyperparameters, data paths, network structure, and other information.

This command starts training an OCR model based on the configuration file above. During training, the model keeps adjusting its parameters on the training data to improve its accuracy on the validation data.

2.1.3 Model Evaluation

python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.checkpoints="./output/ch_PP-OCR_v3_rec/best_accuracy"
  • tools/eval.py: The Python script to run; here it evaluates the text recognition model.
  • -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml: Specifies the configuration file to use; here it is the distillation configuration of the PP-OCRv3 text recognition model.
  • -o Global.checkpoints="./output/ch_PP-OCR_v3_rec/best_accuracy": Specifies the model to load for evaluation; here it is the checkpoint with the best validation accuracy saved during training.
  • This is the result after 360 training rounds.

2.2 Model inference

2.2.1 Export model

After training is complete, the trained model can be converted into an inference model. The inference model additionally stores the model's structure information, which gives it better performance for deployment and accelerated inference; it is flexible, convenient, and suitable for integration into real systems.

python tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_rec/best_accuracy" Global.save_inference_dir="./inference/rec_ppocrv3/"
  • -c: Specifies the path of the OCR model configuration file;
  • -o Global.pretrained_model: Specifies the path of the OCR model to export;
  • Global.save_inference_dir: Specifies the directory in which the exported model is saved.

Specifically, the command uses the model configuration in configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml to export the best model, ./output/ch_PP-OCR_v3_rec/best_accuracy, to the ./inference/rec_ppocrv3/ directory.

2.2.2 Inference prediction

python tools/infer/predict_rec.py --image_dir="train_data/ic15_data/test/1_crop_0.jpg" --rec_model_dir="./inference/rec_ppocrv3/Student"

This command uses the predict_rec.py script in the PaddleOCR toolkit to run text recognition prediction. Specifically, it loads an image, recognizes the text in it with the specified text recognition model, and outputs the recognition result.

  • --image_dir: The path of the image to be recognized;
  • --rec_model_dir: The path of the text recognition model.

3. Combining the detection and recognition models

3.1 Test the detection and recognition models trained above in series

python3 tools/infer/predict_system.py --image_dir="./train_data/icdar2015/text_localization/test/142.jpg" --det_model_dir="./inference/det_ppocrv3/Student"  --rec_model_dir="./inference/rec_ppocrv3/Student"
  • tools/infer/predict_system.py: The script that runs OCR inference with both models.
  • --image_dir="./train_data/icdar2015/text_localization/test/142.jpg": The path of the image to be recognized.
  • --det_model_dir="./inference/det_ppocrv3/Student": The directory where the text detection model is located.
  • --rec_model_dir="./inference/rec_ppocrv3/Student": The directory where the text recognition model is located.

The test results are saved in the ./inference_results/ directory and can be visualized with the following code.


%cd /home/aistudio/PaddleOCR
# Display the result
import matplotlib.pyplot as plt
from PIL import Image
img_path= "./inference_results/142.jpg"
img = Image.open(img_path)
plt.figure("test_img", figsize=(30,30))
plt.imshow(img)
plt.show()

3.2 Post-processing

If key-value information is needed, the recognition results can be matched against a keyword library using heuristic rules: when a field matches a keyword, that field is taken as the key and the next field as its value.

def postprocess(rec_res):
    keys = ["型号", "厂家", "版本号", "检定校准分类", "计量器具编号", "烟尘流量",
            "累积体积", "烟气温度", "动压", "静压", "时间", "试验台编号", "预测流速",
            "全压", "烟温", "流速", "工况流量", "标杆流量", "烟尘直读嘴", "烟尘采样嘴",
            "大气压", "计前温度", "计前压力", "干球温度", "湿球温度", "流量", "含湿量"]
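    key_value = []
    if len(rec_res) > 1:
        for i in range(len(rec_res) - 1):
            # each entry is unpacked as (recognized text, second field unused here)
            rec_str, _ = rec_res[i]
            for key in keys:
                if rec_str in key:
                    # this field is the key, the next field is its value
                    key_value.append([rec_str, rec_res[i + 1][0]])
                    break
    return key_value

# filter_rec_res is not defined in this snippet; it is assumed to hold
# the recognition results produced by the previous step
key_value = postprocess(filter_rec_res)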
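
The matching rule can be tried on a few hand-made results. The sketch below is a compact, self-contained variant of the same rule that takes the keyword list as a parameter; the shortened keyword list and the sample recognition results are hypothetical.

```python
def postprocess(rec_res, keys):
    """Pair each recognized keyword with the text that follows it."""
    key_value = []
    for i in range(len(rec_res) - 1):
        rec_str, _ = rec_res[i]
        # Same test as above: the recognized text must occur inside a keyword.
        if any(rec_str in key for key in keys):
            key_value.append([rec_str, rec_res[i + 1][0]])
    return key_value


# Hypothetical keyword subset and recognition results, for illustration only.
keys = ["型号", "厂家", "大气压"]
rec_res = [("型号", 0.99), ("ABC-123", 0.98), ("大气压", 0.97), ("101.3", 0.96)]

key_value = postprocess(rec_res, keys)
print(key_value)
```

Because the test is rec_str in key, a recognized fragment of a keyword also counts as a match, so short or empty recognition results may need to be filtered out beforehand.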

Origin blog.csdn.net/March_A/article/details/130442379