PaddleOCR, image detection and recognition

Introduction

OCR (optical character recognition) is the process in which an electronic device (such as a scanner or digital camera) examines characters printed on paper and uses character recognition methods to translate their shapes into computer text; that is, the text material is scanned, and the resulting image file is analyzed and processed to obtain text and layout information. How to improve recognition accuracy through error correction or auxiliary information is the most important topic in OCR. The main indicators for measuring the performance of an OCR system are rejection rate, false recognition rate, recognition speed, user interface friendliness, product stability, usability and feasibility.
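
To make the first two indicators concrete, here is a minimal sketch. It assumes one common convention, which the article itself does not define: the rejection rate is the share of characters the system declines to recognize, and the false recognition rate is the share it recognizes incorrectly, both relative to the total number of characters. The function name and the counts are illustrative only.

```python
def ocr_rates(total_chars, rejected_chars, misrecognized_chars):
    """Return rejection rate, false recognition rate and accuracy.

    Assumed definitions (not taken from the article): both rates are
    measured against the total number of characters in the test set.
    """
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    if rejected_chars < 0 or misrecognized_chars < 0:
        raise ValueError("counts must be non-negative")
    if rejected_chars + misrecognized_chars > total_chars:
        raise ValueError("rejected + misrecognized exceeds total")

    correct_chars = total_chars - rejected_chars - misrecognized_chars
    return {
        "rejection_rate": rejected_chars / total_chars,
        "false_recognition_rate": misrecognized_chars / total_chars,
        "accuracy": correct_chars / total_chars,
    }


# Illustrative counts: 1000 characters, 20 rejected, 30 misrecognized
rates = ocr_rates(1000, 20, 30)
print(rates)
```

Other sources measure the false recognition rate against accepted characters only, so it is worth stating which convention is used when comparing systems.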

Installation Tutorial

https://github.com/PaddlePaddle/PaddleOCR/blob/static/doc/doc_ch/installation.md

Framework

1. configs folder

This folder holds the configuration files for orientation classification, text detection, and text recognition. A configuration file describes how to train, the model structure, the optimizer, the training parameters, the training data source, etc.

2. deploy folder
This folder mainly contains deployment-related code. It is the last part to study and can be skipped for now.

3. doc folder
This folder includes some test images, as well as the very important PP-OCR paper.

4. inference folder
There are 3 models under this folder: the direction classification model, the text detection model, and the text recognition model.

5. inference_results folder
This folder is generated after running the script; it stores the images with the results drawn on them.

6. ppocr folder
This folder is the core of ppocr; its contents are as follows:

data: data loading, data augmentation.

losses: loss functions for classification, detection, and recognition models.

metrics: evaluation metrics.

modeling: model building, including model structure, backbones, heads, necks, and transforms.

optimizer: learning rate, learning strategy, optimizer, regularization.

postprocess: post-processing.

utils: utilities.

7. PPOCRLabel folder
The annotation tool folder; detailed notes will be added when it is used.

8. StyleText folder
The style transfer folder, used to generate sample data and expand the training samples.

9. tools folder
This folder includes Python scripts for training, inference, and evaluation. They can be called directly by writing a shell file in the root directory.

This section is reproduced from: paddleocr tutorial
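
As a rough illustration of driving the tools scripts from the repository root, the sketch below builds (but does not run) a command line for tools/infer/predict_system.py. The flag names are the ones I recall from the PaddleOCR inference documentation and should be checked against your checkout; the three model directories under ./inference/ are hypothetical placeholders, and build_predict_command is a helper made up for this example.

```python
import sys


def build_predict_command(image_path, det_dir, cls_dir, rec_dir, use_angle_cls=True):
    """Build the argument list for tools/infer/predict_system.py.

    The command is only assembled here, not executed. The directory
    arguments are placeholders for wherever the three models are stored.
    """
    return [
        sys.executable,                      # the current Python interpreter
        "tools/infer/predict_system.py",     # relative to the repository root
        f"--image_dir={image_path}",
        f"--det_model_dir={det_dir}",        # text detection model
        f"--cls_model_dir={cls_dir}",        # direction classification model
        f"--rec_model_dir={rec_dir}",        # text recognition model
        f"--use_angle_cls={use_angle_cls}",
    ]


cmd = build_predict_command(
    "./doc/imgs/6.jpg",
    "./inference/det/",   # hypothetical directory name
    "./inference/cls/",   # hypothetical directory name
    "./inference/rec/",   # hypothetical directory name
)
print(" ".join(cmd))

# To actually run it from the repository root:
# import subprocess
# subprocess.run(cmd, check=True)
```

The printed line is also what would go into a shell file in the root directory.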

Specific use

For example, there is a picture as follows

  • The whole process of detection, classification and recognition
from paddleocr import PaddleOCR, draw_ocr
from PIL import Image


# PaddleOCR currently supports Chinese and English, English, French, German, Korean and Japanese;
# switch between them by changing the lang parameter.
# The corresponding values are `ch`, `en`, `french`, `german`, `korean`, `japan`.
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
img_path = 'E:\\PaddleOCR-static\\doc\\imgs\\6.jpg'
result = ocr.ocr(img_path, cls=True)
# Judging by the output shown below, result holds one list of detections per input image
for line in result:
    print(line)

# Show the results; after the loop, line is the detection list of the single input image
image = Image.open(img_path).convert('RGB')
boxes = [item[0] for item in line]
print(boxes)
txts = [item[1][0] for item in line]
print(txts)
scores = [item[1][1] for item in line]
print(scores)
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
  • line is a list; each item contains the text box, the text and the recognition confidence
[[[[614.0, 51.0], [753.0, 51.0], [753.0, 159.0], [614.0, 159.0]], ('38', 0.9994012117385864)], [[[640.0, 163.0], [729.0, 163.0], [729.0, 204.0], [640.0, 204.0]], ('包邮', 0.8606153130531311)], [[[349.0, 415.0], [433.0, 415.0], [433.0, 448.0], [349.0, 448.0]], ('OlAY', 0.7505984306335449)], [[[339.0, 449.0], [445.0, 449.0], [445.0, 484.0], [339.0, 484.0]], ('玉兰油', 0.9756715297698975)], [[[325.0, 486.0], [462.0, 486.0], [462.0, 503.0], [325.0, 503.0]], ('NaturalWhite', 0.927595317363739)], [[[341.0, 505.0], [446.0, 504.0], [446.0, 522.0], [341.0, 523.0]], ('白里透红系列', 0.9458746314048767)], [[[289.0, 526.0], [495.0, 523.0], [495.0, 547.0], [289.0, 550.0]], ('日间润白 SPF24/PA++', 0.9534435868263245)], [[[329.0, 554.0], [454.0, 553.0], [454.0, 571.0], [329.0, 572.0]], ('水养防晒美白霜', 0.9256204962730408)], [[[11.0, 710.0], [315.0, 715.0], [314.0, 781.0], [10.0, 776.0]], ('专柜正品', 0.9986205697059631)], [[[437.0, 716.0], [747.0, 716.0], [747.0, 785.0], [437.0, 785.0]], ('假一赔十', 0.9869048595428467)]]
  • boxes holds the text boxes, i.e. the four corner points of each text region
[[[614.0, 51.0], [753.0, 51.0], [753.0, 159.0], [614.0, 159.0]], [[640.0, 163.0], [729.0, 163.0], [729.0, 204.0], [640.0, 204.0]], [[349.0, 415.0], [433.0, 415.0], [433.0, 448.0], [349.0, 448.0]], [[339.0, 449.0], [445.0, 449.0], [445.0, 484.0], [339.0, 484.0]], [[325.0, 486.0], [462.0, 486.0], [462.0, 503.0], [325.0, 503.0]], [[341.0, 505.0], [446.0, 504.0], [446.0, 522.0], [341.0, 523.0]], [[289.0, 526.0], [495.0, 523.0], [495.0, 547.0], [289.0, 550.0]], [[329.0, 554.0], [454.0, 553.0], [454.0, 571.0], [329.0, 572.0]], [[11.0, 710.0], [315.0, 715.0], [314.0, 781.0], [10.0, 776.0]], [[437.0, 716.0], [747.0, 716.0], [747.0, 785.0], [437.0, 785.0]]]
  • txts is the recognized text
['38', '包邮', 'OlAY', '玉兰油', 'NaturalWhite', '白里透红系列', '日间润白 SPF24/PA++', '水养防晒美白霜', '专柜正品', '假一赔十']
  • scores is the recognition confidence
[0.9994012117385864, 0.8606153130531311, 0.7505984306335449, 0.9756715297698975, 0.927595317363739, 0.9458746314048767, 0.9534435868263245, 0.9256204962730408, 0.9986205697059631, 0.9869048595428467]
  • Visualization of the results
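
The result structure shown above, where each item has the form [box, (text, score)], can be post-processed with plain Python. The sketch below is one possible approach rather than part of PaddleOCR: it keeps only detections above a confidence threshold and reduces each four-point box to an axis-aligned rectangle. The helper names and the 0.9 threshold are assumptions chosen for illustration; the sample data is copied from the output above.

```python
def bounding_rect(box):
    """Return (x_min, y_min, x_max, y_max) for a four-point text box."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return min(xs), min(ys), max(xs), max(ys)


def filter_detections(detections, min_score=0.9):
    """Keep detections whose recognition confidence is at least min_score."""
    kept = []
    for box, (text, score) in detections:
        if score >= min_score:
            kept.append({"text": text, "score": score, "rect": bounding_rect(box)})
    return kept


# Three items copied from the printed result above
sample = [
    [[[614.0, 51.0], [753.0, 51.0], [753.0, 159.0], [614.0, 159.0]], ('38', 0.9994012117385864)],
    [[[640.0, 163.0], [729.0, 163.0], [729.0, 204.0], [640.0, 204.0]], ('包邮', 0.8606153130531311)],
    [[[341.0, 505.0], [446.0, 504.0], [446.0, 522.0], [341.0, 523.0]], ('白里透红系列', 0.9458746314048767)],
]

kept = filter_detections(sample)
for item in kept:
    print(item["text"], item["rect"])
```

Taking the minimum and maximum of the corner coordinates is useful because detected boxes can be slightly skewed, as in the third sample item, and a rectangle is easier to crop or compare.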

Origin blog.csdn.net/qq_38312411/article/details/127567810