Build your own OCR service, step 3: Install the PPOCRLabel annotation tool

1. Installation instructions

After installing PaddleOCR, the next step is to install PPOCRLabel, its annotation tool. If you plan to train your own models, an annotation tool makes preparing labeled training data much easier.

PPOCRLabel is the annotation tool built for PaddleOCR, and like PaddleOCR it is open source.

PPOCRLabel's source code and installation files are included in the full PaddleOCR source code, under the PPOCRLabel directory.

If you no longer have a copy, you can download the complete PaddleOCR source code again from:

git clone https://github.com/PaddlePaddle/PaddleOCR.git

2. Introduction to PPOCRLabel

PPOCRLabel is a semi-automatic image annotation tool for OCR. It uses a built-in PP-OCR model to annotate data automatically and to re-recognize text after you correct it.

It is written in Python 3 and PyQt5 and supports rectangular box, table, irregular text, and key information annotation modes. Its exported labels can be used directly to train PaddleOCR detection and recognition models.

3. Install PPOCRLabel

1. Install with pip

pip install PPOCRLabel -i https://mirror.baidu.com/pypi/simple

2. Download the source code

git clone https://github.com/PaddlePaddle/PaddleOCR.git
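If you installed with pip, a quick Python check like the one below can confirm that the PPOCRLabel command is on your PATH and that the PaddleOCR and PaddlePaddle packages can be found. This is a minimal sketch under two assumptions: that the pip package registers a console command named `PPOCRLabel` (as the start commands later in this post suggest), and that the packages import as `paddleocr` and `paddle`.

```python
import importlib.util
import shutil


def ppocrlabel_command_found() -> bool:
    """True if a `PPOCRLabel` executable is on PATH (assumed pip console script name)."""
    return shutil.which("PPOCRLabel") is not None


def module_importable(name: str) -> bool:
    """True if the top-level module `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("PPOCRLabel command on PATH:", ppocrlabel_command_found())
    # Assumed import names: `paddleocr` for PaddleOCR, `paddle` for PaddlePaddle.
    for module in ("paddleocr", "paddle"):
        print(f"{module} importable:", module_importable(module))
```

`find_spec` only locates a module, so the check stays fast even though PaddlePaddle itself is a large package to import.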

4. Use PPOCRLabel

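Change into the PPOCRLabel directory (in this example, PaddleOCR was cloned into ./git_workspace):

cd ./git_workspace/PaddleOCR/PPOCRLabel

Then run one of the start commands below.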

1. Start PPOCRLabel in the labeling mode you need (pip installation)
PPOCRLabel --lang ch # Start [Normal Mode], for labeling [Detection + Recognition] scenarios
PPOCRLabel --lang ch --kie True # Start [KIE Mode], for labeling [Detection + Recognition + Key Information Extraction] scenarios

2. Run PPOCRLabel as a Python script (source installation)

Enter the PPOCRLabel source code directory:

cd ./git_workspace/PaddleOCR/PPOCRLabel

Run the start command. --lang ch starts PPOCRLabel with the Chinese interface and Chinese recognition model; without this parameter, it starts with the English interface and English recognition.

python PPOCRLabel.py --lang ch

On first launch, PPOCRLabel automatically downloads and installs the detection and recognition models.

The annotation tool interface then opens.
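Both start methods differ only in how the program is invoked, so they can be captured in one small helper. The sketch below only assembles the command line from the flags shown above (`--lang`, `--kie True`); the function names `build_start_cmd` and `start` are hypothetical, and launching from source assumes you run it inside the PaddleOCR/PPOCRLabel directory.

```python
import subprocess
import sys
from typing import List, Optional


def build_start_cmd(lang: Optional[str] = "ch", kie: bool = False,
                    from_source: bool = False) -> List[str]:
    """Assemble a PPOCRLabel start command as an argument list (hypothetical helper).

    from_source=False: pip console script, e.g. `PPOCRLabel --lang ch`
    from_source=True:  source checkout, e.g. `python PPOCRLabel.py --lang ch`
    """
    cmd = [sys.executable, "PPOCRLabel.py"] if from_source else ["PPOCRLabel"]
    if lang:  # omit --lang to get the English interface and recognition
        cmd += ["--lang", lang]
    if kie:  # KIE mode: detection + recognition + key information extraction
        cmd += ["--kie", "True"]
    return cmd


def start(cmd: List[str], cwd: Optional[str] = None) -> None:
    """Launch the GUI and block until its window is closed.

    For a from_source command, cwd should be the PaddleOCR/PPOCRLabel directory.
    """
    subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call start(...) to actually open the GUI.
    print(build_start_cmd())                  # normal mode, pip install
    print(build_start_cmd(kie=True))          # KIE mode, pip install
    print(build_start_cmd(from_source=True))  # normal mode, source checkout
```

Passing an argument list instead of a single shell string avoids quoting problems when paths contain spaces.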

3. Select the folder containing the images you want to annotate (File > Open Dir).
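Before opening a folder, it can help to confirm that it actually contains images. This hypothetical helper lists the image files directly inside a folder; the extension set is an assumption for illustration, not PPOCRLabel's own list of supported formats.

```python
import sys
from pathlib import Path
from typing import List

# Assumption: a common set of image extensions; not PPOCRLabel's own list.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(folder: str) -> List[Path]:
    """Return the image files directly inside `folder` (no subfolders), sorted."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(folder)
    return sorted(p for p in root.iterdir()
                  if p.is_file() and p.suffix.lower() in IMAGE_EXTS)


if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    images = list_images(folder)
    print(f"{len(images)} image(s) found in {folder}")
```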

4. Annotate

PPOCRLabel can annotate images automatically. After automatic annotation, correct any wrongly recognized text: click a recognized text box, or add a new box with the rectangle annotation tool.

Edit the text in the corresponding recognition result area. When you have finished annotating an image, click the confirm button in the lower-right corner to save its annotations.

In the file list, annotated images are marked with a green ✔ to the left of the file name.

5. Export annotation results

The annotation results are saved in the folder of the images being annotated, in a file named Label.txt.
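Each line of Label.txt pairs an image path with a JSON list of boxes. The parser below assumes the tab-separated layout used by PaddleOCR detection labels, `<image path><TAB>[{"transcription": ..., "points": [[x, y], ...], "difficult": false}, ...]`; check it against your own Label.txt before relying on it. The function names are hypothetical.

```python
import json
from typing import Dict, List, Tuple

Box = Dict[str, object]  # {"transcription": str, "points": [[x, y], ...], "difficult": bool}


def parse_label_line(line: str) -> Tuple[str, List[Box]]:
    """Split one Label.txt line into (image_path, boxes).

    Assumed layout: <image_path><TAB><JSON list of box dicts>
    """
    path, sep, annotation = line.rstrip("\n").partition("\t")
    if not sep:
        raise ValueError(f"no tab separator in line: {line!r}")
    return path, json.loads(annotation)


def load_label_file(label_path: str) -> Dict[str, List[Box]]:
    """Read a whole Label.txt into {image_path: boxes}, skipping blank lines."""
    labels: Dict[str, List[Box]] = {}
    with open(label_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                path, boxes = parse_label_line(line)
                labels[path] = boxes
    return labels


if __name__ == "__main__":
    # Made-up sample line in the assumed format.
    sample = ('imgs/1.jpg\t[{"transcription": "hello", '
              '"points": [[0, 0], [50, 0], [50, 20], [0, 20]], "difficult": false}]')
    path, boxes = parse_label_line(sample)
    print(path, boxes[0]["transcription"], boxes[0]["points"])
```

Reading the file as UTF-8 matters here, since transcriptions for Chinese text are stored as non-ASCII characters.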

The exported annotation file can then be used directly to train PaddleOCR detection and recognition models.
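Training usually needs separate training and validation label files. The sketch below is a simple hand-rolled split of Label.txt, not PaddleOCR's own tooling; the 20% validation ratio, the seed, the script name and the output file names are arbitrary assumptions. The resulting files keep the Label.txt line format, so they can be referenced from the label file list of a PaddleOCR detection training config (for example its `label_file_list` entry; check your config).

```python
import random
import sys
from typing import List, Tuple


def split_label_lines(lines: List[str], val_ratio: float = 0.2,
                      seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle the non-empty Label.txt lines deterministically and split them.

    Returns (train_lines, val_lines); every line ends with exactly one newline.
    """
    if not 0.0 <= val_ratio < 1.0:
        raise ValueError("val_ratio must be in [0, 1)")
    items = [ln.rstrip("\n") + "\n" for ln in lines if ln.strip()]
    random.Random(seed).shuffle(items)  # fixed seed -> reproducible split
    n_val = int(len(items) * val_ratio)
    return items[n_val:], items[:n_val]


def write_split(label_path: str, train_path: str, val_path: str,
                val_ratio: float = 0.2, seed: int = 0) -> None:
    """Hypothetical helper: write train/val label files in Label.txt format."""
    with open(label_path, encoding="utf-8") as f:
        train, val = split_label_lines(f.readlines(), val_ratio, seed)
    for out_path, part in ((train_path, train), (val_path, val)):
        with open(out_path, "w", encoding="utf-8") as f:
            f.writelines(part)


if __name__ == "__main__":
    # Usage (script name is hypothetical): python split_labels.py path/to/Label.txt
    if len(sys.argv) > 1:
        write_split(sys.argv[1], "train_label.txt", "val_label.txt")
    else:
        print("usage: python split_labels.py path/to/Label.txt")
```

Normalizing each line to end with a single newline prevents the last line of Label.txt, which may lack one, from being glued to the next line when the files are written.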


Source: blog.csdn.net/xionghui2007/article/details/132753961