1. Installation instructions
After installing PaddleOCR, you also need to install the annotation tool PPOCRLabel. An annotation tool is very helpful if you want to train a model yourself.
PPOCRLabel is the annotation tool built for PaddleOCR, and it is also open source.
The full PaddleOCR source tree includes both the PPOCRLabel installer and its source code.
If you no longer have a copy, you can download the entire PaddleOCR source code again from the following address:
git clone https://github.com/PaddlePaddle/PaddleOCR.git
2. Introduction to PPOCRLabel
PPOCRLabel is a semi-automatic graphical annotation tool for the OCR field. It has a built-in PP-OCR model that automatically annotates data and can re-recognize it.
It is written in Python 3 and PyQt5 and supports rectangular box annotation, table annotation, irregular text annotation, and key information annotation modes. The exported annotations can be used directly to train PaddleOCR detection and recognition models.
3. Install PPOCRLabel
1. pip installation
pip install PPOCRLabel -i https://mirror.baidu.com/pypi/simple
2. Download the source code
git clone https://github.com/PaddlePaddle/PaddleOCR.git
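To confirm which of the two installation routes is available on a machine, a small check like the following may help. It is a minimal sketch: the distribution name `PPOCRLabel` is taken from the pip command above and the file name `PPOCRLabel.py` from the startup command used later, while the helper names and the `./PaddleOCR` clone location are assumptions made for illustration.

```python
from importlib import metadata
from pathlib import Path
from typing import Optional


def pip_version(dist_name: str = "PPOCRLabel") -> Optional[str]:
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_source_checkout(repo_root: str) -> bool:
    """Check whether a PaddleOCR clone contains PPOCRLabel/PPOCRLabel.py."""
    return (Path(repo_root) / "PPOCRLabel" / "PPOCRLabel.py").is_file()


if __name__ == "__main__":
    # "./PaddleOCR" is an assumed clone location; adjust it to your setup.
    print("pip install:", pip_version() or "not found")
    print("source checkout:", has_source_checkout("./PaddleOCR"))
```

Either result being positive should be enough to continue with the startup steps in the next section.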
4. Use PPOCRLabel
cd ./git_workspace/PaddleOCR/PPOCRLabel
Run the startup command.
1. Select a labeling mode and start PPOCRLabel
PPOCRLabel --lang ch # Start [Normal Mode], used for labeling in the [Detection + Recognition] scenario
PPOCRLabel --lang ch --kie True # Start [KIE Mode], used for labeling in the [Detection + Recognition + Keyword Extraction] scenario
2. Run PPOCRLabel through Python script
Enter the PPOCRLabel source code directory
cd ./git_workspace/PaddleOCR/PPOCRLabel
Run the startup command. --lang ch starts PPOCRLabel in Chinese mode; without parameters, it starts with the English interface and English recognition.
python PPOCRLabel.py --lang ch
On first startup, the detection and recognition models are downloaded and installed automatically.
The annotation tool interface is then displayed.
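The two ways of starting the tool shown above differ only in the entry point, so they can be wrapped in one small helper. This is a sketch, not part of PPOCRLabel: the function names are hypothetical, only the `--lang` and `--kie True` flags from the commands above are used, and passing `--kie True` to the script is assumed to behave the same as with the installed command.

```python
import subprocess
import sys
from typing import List, Optional


def build_launch_command(lang: str = "ch", kie: bool = False,
                         script: Optional[str] = None) -> List[str]:
    """Build the argument list for starting PPOCRLabel.

    With script=None the pip-installed `PPOCRLabel` command is used;
    otherwise the given PPOCRLabel.py is run with the current interpreter.
    """
    cmd = ["PPOCRLabel"] if script is None else [sys.executable, script]
    cmd += ["--lang", lang]
    if kie:
        cmd += ["--kie", "True"]  # KIE mode, as in the command shown above
    return cmd


def launch(lang: str = "ch", kie: bool = False,
           script: Optional[str] = None) -> int:
    """Start PPOCRLabel (opens the GUI) and return its exit code."""
    return subprocess.run(build_launch_command(lang, kie, script)).returncode


if __name__ == "__main__":
    # Only print the commands here; call launch() to actually open the GUI.
    print(build_launch_command("ch"))
    print(build_launch_command("ch", kie=True))
    print(build_launch_command("ch", script="PPOCRLabel.py"))
```

Building the command as a list rather than a single string avoids shell quoting problems if the script path contains spaces.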
3. Select the folder containing the images to be annotated.
4. Annotate
PPOCRLabel can annotate images automatically. Afterwards you can correct any wrongly recognized text: click the recognized text box, or add a "rectangular mark".
Make the changes in the corresponding recognition result area. After finishing the annotation of an image, click the confirmation button in the lower right corner to save the annotation results.
In the file list, an annotated image has a ✔ to the left of its name; entries shown in green are considered annotated.
5. Export annotation results
The annotation results are saved in the image (sample) directory, in a file named Label.txt.
Finally, the exported annotation result file can be directly used for training the PaddleOCR detection and recognition model.
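Before training, it can be useful to read Label.txt back and inspect it. The sketch below assumes a layout that is not stated in this article: one line per image, consisting of the image path, a tab, and a JSON list of boxes, where each box has a `transcription` string and a `points` list of corner coordinates. If your file looks different, adjust the parsing accordingly. The function names are hypothetical.

```python
import json
from typing import Dict, List, Tuple


def parse_label_line(line: str) -> Tuple[str, List[dict]]:
    """Split one line into (image_path, list of annotation dicts)."""
    image_path, payload = line.rstrip("\n").split("\t", 1)
    return image_path, json.loads(payload)


def load_labels(label_file: str) -> Dict[str, List[dict]]:
    """Read a whole label file into {image_path: annotations}."""
    labels: Dict[str, List[dict]] = {}
    with open(label_file, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                image_path, boxes = parse_label_line(line)
                labels[image_path] = boxes
    return labels


if __name__ == "__main__":
    # A made-up sample line in the assumed format, not real tool output.
    sample = ('imgs/a.jpg\t[{"transcription": "hello", '
              '"points": [[0, 0], [10, 0], [10, 5], [0, 5]]}]')
    path, boxes = parse_label_line(sample)
    print(path, [box["transcription"] for box in boxes])
```

Calling `load_labels("Label.txt")` on a real export then gives a dictionary that can be checked for empty transcriptions or missing images before the file is passed to training.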