I wrote an image text recognition (OCR) tool in Python

Introduction

Recently, someone in a technical discussion group asked about image text recognition. It comes up often at work and in daily life, for example when extracting text from bills, comics, scanned documents, and photos.

I wrote a desktop OCR tool based on PyQt + labelme + PaddleOCR. It quickly and automatically detects text regions and recognizes the text in them.

The recognition result is shown in the figure below:

 

The OCR algorithm detects all of the boxed regions automatically. The list on the right shows the text recognized in each box. To copy text, click an entry under “Recognition Result” on the right, then click “Copy to Clipboard”.

Feature list

  • Text region detection + text recognition
  • Text region visualization
  • List of recognized text
  • Loading images and folders
  • Zooming the image with the mouse wheel
  • Drawing and editing regions
  • Copying selected recognition results

OCR section

The image text detection and text recognition algorithms are implemented mainly with PaddleOCR.

Create or select a virtual environment and install the required third-party libraries.

conda create -n ocr
conda activate ocr

Install the PaddlePaddle framework

If you don't have an NVIDIA GPU, or the GPU doesn't support CUDA, you can install the CPU version:

# CPU version
pip install paddlepaddle==2.1.0 -i https://mirror.baidu.com/pypi/simple

If CUDA 9 or CUDA 10 and cuDNN 7.6+ are installed for your GPU, you can install the GPU version instead:

# GPU version
python3 -m pip install paddlepaddle-gpu==2.1.0 -i https://mirror.baidu.com/pypi/simple

Install PaddleOCR

Install paddleocr:

pip install "paddleocr>=2.0.1" # version 2.0.1+ is recommended

For layout analysis, Layout-Parser needs to be installed:

pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl

Test whether the installation succeeded

After installation, run the whole pipeline on a test image, ./imgs/11.jpg. The pipeline covers Chinese and English detection, the text direction classifier, and recognition:

paddleocr --image_dir ./imgs/11.jpg --use_angle_cls true --use_gpu false

The command prints the results as a list.

Calling it from Python

from paddleocr import PaddleOCR, draw_ocr

# PaddleOCR supports multiple languages; switch between them with the lang parameter,
# e.g. `ch`, `en`, `fr`, `german`, `korean`, `japan`
ocr = PaddleOCR(use_angle_cls=True, lang="ch")  # need to run only once to download and load model into memory
img_path = './imgs/11.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

The output is a list. Each item contains the text box (its four corner points), the text, and the recognition confidence:

[[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['Pure Nutrient Conditioner', 0.964739]]
[[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['Product Info/Parameters', 0.98069626]]
[[[24.0, 109.0], [333.0, 109.0], [333.0, 136.0], [24.0, 136.0]], ['(45 yuan/kg, 100kg minimum order)', 0.9676722]]
…
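To fill the result list or build the text for “Copy to Clipboard”, each item can be unpacked into its box, text, and score. The sketch below assumes the flat output shape shown above, where each item is [box, [text, score]]. The helper names and the 0.9 confidence threshold are hypothetical choices made for illustration. Newer PaddleOCR releases may wrap the result in an extra per-image list, in which case you would pass result[0]. Check against the version you installed.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def split_results(result: Sequence) -> List[Tuple[List[Point], str, float]]:
    """Unpack items shaped like [box, [text, score]] into (box, text, score) tuples."""
    parsed = []
    for box, (text, score) in result:
        points = [(float(x), float(y)) for x, y in box]
        parsed.append((points, text, float(score)))
    return parsed


def join_confident_text(result: Sequence, min_score: float = 0.9) -> str:
    """Join recognized lines with newlines, keeping only those scoring >= min_score."""
    return "\n".join(text for _, text, score in split_results(result) if score >= min_score)


# Sample data copied from the output above (first two items only).
sample = [
    [[[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]], ['Pure Nutrient Conditioner', 0.964739]],
    [[[24.0, 80.0], [172.0, 80.0], [172.0, 104.0], [24.0, 104.0]], ['Product Info/Parameters', 0.98069626]],
]

print(join_confident_text(sample))
```

Raising min_score is a simple way to drop low-confidence lines before copying or saving them.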

Interface part

The interface is implemented with PyQt5. For an introduction to PyQt GUI development and environment setup, see the blog post referenced at the end of the article.

The main steps:

1. Interface layout design

Drag and drop controls in Qt Designer to lay out the program interface, then save it as a *.ui file.

2. Use pyuic to generate the interface code automatically

In the PyCharm project tree, right-click the *.ui file and choose External Tools > pyuic. The Python code for the UI is generated in the same directory as the .ui file.

3. Write the interface business class

The business class MainWindow implements the program logic and algorithm functions. It is kept separate from the UI code generated in step 2, so regenerating that code after editing the .ui file does not affect the business code. Controls on the UI are accessed via self._ui.xxxObjectName.

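from PyQt5.QtWidgets import QButtonGroup, QMainWindow

# Ui_MainWindow (the pyuic-generated class), get_config and __appname__
# are defined elsewhere in the project.


class MainWindow(QMainWindow):
    FIT_WINDOW, FIT_WIDTH, MANUAL_ZOOM = 0, 1, 2

    def __init__(self):
        super().__init__()  # call the parent constructor to create the window
        self._ui = Ui_MainWindow()  # create the UI object
        self._ui.setupUi(self)  # build the UI on this window
        self.setWindowTitle(__appname__)

        # Load the default configuration
        config = get_config()
        self._config = config

        # Exclusive button group: only one mode checkbox can be checked at a time
        self.checkBtnGroup = QButtonGroup(self)
        self.checkBtnGroup.addButton(self._ui.checkBox_ocr)
        self.checkBtnGroup.addButton(self._ui.checkBox_det)
        self.checkBtnGroup.addButton(self._ui.checkBox_recog)
        self.checkBtnGroup.addButton(self._ui.checkBox_layoutparser)
        self.checkBtnGroup.setExclusive(True)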
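The FIT_WINDOW, FIT_WIDTH and MANUAL_ZOOM constants suggest three zoom modes, but the article does not show the code that uses them. Below is a minimal, Qt-free sketch of one way to compute a scale factor for each mode, and of how a mouse-wheel step could adjust it. The function names, the 1.1 zoom step, and the clamping range are assumptions. The sketch relies on Qt reporting wheel rotation in eighths of a degree, so one standard wheel notch corresponds to 120 units of angleDelta().y().

```python
FIT_WINDOW, FIT_WIDTH, MANUAL_ZOOM = 0, 1, 2


def fit_scale(mode: int, view_w: float, view_h: float,
              img_w: float, img_h: float, current: float = 1.0) -> float:
    """Return the image scale factor for the given zoom mode (hypothetical helper)."""
    if mode == FIT_WIDTH:
        return view_w / img_w
    if mode == FIT_WINDOW:
        # The tighter of the two ratios makes the whole image fit in the view.
        return min(view_w / img_w, view_h / img_h)
    return current  # MANUAL_ZOOM keeps the user's current scale


def wheel_zoom(scale: float, angle_delta_y: int, step: float = 1.1,
               lo: float = 0.05, hi: float = 20.0) -> float:
    """Multiply the scale by `step` per 120-unit wheel notch, clamped to [lo, hi]."""
    notches = angle_delta_y / 120
    return max(lo, min(hi, scale * step ** notches))


print(fit_scale(FIT_WINDOW, 800, 600, 400, 600))  # → 1.0
```

Scaling by a constant factor per notch, rather than adding a fixed amount, keeps zoom steps feeling even at both small and large magnifications.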

4. Implement the interface business logic

Connect signals to slots for the buttons, lists, and drawing controls on the main window. Custom slot functions need no special declaration. A custom signal, however, must be declared as a class attribute before __init__(), e.g. yourSignal = pyqtSignal(args).
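To illustrate the declaration rule, here is a minimal sketch of custom signals. OcrWorker, progress and finished are hypothetical names, not part of the original project. The sketch needs only PyQt5.QtCore. The connections here are direct and run in the same thread, so the slots execute synchronously and no event loop is required for the demo.

```python
from PyQt5.QtCore import QObject, pyqtSignal


class OcrWorker(QObject):
    """Hypothetical worker used only to show custom signal declarations."""

    # Custom signals are declared as class attributes, outside __init__().
    progress = pyqtSignal(int)  # percentage of lines processed
    finished = pyqtSignal(str)  # all recognized text joined by newlines

    def run(self, lines):
        for i, _ in enumerate(lines, start=1):
            self.progress.emit(i * 100 // len(lines))
        self.finished.emit("\n".join(lines))


received = []


def on_progress(value):
    # A plain Python function works as a slot; no decorator is needed.
    received.append(value)


def on_finished(text):
    received.append(text)


worker = OcrWorker()
worker.progress.connect(on_progress)
worker.finished.connect(on_finished)
worker.run(["line 1", "line 2"])
print(received)  # [50, 100, 'line 1\nline 2']
```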

Here the button and list handlers serve as examples. Clicking a button emits the clicked signal, and changing the selection in the listWidget emits itemSelectionChanged.

# Button handlers (connected inside MainWindow, where self is the window)
self._ui.btnOpenImg.clicked.connect(self.openFile)
self._ui.btnOpenDir.clicked.connect(self.openDirDialog)
self._ui.btnNext.clicked.connect(self.openNextImg)
self._ui.btnPrev.clicked.connect(self.openPrevImg)
self._ui.btnStartProcess.clicked.connect(self.startProcess)
self._ui.btnCopyAll.clicked.connect(self.copyToClipboard)
self._ui.btnSaveAll.clicked.connect(self.saveToFile)
# List handler: called whenever the selected result entry changes
self._ui.listWidgetResults.itemSelectionChanged.connect(self.onItemResultClicked)
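The article does not list the openDirDialog, openNextImg and openPrevImg handlers. One plausible way to back them, sketched here without Qt, is to collect the image files in the chosen folder and step through them by index. The ImageFolder class and the extension set are assumptions. In the real program the folder path would come from a dialog such as QFileDialog.getExistingDirectory.

```python
import os
from typing import Iterable, List, Optional

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}  # assumed set


class ImageFolder:
    """Hypothetical helper that tracks the current image within a folder."""

    def __init__(self, folder: str, names: Iterable[str]):
        # Keep only image files (case-insensitive extension) in sorted order.
        images = sorted(n for n in names
                        if os.path.splitext(n)[1].lower() in IMAGE_EXTS)
        self.paths: List[str] = [os.path.join(folder, n) for n in images]
        self.index = 0 if self.paths else -1

    @classmethod
    def from_dir(cls, folder: str) -> "ImageFolder":
        return cls(folder, os.listdir(folder))

    def current(self) -> Optional[str]:
        return self.paths[self.index] if self.index >= 0 else None

    def next_image(self) -> Optional[str]:
        # Stay on the last image rather than wrapping around.
        if self.index < len(self.paths) - 1:
            self.index += 1
        return self.current()

    def prev_image(self) -> Optional[str]:
        # Stay on the first image rather than wrapping around.
        if self.index > 0:
            self.index -= 1
        return self.current()
```

A handler such as openNextImg could then load `folder.next_image()` whenever it returns a path.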

5. Run the program

Run python main.py to start the GUI program.

Open an image → select the language model ch (Chinese) → select text detection + recognition → click Start. The detected text regions are boxed automatically. Their text is listed on the Recognition Results - Text tab on the right.
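The exclusive checkboxes in MainWindow (checkBox_ocr, checkBox_det, checkBox_recog, checkBox_layoutparser) choose what Start runs. The article does not show startProcess. However, PaddleOCR's ocr() method accepts det, rec and cls flags, so one hypothetical way to map the first three modes onto it is a small lookup table. Layout analysis goes through Layout-Parser instead and is left out here. The mode keys, MODE_FLAGS, ocr_kwargs and self.ocr are all assumptions.

```python
from typing import Dict

# Assumed mode names, one per exclusive checkbox in the UI.
MODE_FLAGS: Dict[str, Dict[str, bool]] = {
    "ocr": {"det": True, "rec": True},     # detection + recognition
    "det": {"det": True, "rec": False},    # detection only: boxes, no text
    "recog": {"det": False, "rec": True},  # recognition only: input is one text line
}


def ocr_kwargs(mode: str, use_angle_cls: bool = True) -> Dict[str, bool]:
    """Build keyword arguments for PaddleOCR.ocr() for the selected mode."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unsupported mode: {mode!r}")
    return {**MODE_FLAGS[mode], "cls": use_angle_cls}


# In startProcess this might be used as:
#     result = self.ocr.ocr(img_path, **ocr_kwargs("ocr"))
print(ocr_kwargs("det"))  # {'det': True, 'rec': False, 'cls': True}
```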

The Recognition Results - Region tab lists all regions where text was detected.
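Each detected region is a four-point polygon, while a region list row or a QRect-style drawing usually needs an axis-aligned rectangle. Here is a small sketch of that conversion. It assumes the [[x, y], ...] box format shown in the OCR part, and the function names and row format are hypothetical.

```python
from typing import List, Sequence, Tuple


def quad_to_rect(box: Sequence[Sequence[float]]) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of the axis-aligned rectangle enclosing a 4-point box.

    int() truncates fractional coordinates, which is enough for display purposes.
    """
    xs = [p[0] for p in box]
    ys = [p[1] for p in box]
    left, top = min(xs), min(ys)
    return int(left), int(top), int(max(xs) - left), int(max(ys) - top)


def region_rows(result: Sequence) -> List[str]:
    """Format one line per detected region: index, rectangle and confidence."""
    rows = []
    for i, (box, (_, score)) in enumerate(result, start=1):
        x, y, w, h = quad_to_rect(box)
        rows.append(f"{i}: x={x} y={y} w={w} h={h} score={score:.2f}")
    return rows


box = [[24.0, 36.0], [304.0, 34.0], [304.0, 72.0], [24.0, 74.0]]
print(quad_to_rect(box))  # (24, 34, 280, 40)
```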

