Tesseract OCR installation


Tesseract is a widely used open-source OCR engine. We will use it in our upcoming image text recognition projects. This article describes how to install and configure Tesseract.

1. Download Tesseract

Download address: choose the latest version of Tesseract. After the download is complete, install it to a path of your choice, clicking Next through the installer to finish.
[Figure: Tesseract installer]

2. Add environment variables

Open the System Properties window, click the Advanced tab, and then click Environment Variables.
[Figure: System Properties]

[Figure: Environment Variables]
In the Environment Variables window, add the Tesseract installation path to the Path variable under both User variables and System variables. To verify that the environment variable was added successfully, open a new cmd window (a window opened before the change will not see the updated Path) and enter the command:

tesseract -v

[Figure: output of tesseract -v in cmd]
If Tesseract prints its version information, the environment variable is configured correctly. Otherwise the configuration failed; go back over the steps above carefully and configure it again.
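The same check can also be done from Python. The sketch below is a minimal illustration assuming Python 3.7+; `dir_on_path` and `tesseract_version` are hypothetical helper names, not part of Tesseract or pytesseract. It uses `shutil.which` to locate the executable through Path, much like cmd does, and runs `tesseract -v` through `subprocess`.

```python
import os
import shutil
import subprocess


def dir_on_path(install_dir, path_value=None, sep=os.pathsep):
    """Return True if install_dir is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # normpath drops trailing separators; normcase makes the match case-insensitive on Windows.
    target = os.path.normcase(os.path.normpath(install_dir))
    for entry in path_value.split(sep):
        entry = entry.strip()
        if entry and os.path.normcase(os.path.normpath(entry)) == target:
            return True
    return False


def tesseract_version():
    """Run `tesseract -v` if tesseract is on PATH; return the first output line, else None."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "-v"], capture_output=True, text=True)
    # Depending on the Tesseract version, the banner may go to stdout or stderr.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(tesseract_version() or "tesseract was not found on PATH")
```

For example, `dir_on_path(r"C:\Program Files\Tesseract-OCR")` reports whether that folder is on the current Path; the folder name here is only an example, so substitute your own installation path.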
Run tesseract --list-langs to see which languages Tesseract-OCR currently supports.
[Figure: list of installed languages]
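If you prefer to check the installed languages from a script, the sketch below runs the same command and parses its output. The header line described in the docstring is what recent Tesseract versions print, and treating it this way is an assumption; `parse_list_langs` and `installed_languages` are hypothetical helper names. Once the Chinese language pack from step 3 is in place, `"chi_sim" in installed_languages()` should be True.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`.

    The first line is a header like 'List of available languages in "<tessdata dir>" (3):';
    each following non-empty line is one language code.
    """
    langs = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def installed_languages():
    """Run `tesseract --list-langs`; return the language codes, or [] if tesseract is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return []
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Fall back to stderr in case a release prints the list there instead of stdout.
    return parse_list_langs(result.stdout or result.stderr)


if __name__ == "__main__":
    langs = installed_languages()
    print(langs)
    print("chi_sim installed:", "chi_sim" in langs)
```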

3. Configure the Chinese language pack for Tesseract

Download address: chi_sim (the Simplified Chinese language data)
Copy the downloaded chi_sim.traineddata file into the tessdata folder under the installation path, as shown in the figure:
[Figure: chi_sim.traineddata in the tessdata folder]
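The copy step can also be scripted. This is a minimal sketch using only the standard library; `install_traineddata` is a hypothetical helper, and the paths in the usage comment are placeholders you must replace. If Tesseract is installed under Program Files, writing into its tessdata folder may require running Python from an administrator prompt.

```python
import shutil
from pathlib import Path


def install_traineddata(downloaded_file, install_dir):
    """Copy a downloaded *.traineddata file into <install_dir>/tessdata and return the new path."""
    src = Path(downloaded_file)
    if not src.is_file():
        raise FileNotFoundError(f"language file not found: {src}")
    tessdata = Path(install_dir) / "tessdata"
    # A missing tessdata folder usually means install_dir is not the Tesseract installation path.
    if not tessdata.is_dir():
        raise FileNotFoundError(f"no tessdata folder under: {install_dir}")
    dest = tessdata / src.name
    shutil.copy2(src, dest)  # copy2 also preserves the file's timestamps
    return dest


# Example (replace both placeholder paths with your own):
# install_traineddata(r"<download folder>\chi_sim.traineddata", r"<Tesseract installation path>")
```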

4. Install the required Python libraries

pip install pytesseract
pip install Pillow
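To confirm both packages installed correctly, you can check that Python can find them. Note that the pip package name and the import name differ for Pillow: it is installed as `Pillow` but imported as `PIL`. `missing_modules` below is a hypothetical helper built on the standard `importlib.util.find_spec`.

```python
import importlib.util


def missing_modules(import_names):
    """Return the top-level import names that Python cannot find."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Check by import name: pytesseract -> "pytesseract", Pillow -> "PIL".
    missing = missing_modules(["pytesseract", "PIL"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("pytesseract and Pillow are installed")
```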

5. Example program

1. The image to be recognized

[Figure: sample image]

2. Recognition program

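import pytesseract
from PIL import Image

# If tesseract is not on Path, point pytesseract at the executable instead:
# pytesseract.pytesseract.tesseract_cmd = r"<path to tesseract.exe>"

path = r"D:\code\python\opencv\图像处理\test.png"  # image to recognize
image = Image.open(path)
text = pytesseract.image_to_string(image, lang='chi_sim')  # use the Simplified Chinese language pack
print(text)  # print the recognized text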
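A slightly more defensive version of the program can check that the required language data exists before calling pytesseract. This is a sketch, not the author's code: `missing_language_files` and `recognize` are hypothetical helpers, and `tessdata_dir` is assumed to be the tessdata folder of your installation. Tesseract accepts several languages joined with `+` (for example `chi_sim+eng` for mixed Chinese and English text), so the helper checks each code separately.

```python
from pathlib import Path


def missing_language_files(tessdata_dir, lang):
    """Return the codes in a lang string such as 'chi_sim+eng' whose .traineddata file is absent."""
    codes = [code for code in lang.split("+") if code]
    return [c for c in codes if not (Path(tessdata_dir) / f"{c}.traineddata").is_file()]


def recognize(image_path, lang="chi_sim", tessdata_dir=None):
    """OCR an image with pytesseract, optionally checking the language files first."""
    if tessdata_dir is not None:
        missing = missing_language_files(tessdata_dir, lang)
        if missing:
            raise FileNotFoundError(f"missing .traineddata for: {', '.join(missing)}")
    # Imported here so missing_language_files works even without pytesseract/Pillow installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang)
```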

3. Recognition results

[Figure: recognition result]


Source: blog.csdn.net/m0_53192838/article/details/127432761