Tesseract installation
Tesseract is a widely used open-source OCR engine. We will call it in later image text recognition projects. This article describes how to install and configure Tesseract.
1. Download Tesseract
Download address: Select the latest version of Tesseract to download. After the download completes, extract it if necessary, run the installer, choose an installation path, and click Next through the remaining steps to finish the installation.
2. Add environment variables
Open the System Properties page, click Advanced, and then select Environment Variables.
On the Environment Variables page, add the Tesseract installation path to the Path variable under both the user variables and the system variables. To verify that the change took effect, open a new cmd window and enter the command:
tesseract -v
If Tesseract's version information is printed, the environment variable is configured correctly. Otherwise the configuration failed; go through the steps above again carefully.
To list the languages available to the installed Tesseract-OCR, run tesseract --list-langs.
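The same two checks can also be run from Python, which is handy before wiring Tesseract into a larger script. The following is a minimal sketch that uses only the standard library. The helper names parse_langs and check_tesseract are illustrative and not part of Tesseract or pytesseract. It also assumes that the output of tesseract --list-langs is a header line starting with "List of" followed by one language code per line.

```python
import shutil
import subprocess
from typing import List, Optional


def parse_langs(output: str) -> List[str]:
    """Extract language codes from the text printed by `tesseract --list-langs`.

    Assumption: the header line starts with "List of" and every other
    non-empty line holds exactly one language code.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return [line for line in lines if not line.lower().startswith("list of")]


def check_tesseract() -> Optional[List[str]]:
    """Return the available languages, or None if tesseract is not on PATH."""
    exe = shutil.which("tesseract")  # same lookup the cmd window performs
    if exe is None:
        return None
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Some versions print to stderr instead of stdout, so fall back to it.
    return parse_langs(result.stdout or result.stderr)


if __name__ == "__main__":
    langs = check_tesseract()
    if langs is None:
        print("tesseract was not found on PATH; recheck the environment variable")
    else:
        print("available languages:", langs)
```

If check_tesseract() returns None, the Path entry from step 2 is the first thing to recheck.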
3. Install the Chinese language pack for Tesseract
Download path: chi_sim
Copy the downloaded chi_sim.traineddata file into the tessdata folder under the Tesseract installation path.
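To confirm that the file landed in the right folder, a small check like the one below may help. It is only a sketch: has_traineddata is a hypothetical helper, and the installation path in the example is an assumed location that should be replaced with the path chosen in step 1.

```python
from pathlib import Path


def has_traineddata(tessdata_dir, lang="chi_sim"):
    """Return True if <lang>.traineddata exists in the given tessdata folder."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


if __name__ == "__main__":
    # Assumed install location; replace with your own installation path.
    tessdata = Path(r"C:\Program Files\Tesseract-OCR") / "tessdata"
    print("chi_sim installed:", has_traineddata(tessdata))
```

Once the file is in place, chi_sim should also appear in the output of tesseract --list-langs.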
4. Install the related Python libraries
pip install pytesseract
pip install Pillow
5. Example program
1. The picture to be recognized
2. Recognition program
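import pytesseract
from PIL import Image

# Path to the image to recognize; change it to your own file
path = "D:\\code\\python\\opencv\\图像处理\\test.png"
image = Image.open(path)
# lang='chi_sim' selects the Simplified Chinese language pack from step 3
text = pytesseract.image_to_string(image, lang='chi_sim')
print(text)  # print the recognized text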
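If the program fails because pytesseract cannot find the Tesseract executable, one common workaround is to set pytesseract.pytesseract.tesseract_cmd explicitly instead of relying on Path. The sketch below illustrates the idea. FALLBACK_EXE is an assumed install location, and resolve_tesseract_cmd and ocr_image are hypothetical helper names introduced only for this example.

```python
import shutil
from pathlib import Path

# Assumed default install location on Windows; adjust to your machine.
FALLBACK_EXE = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(fallback=FALLBACK_EXE):
    """Prefer the executable found on PATH, then the fallback path, else None."""
    found = shutil.which("tesseract")
    if found:
        return found
    if Path(fallback).is_file():
        return str(fallback)
    return None


def ocr_image(image_path, lang="chi_sim"):
    """Recognize text in an image after pointing pytesseract at the executable."""
    # Imported here so resolve_tesseract_cmd() works without the OCR libraries.
    import pytesseract
    from PIL import Image

    cmd = resolve_tesseract_cmd()
    if cmd is None:
        raise FileNotFoundError("tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = cmd
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang)
```

Calling ocr_image(path) with the image path from the program above would then return the recognized text as a string.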