1. Install pytesseract and PIL (Pillow)
1. Install with pip. PIL is published on PyPI under the name Pillow, so that is the package to install:
pip install Pillow
pip install pytesseract
2. Alternatively, install both packages from inside the PyCharm editor by following the steps below.
Successful installation:
3. Running a script at this point fails with an error, because the recognition engine tesseract-ocr is not installed yet.
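To tell which piece is missing before running any OCR code, a quick check along the following lines can help. This is only a minimal sketch: the helper names missing_packages and engine_on_path are made up for illustration, and it assumes the engine's executable is called tesseract.

```python
import importlib.util
import shutil


def missing_packages(names=("PIL", "pytesseract")):
    """Return the top-level import names that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def engine_on_path(executable="tesseract"):
    """Return True if the given executable is found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    print("Missing Python packages:", missing_packages() or "none")
    print("tesseract on PATH:", engine_on_path())
```

If the packages are found but the engine is not, the Python side is fine and only tesseract-ocr itself still has to be installed or located.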
2. Install the recognition engine tesseract-ocr
1. Install Tesseract-OCR, an open source OCR engine whose development was sponsored by Google for many years.
Download link: https://pan.baidu.com/s/1J0HNoVhX8WexS_5r0k2jDw Password: ywc3
tesseract-ocr does not support Chinese recognition by default, so put the downloaded file chi_sim.traineddata in the tessdata folder of the Tesseract-OCR installation directory (here D:\Program Files (x86)\Tesseract-OCR\tessdata), as shown in the figure.
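Whether the language file ended up in the right place can be checked with a few lines of Python. The sketch below assumes the installation path used in this article; has_language_data is an illustrative helper, not part of pytesseract.

```python
from pathlib import Path


def has_language_data(tessdata_dir, lang="chi_sim"):
    """Return True if <lang>.traineddata exists in tessdata_dir."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


if __name__ == "__main__":
    # Path taken from this article; adjust it to your own installation.
    tessdata = r"D:\Program Files (x86)\Tesseract-OCR\tessdata"
    print("chi_sim data present:", has_language_data(tessdata))
```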
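2. After installing tesseract-ocr, you need to configure pytesseract to find it.
Modify the pytesseract.py file in the pytesseract package directory of the Python installation (e.g. D:\Python35\Lib\site-packages\pytesseract); the usual change is to point its tesseract_cmd variable at the installed tesseract executable.
In PyCharm, you can also open the pytesseract source file quickly with Ctrl+B.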
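Editing the installed package is not the only option: pytesseract also lets a script set pytesseract.pytesseract.tesseract_cmd at run time, which survives a package upgrade. The sketch below assumes the executable is named tesseract.exe and sits directly in the installation directory from this article; the helper names are illustrative.

```python
from pathlib import PureWindowsPath

# Installation directory used in this article; adjust it to your own setup.
INSTALL_DIR = r"D:\Program Files (x86)\Tesseract-OCR"


def tesseract_exe(install_dir=INSTALL_DIR):
    """Build the full path to tesseract.exe (assumed file name)."""
    return str(PureWindowsPath(install_dir) / "tesseract.exe")


def configure_pytesseract(install_dir=INSTALL_DIR):
    """Tell pytesseract where the engine lives, without editing its source."""
    # Imported here so tesseract_exe() stays usable without pytesseract.
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = tesseract_exe(install_dir)
```

Calling configure_pytesseract() once at the top of a script, before the first OCR call, should have the same effect as the manual edit.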
3. Try to run again; the following error occurs: pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\chi_sim.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'chi_sim\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
4. Solution: set the TESSDATA_PREFIX environment variable to the parent directory of the tessdata directory (by default, the tesseract-ocr installation directory).
Note: After configuring the environment variable, you need to restart the PyCharm editor (IDE) so that it picks up the new value.
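For a quick test without touching the system settings, the variable can also be set from inside the script before the first OCR call, since processes started afterwards inherit it. The following is a sketch only: set_tessdata_prefix is an illustrative name, and the directory follows the choice made in step 4.

```python
import os


def set_tessdata_prefix(install_dir):
    """Set TESSDATA_PREFIX for this process and the processes it starts."""
    os.environ["TESSDATA_PREFIX"] = str(install_dir)
    return os.environ["TESSDATA_PREFIX"]


if __name__ == "__main__":
    # Parent directory of tessdata, as described in step 4 of this article.
    print(set_tessdata_prefix(r"D:\Program Files (x86)\Tesseract-OCR"))
```

The setting made this way lasts only for the running script. If the same error persists, it may be worth trying the tessdata directory itself as the value, since the error message is worded that way.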
5. Test result: Image recognition succeeded!
However, the recognition accuracy is not very high.
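For reference, the test in step 5 boils down to a call like the one below. It is a minimal sketch that assumes both packages and the chi_sim language data are installed as described above; recognize_text is an illustrative wrapper, and any image file name passed to it is your own.

```python
def recognize_text(image_path, lang="chi_sim"):
    """Run OCR on one image file and return the recognized text."""
    # Imported inside the function so it can be defined before
    # Pillow and pytesseract are installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang)
```

A call such as recognize_text("test.png"), where test.png is a placeholder name, returns the recognized text as a string; passing lang="eng" switches to the English data that ships with the engine.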