【Purpose】
This article shows how to use Tesseract-OCR to recognize text in image data.
A. Tesseract-OCR installation
1. Download the Tesseract-OCR Windows installation package: https://digi.bib.uni-mannheim.de/tesseract/
Install it to any path. Once the installation succeeds, note the installation path, for example: D:\tool\Tesseract-OCR
2. Environment settings
(1) Modify the environment variables: right-click [This PC], choose [Properties], and find [Environment Variables]. Open the Path entry under both the user variables and the system variables, and add the installation path to each: D:\tool\Tesseract-OCR
Under the system variables, create a new variable named TESSDATA_PREFIX and set its value to: D:\tool\Tesseract-OCR\tessdata
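The settings above can be checked from Python before going further. The following is only a minimal sketch using the standard library; the function name check_tesseract_env is made up for this illustration, and it simply reports what the current process sees (a terminal opened before the variables were changed will still show the old values).

```python
import os
import shutil


def check_tesseract_env(environ=None):
    """Report where tesseract is found on Path and the value of TESSDATA_PREFIX.

    check_tesseract_env is a hypothetical helper name, not part of any library.
    """
    environ = os.environ if environ is None else environ
    # shutil.which searches the given Path string for an executable named "tesseract"
    exe = shutil.which("tesseract", path=environ.get("PATH"))
    return {
        "tesseract_on_path": exe,
        "tessdata_prefix": environ.get("TESSDATA_PREFIX"),
    }


if __name__ == "__main__":
    for key, value in check_tesseract_env().items():
        print(f"{key}: {value if value else 'NOT SET / NOT FOUND'}")
```

If either entry is reported as missing, reopen the terminal or IDE so that it picks up the new variables, then run the check again.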
(2) Modify the pytesseract.py file in the Python installation. It is located in: C:\Python\Python37-32\Lib\site-packages\pytesseract
In pytesseract.py, find the line:
tesseract_cmd = 'tesseract'
and change it to:
tesseract_cmd = 'D:\\tool\\Tesseract-OCR\\tesseract.exe'
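An edit made inside site-packages can be lost when pytesseract is upgraded or reinstalled. As a possible alternative, the same value can be assigned from your own script, since pytesseract exposes it as pytesseract.pytesseract.tesseract_cmd. The sketch below assumes the installation directory from step 1; INSTALL_DIR, tesseract_exe and configure_pytesseract are names invented for this example.

```python
import ntpath  # Windows path rules, whichever OS runs this script

# Assumption: the installation directory chosen in step 1; change it to match yours.
INSTALL_DIR = r"D:\tool\Tesseract-OCR"


def tesseract_exe(install_dir=INSTALL_DIR):
    """Return the full path to tesseract.exe inside the installation directory."""
    return ntpath.join(install_dir, "tesseract.exe")


def configure_pytesseract(install_dir=INSTALL_DIR):
    """Point pytesseract at tesseract.exe for the current process only."""
    import pytesseract  # imported here so tesseract_exe() works without it installed

    pytesseract.pytesseract.tesseract_cmd = tesseract_exe(install_dir)
    return pytesseract.pytesseract.tesseract_cmd
```

Calling configure_pytesseract() once at the start of a script has the same effect as the file edit, but only for that script, so the installed package stays untouched.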
3. Verify the installation by running the code below. If it runs without errors and prints the recognized text, the installation was successful.
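from PIL import Image
import pytesseract

file_path = "c:/xxx.png"  # path to the image to recognize
image = Image.open(file_path)
# Run OCR on the image and return the recognized text as a string
result = pytesseract.image_to_string(image)
print(result)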
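If the script above fails, it helps to find out whether the Tesseract executable itself can be started, independently of pytesseract and of the image file. The following is a rough sketch that runs tesseract --version through the standard library; the helper name tesseract_version is an assumption for this illustration.

```python
import subprocess


def tesseract_version(cmd="tesseract"):
    """Return the first line printed by `<cmd> --version`, or None if cmd cannot be run."""
    try:
        proc = subprocess.run(
            [cmd, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers "file not found" and "permission denied"
        return None
    # Some Tesseract builds print the version banner to stderr instead of stdout
    output = (proc.stdout or proc.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    print(version if version else "tesseract could not be started; check Path and the installation path")
```

A full path such as D:\tool\Tesseract-OCR\tesseract.exe can be passed as cmd to test the executable directly when the Path entry is in doubt.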