Recognizing text in images with Python and Tesseract

from PIL import Image
import pytesseract

# Recognize Simplified Chinese text in the image (needs the chi_sim language pack)
text = pytesseract.image_to_string(Image.open(r'E:\guo\2432.jpg'), lang='chi_sim')
print(text)

I am using Python 3.7.

1. Two modules need to be installed.

The first is PIL. Running pip install PIL directly fails with an error; install it with pip install Pillow instead.

The second is pytesseract, which can be installed directly with pip install pytesseract.

2. After installing the modules, you also need to download tesseract-ocr.

Download URL: https://github.com/UB-Mannheim/tesseract/wiki

Choose the version that suits your system; the installer can be run directly after downloading. Remember the installation location, because you will need it later.
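Once both steps are done, a quick check like the following can confirm that the two Python modules are importable. This is only a sketch: the helper name missing_packages and the REQUIRED mapping are made up for illustration and are not part of Pillow or pytesseract.

```python
from importlib.util import find_spec

# Import names differ from pip package names: Pillow is imported as PIL.
# This mapping is an illustrative assumption, not something either library defines.
REQUIRED = {'PIL': 'Pillow', 'pytesseract': 'pytesseract'}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose import name cannot be found."""
    return [
        pip_name
        for import_name, pip_name in required.items()
        if find_spec(import_name) is None
    ]


if __name__ == '__main__':
    missing = missing_packages()
    if missing:
        print('Missing, install with: pip install ' + ' '.join(missing))
    else:
        print('Pillow and pytesseract are both importable.')
```

The check only looks for the modules; it does not verify that the tesseract-ocr program itself is installed.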

Modify the path inside the pytesseract.py file

In PyCharm, type import pytesseract.pytesseract, then hold down Ctrl and click on pytesseract to jump to the file.

 

from io import BytesIO
pandas_installed = find_loader('pandas') is not None
if pandas_installed:
    import pandas as pd

# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
RGB_MODE = 'RGB'

 

I changed tesseract_cmd to the location of tesseract.exe inside the directory where Tesseract was just installed. Once this is set correctly, the script runs without errors.
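As an alternative to editing the library file, pytesseract also lets you assign pytesseract.pytesseract.tesseract_cmd from your own script, so the change survives a reinstall of the module. The sketch below assumes the default install path shown above; the helper names find_tesseract and configure_pytesseract are hypothetical and exist only for this example.

```python
import os
import shutil

# Assumed default install location taken from the post; adjust for your machine.
DEFAULT_TESSERACT = r'C:\Program Files\Tesseract-OCR\tesseract.exe'


def find_tesseract(candidates=(DEFAULT_TESSERACT,)):
    """Return the path of a tesseract executable, or None if none is found."""
    # Prefer an executable that is already on the PATH.
    on_path = shutil.which('tesseract')
    if on_path:
        return on_path
    # Otherwise fall back to the explicit candidate locations.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract(candidates=(DEFAULT_TESSERACT,)):
    """Point pytesseract at the executable found by find_tesseract."""
    import pytesseract  # imported here so find_tesseract works without it

    exe = find_tesseract(candidates)
    if exe is None:
        raise FileNotFoundError('tesseract executable not found')
    # tesseract_cmd is the same module-level variable edited in pytesseract.py.
    pytesseract.pytesseract.tesseract_cmd = exe
    return exe
```

Calling configure_pytesseract() once, before the first image_to_string call, should have the same effect as the manual edit.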

 

 

 

Tesseract has many language packs. The default is English, so to recognize Chinese you need to download the corresponding language pack:
URL: https://github.com/tesseract-ocr/tessdata
The file chi_sim.traineddata is the Simplified Chinese language pack. Place it in the tessdata directory under the installation path.
To use a language pack, pass its name through the lang= parameter, for example lang='chi_sim'. If lang is not given, English is used.
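Before passing a value to lang=, it can help to confirm that the matching .traineddata file really is in the tessdata directory. The following is a rough sketch under that assumption; installed_languages and build_lang are illustrative names, not pytesseract functions. Tesseract accepts several languages joined with +, such as chi_sim+eng.

```python
import os

SUFFIX = '.traineddata'


def installed_languages(tessdata_dir):
    """List language codes that have a .traineddata file in tessdata_dir."""
    return sorted(
        name[:-len(SUFFIX)]
        for name in os.listdir(tessdata_dir)
        if name.endswith(SUFFIX)
    )


def build_lang(tessdata_dir, wanted):
    """Return a value for lang=, e.g. 'chi_sim+eng', after checking each pack exists."""
    available = set(installed_languages(tessdata_dir))
    missing = [code for code in wanted if code not in available]
    if missing:
        raise FileNotFoundError(
            'language pack(s) not found in tessdata: ' + ', '.join(missing)
        )
    return '+'.join(wanted)
```

For example, build_lang(r'C:\Program Files\Tesseract-OCR\tessdata', ['chi_sim', 'eng']) would produce a string to pass as lang= to image_to_string, assuming that directory is where your installation keeps its language packs.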

The recognition rate of chi_sim.traineddata is not high. If you need to recognize a specific kind of text, you can train a model and generate your own language pack.

 


Origin www.cnblogs.com/dayouzi/p/11295212.html