Use Tesseract to read the number in a verification code (CAPTCHA) image:
1. Install PIL-fork-1.1.7.win-amd64-py2.7
2. Install Pillow-4.3.0.win-amd64-py2.7
3. Install pyocr and pytesseract:
pip install pyocr
pip install pytesseract
4. Install the OCR tool: tesseract-ocr-setup-4.00.00dev.exe
http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-4.00.00dev.exe
Language packs (copy the downloaded .traineddata files into the tessdata folder of the Tesseract installation):
Simplified Chinese recognition package: https://raw.githubusercontent.com/tesseract-ocr/tessdata/4.00/chi_sim.traineddata
Traditional Chinese recognition package: https://github.com/tesseract-ocr/tessdata/raw/4.0/chi_tra.traineddata
Recognition packages for other languages: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
5. Add the Tesseract-OCR installation directory to the PATH environment variable:
C:\Program Files (x86)\Tesseract-OCR
6. Add the TESSDATA_PREFIX environment variable:
Variable name: TESSDATA_PREFIX
Value: C:\Program Files (x86)\Tesseract-OCR\tessdata
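Steps 5 and 6 can be checked before any OCR code is run. The script below is only a sketch: the helper names `find_on_path` and `check_tesseract_env` are made up for illustration, and it assumes that the executable is called tesseract.exe and that step 5 means putting the installation directory on PATH.

```python
import os


def find_on_path(exe_name, path_value, sep=os.pathsep):
    """Return the first directory in path_value that contains exe_name, or None."""
    for directory in path_value.split(sep):
        directory = directory.strip('"')
        if directory and os.path.isfile(os.path.join(directory, exe_name)):
            return directory
    return None


def check_tesseract_env(environ=None, exe_name="tesseract.exe"):
    """Return a list of problems; an empty list means the setup looks fine."""
    environ = os.environ if environ is None else environ
    problems = []
    # Step 5: the installation directory must be on PATH.
    if find_on_path(exe_name, environ.get("PATH", "")) is None:
        problems.append("%s not found on PATH" % exe_name)
    # Step 6: TESSDATA_PREFIX must point at an existing folder.
    tessdata = environ.get("TESSDATA_PREFIX")
    if not tessdata:
        problems.append("TESSDATA_PREFIX is not set")
    elif not os.path.isdir(tessdata):
        problems.append("TESSDATA_PREFIX is not a directory: %s" % tessdata)
    return problems


if __name__ == "__main__":
    for problem in check_tesseract_env():
        print(problem)
```

If the script prints nothing, both variables are visible to the Python process that ran it. Open a new command prompt after changing the variables, because running processes keep the old values.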
7. Python code:
# coding: utf-8
import sys

import pyocr
import pyocr.builders
import pytesseract
from PIL import Image


def image_to_str(vfile):
    # Method 1: OCR through pyocr, using the first available tool.
    tools = pyocr.get_available_tools()
    if len(tools) == 0:
        print("No OCR tool found")
        sys.exit(1)
    langs = tools[0].get_available_languages()
    txt = tools[0].image_to_string(
        Image.open(vfile),
        lang=langs[0],
        builder=pyocr.builders.TextBuilder()
    )
    # Remove the spaces that the OCR engine inserts between characters.
    result = txt.replace(" ", "")
    print(result)
    return result


def image_to_str_by_pytesseract(vfile):
    # Method 2: OCR through pytesseract.
    image = Image.open(vfile)
    code = pytesseract.image_to_string(image)
    result = code.replace(" ", "")
    print(result)
    return result


if __name__ == '__main__':
    # Backslashes are doubled so the Windows paths are not read as escape sequences.
    file1 = u'D:\\WORK\\python package&OCR\\ verification code.jpg'
    file2 = u'D:\\WORK\\python package&OCR\\ verification code 2.jpg'
    file3 = u'D:\\WORK\\python package&OCR\\ verification code 3.jpg'
    image_to_str(file1)
    image_to_str_by_pytesseract(file2)
    image_to_str_by_pytesseract(file3)
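Since the goal is to read the number in the verification code, the recognized text can be narrowed down to digits after OCR. The following is a minimal sketch and not part of the original script: `digits_only` is a hypothetical helper, and it assumes the verification code consists only of the ASCII digits 0-9.

```python
def digits_only(ocr_text):
    """Keep only the ASCII digits 0-9 from raw OCR output."""
    return "".join(ch for ch in ocr_text if ch in "0123456789")
```

It would be applied to the return value of either function, for example digits_only(image_to_str_by_pytesseract(file2)). Unlike replace(" ", ""), this also drops line breaks and stray punctuation that the OCR engine may produce.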
8. When running the script in PyCharm, add the corresponding environment variables to the PyCharm run configuration as well; otherwise PyCharm reports an error.
See the attached pictures.
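If changing the PyCharm run configuration is inconvenient, one possible alternative is to set the variables inside the script before any OCR call is made. This is a sketch under assumptions: the directory is the one from steps 5 and 6, and `configure_tesseract_env` is a hypothetical helper, not a function of pyocr or pytesseract.

```python
import os

# Assumed install location, taken from steps 5 and 6; adjust it to your machine.
TESSERACT_DIR = r"C:\Program Files (x86)\Tesseract-OCR"


def configure_tesseract_env(install_dir=TESSERACT_DIR, environ=None):
    """Add install_dir to PATH and point TESSDATA_PREFIX at its tessdata folder."""
    environ = os.environ if environ is None else environ
    # Split the current PATH, dropping empty entries.
    paths = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    if install_dir not in paths:
        paths.append(install_dir)
    environ["PATH"] = os.pathsep.join(paths)
    environ["TESSDATA_PREFIX"] = os.path.join(install_dir, "tessdata")
    return environ
```

Calling configure_tesseract_env() at the top of the script changes the environment only for that Python process and the programs it starts, so the system settings stay untouched.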
9. If method 2 (pytesseract) reports an import error, fix it as follows:
Go to C:\Python27\Lib\site-packages\pytesseract
Open pytesseract.py
Find this block:
try:
    import Image
except ImportError:
    from PIL import Image
Replace it with the single line:
from PIL import Image