Python: using tesseract to read the number in a verification code (CAPTCHA) image

 

Steps for using tesseract to read the number in a verification code image:

1. Install PIL-fork-1.1.7.win-amd64-py2.7

2. Install Pillow-4.3.0.win-amd64-py2.7

3. Install the Python wrappers:

pip install pyocr

pip install pytesseract
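Before going further, you can confirm that steps 1 to 3 worked by checking that the modules can be found. This is a minimal Python 3 sketch, because `importlib.util.find_spec` does not exist in Python 2.7. `missing_modules` is a hypothetical helper name. The list holds import names, not pip package names: `PIL` is provided by PIL-fork or Pillow.

```python
import importlib.util

# Import names to check (not pip package names); "PIL" comes from PIL-fork or Pillow.
REQUIRED_MODULES = ("PIL", "pyocr", "pytesseract")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found on sys.path."""
    # find_spec() locates a top-level module without importing it and
    # returns None when the module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing: " + ", ".join(missing) if missing else "All OCR modules found")
```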

4. Install the OCR engine itself: tesseract-ocr-setup-4.00.00dev.exe

http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-4.00.00dev.exe

Language packs (.traineddata files, copied into the tessdata folder of the Tesseract-OCR install directory):

Simplified Chinese: https://raw.githubusercontent.com/tesseract-ocr/tessdata/4.00/chi_sim.traineddata

Traditional Chinese: https://github.com/tesseract-ocr/tessdata/raw/4.0/chi_tra.traineddata

Other languages: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
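Each language pack becomes a code that is passed to tesseract with `-l`. The code is the file name without `.traineddata`, and several codes can be combined with `+`. The sketch below is a minimal Python 3 example. It lists the installed packs and builds a tesseract command line from them. `installed_languages` and `build_tesseract_cmd` are hypothetical names, and `--psm 7` (treat the image as a single text line) is an assumed setting for a one-line code.

```python
import os


def installed_languages(tessdata_dir):
    """Return the language codes of the .traineddata files in tessdata_dir."""
    suffix = ".traineddata"
    return sorted(name[: -len(suffix)] for name in os.listdir(tessdata_dir)
                  if name.endswith(suffix))


def build_tesseract_cmd(image_path, langs, installed, psm=7):
    """Build a tesseract command line for image_path using one or more language packs."""
    missing = [code for code in langs if code not in installed]
    if missing:
        raise ValueError("language pack(s) not installed: " + ", ".join(missing))
    # -l takes several languages joined with '+'; "stdout" as the output base
    # prints the text instead of writing a .txt file; --psm 7 = one text line.
    return ["tesseract", image_path, "stdout", "-l", "+".join(langs), "--psm", str(psm)]
```

The list can be run with `subprocess.check_output(cmd)`. Through pytesseract, the same options go in as `pytesseract.image_to_string(image, lang="eng+chi_sim", config="--psm 7")`.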

5. Add the Tesseract-OCR install directory to the PATH environment variable:

C:\Program Files (x86)\Tesseract-OCR

6. Add the TESSDATA_PREFIX environment variable:

Variable name: TESSDATA_PREFIX

Value: C:\Program Files (x86)\Tesseract-OCR\tessdata
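You can check steps 5 and 6 from Python before running any OCR. This is a minimal Python 3 sketch (`shutil.which` needs 3.3+), and `check_tesseract_env` is a hypothetical name. It only checks what the two steps set up: that tesseract is reachable through PATH, and that TESSDATA_PREFIX points at an existing folder containing .traineddata files.

```python
import os
import shutil


def check_tesseract_env(env=None):
    """Return a list of problems with the setup from steps 5 and 6 (empty if none found)."""
    env = os.environ if env is None else env
    problems = []
    # Step 5: the tesseract executable must be reachable through PATH.
    if shutil.which("tesseract", path=env.get("PATH", "")) is None:
        problems.append("tesseract not found on PATH")
    # Step 6: TESSDATA_PREFIX should name the tessdata folder holding the language packs.
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        problems.append("TESSDATA_PREFIX is not set")
    elif not os.path.isdir(prefix):
        problems.append("TESSDATA_PREFIX folder does not exist: " + prefix)
    elif not any(name.endswith(".traineddata") for name in os.listdir(prefix)):
        problems.append("no .traineddata files in " + prefix)
    return problems


if __name__ == "__main__":
    print(check_tesseract_env() or "PATH and TESSDATA_PREFIX look OK")
```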

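7. Python code (Python 2.7). image_to_str goes through pyocr, while image_to_str_by_pytesseract calls pytesseract directly: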

# coding: utf-8
import sys

import pyocr
import pyocr.builders
import pytesseract
from PIL import Image


def image_to_str(vfile):
    # Recognize the text in an image through pyocr.
    tools = pyocr.get_available_tools()
    if len(tools) == 0:
        print("No OCR tool found")
        sys.exit(1)
    tool = tools[0]  # first OCR backend pyocr found (Tesseract here)
    langs = tool.get_available_languages()
    txt = tool.image_to_string(
        Image.open(vfile),
        lang=langs[0],  # first installed language pack; pass e.g. 'eng' to pick one explicitly
        builder=pyocr.builders.TextBuilder()
    )
    code = txt.replace(" ", "")  # drop the spaces Tesseract may put between characters
    print(code)
    return code


def image_to_str_by_pytesseract(vfile):
    # Recognize the text in an image by calling pytesseract directly.
    image = Image.open(vfile)
    code = pytesseract.image_to_string(image).replace(" ", "")
    print(code)
    return code


if __name__ == '__main__':
    # Backslashes are escaped so the Windows paths are read literally.
    file1 = u'D:\\WORK\\python package&OCR\\verification code.jpg'
    file2 = u'D:\\WORK\\python package&OCR\\verification code 2.jpg'
    file3 = u'D:\\WORK\\python package&OCR\\verification code 3.jpg'
    image_to_str(file1)
    image_to_str_by_pytesseract(file2)
    image_to_str_by_pytesseract(file3)
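Both functions above only strip spaces from the OCR output, so stray letters or line breaks can still end up in the result. Since the goal is the number, one option is to post-process the text and keep only digits. This is a minimal sketch. `extract_digits` is a hypothetical name, the look-alike table is an assumed starting point rather than anything Tesseract defines, and `expected_len` assumes the code has a known length.

```python
import re

# Hypothetical table of letters OCR may return in place of digits;
# tune it to the confusions you actually see on your images.
LOOKALIKES = {"O": "0", "o": "0", "D": "0", "I": "1", "l": "1", "|": "1",
              "Z": "2", "S": "5", "s": "5", "B": "8", "g": "9"}


def extract_digits(ocr_text, expected_len=None):
    """Keep only the digits of an OCR result, mapping look-alike letters first."""
    mapped = "".join(LOOKALIKES.get(ch, ch) for ch in ocr_text)
    digits = re.sub(r"\D", "", mapped)  # drop spaces, newlines and any other non-digit
    if expected_len is not None and len(digits) != expected_len:
        return None  # treat a result of the wrong length as a failed read
    return digits
```

For example, `extract_digits(image_to_str_by_pytesseract(file2), expected_len=4)` would return the digits as a string, or None if the read does not have four digits (4 is only an example length).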

 

8. When running the script from PyCharm, add the corresponding environment variables (PATH and TESSDATA_PREFIX) to its run configuration as well; otherwise PyCharm reports an error.

See the attached screenshots.
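Instead of editing the PyCharm run configuration, you can also set the two variables from inside the script, before any OCR call. This is a minimal sketch that assumes the install directory from steps 5 and 6. `with_tesseract_env` and `TESSERACT_DIR` are hypothetical names. The function puts the directory at the front of PATH and sets TESSDATA_PREFIX only if it is not already set. For pytesseract alone, another option is to set `pytesseract.pytesseract.tesseract_cmd` to the full path of tesseract.exe.

```python
import os

# Assumed install location (the one used in steps 5 and 6); adjust if needed.
TESSERACT_DIR = r"C:\Program Files (x86)\Tesseract-OCR"


def with_tesseract_env(env, install_dir=TESSERACT_DIR):
    """Return a copy of env with install_dir on PATH and TESSDATA_PREFIX set."""
    new_env = dict(env)
    parts = [p for p in new_env.get("PATH", "").split(os.pathsep) if p]
    if install_dir not in parts:
        parts.insert(0, install_dir)  # so subprocess calls can find tesseract.exe
    new_env["PATH"] = os.pathsep.join(parts)
    # Keep an existing TESSDATA_PREFIX; otherwise point it at <install_dir>/tessdata.
    new_env.setdefault("TESSDATA_PREFIX", os.path.join(install_dir, "tessdata"))
    return new_env


if __name__ == "__main__":
    # Run this before calling image_to_str / image_to_str_by_pytesseract.
    os.environ.update(with_tesseract_env(os.environ))
```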

9. Method 2 (pytesseract): if pytesseract reports an error, apply this fix:

Go to C:\Python27\Lib\site-packages\pytesseract

Open pytesseract.py

Change:

try:
    import Image
except ImportError:
    from PIL import Image

to:

from PIL import Image
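The same edit can be scripted. This is a minimal Python 3 sketch, and `patch_pil_import` and `patch_file` are hypothetical names. The regular expression assumes the try/except block looks like the one shown in step 9. Other pytesseract versions may differ, in which case nothing is changed. Before overwriting the file, the script writes a .bak copy of the original.

```python
import re

# Matches the fallback import from step 9; \s+ tolerates different
# indentation and blank lines between the statements.
_FALLBACK_IMPORT = re.compile(
    r"try:\s+import Image\s+except ImportError:\s+from PIL import Image"
)


def patch_pil_import(source):
    """Replace the try/except Image import with a plain Pillow import.

    Returns (patched_source, number_of_replacements).
    """
    return _FALLBACK_IMPORT.subn("from PIL import Image", source)


def patch_file(path):
    """Patch pytesseract.py in place, keeping a .bak copy; return the replacement count."""
    with open(path) as f:
        source = f.read()
    patched, count = patch_pil_import(source)
    if count:
        with open(path + ".bak", "w") as f:
            f.write(source)  # original kept in case the edit needs undoing
        with open(path, "w") as f:
            f.write(patched)
    return count
```

Usage would be along the lines of `patch_file(r"C:\Python27\Lib\site-packages\pytesseract\pytesseract.py")`.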

 

 
