Using Pillow, tesseract-ocr, and pytesseract for image recognition in Python 3

This article describes how to recognize text in images with Python 3 using Pillow, tesseract-ocr, and the pytesseract module. It covers installation, two common errors and their fixes, and a complete example, and should be a useful reference for study or work.
1. Install Pillow

pip install Pillow

2. Install tesseract-ocr

GitHub address: https://github.com/tesseract-ocr/tesseract

Or local download address: https://www.jb51.net/softs/538925.html

Windows:

The latest installer can be downloaded here: tesseract-ocr-setup-3.05.01.exe and tesseract-ocr-setup-4.00.00dev.exe (experimental).

Ubuntu:

sudo apt-get install tesseract-ocr
traineddata file path: /usr/share/tesseract-ocr/tessdata/
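
To see which language packs are actually installed, you can list the .traineddata files in that directory. The following is a minimal sketch; the helper name list_traineddata_languages is made up for illustration, and the directory may differ depending on the package version you installed.

```python
import os

def list_traineddata_languages(tessdata_dir):
    """Return the sorted language codes that have a .traineddata file in tessdata_dir."""
    if not os.path.isdir(tessdata_dir):
        return []
    suffix = ".traineddata"
    return sorted(
        name[:-len(suffix)]
        for name in os.listdir(tessdata_dir)
        if name.endswith(suffix)
    )

if __name__ == "__main__":
    # Path taken from the Ubuntu note above; adjust it if your package uses another location.
    print(list_traineddata_languages("/usr/share/tesseract-ocr/tessdata/"))
```

An empty list means either that the directory does not exist or that no language data is installed there.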

3. Install pytesseract

pip install pytesseract

If installing with pip does not work, search for the module's package file, download it, and install it directly.

Problems encountered and solutions:

1. FileNotFoundError: [WinError 2] The system cannot find the file specified

Solution:

Method 1 [Recommended]: Add the directory that contains tesseract.exe to the PATH environment variable.

For example: D:\Tesseract-OCR. The default installation path is C:\Program Files (x86)\Tesseract-OCR.

Note: For the new environment variable to take effect, close and reopen the cmd window, or restart PyCharm or whichever IDE you use.
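
A quick way to confirm that the PATH change is visible to Python is to look the executable up with the standard library. This is only a sketch: the function name find_tesseract is hypothetical, and it assumes the executable is named tesseract (on Windows, shutil.which also tries the .exe extension).

```python
import shutil

def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract executable found on PATH, or None."""
    return shutil.which(executable)

if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH; add its install directory "
              "and reopen the terminal or IDE")
    else:
        print("tesseract found at:", path)
```

If this still reports that nothing was found after you edited PATH, the process running Python was probably started before the change.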

Method 2: Edit the pytesseract.py file and set the path to tesseract.exe there

# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'

Method 3: Set the path in your own code at run time

pytesseract.pytesseract.tesseract_cmd = 'D:\\Tesseract-OCR\\tesseract.exe'
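
If the script has to run on machines with different install locations, one option is to try several candidate paths and use the first one that exists. The sketch below assumes the two locations mentioned in this article; CANDIDATE_PATHS, first_existing, and configure_pytesseract are illustrative names, not part of pytesseract.

```python
import os

# Candidate locations taken from the examples in this article; extend the list for your machine.
CANDIDATE_PATHS = [
    r"D:\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]

def first_existing(paths):
    """Return the first path that points to an existing file, or None."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None

def configure_pytesseract(paths=CANDIDATE_PATHS):
    """Point pytesseract at the first existing candidate; return the chosen path or None."""
    cmd = first_existing(paths)
    if cmd is not None:
        # Imported lazily so that first_existing() needs only the standard library.
        import pytesseract
        pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd

if __name__ == "__main__":
    print("tesseract_cmd set to:", configure_pytesseract())
```

When no candidate exists the function returns None and leaves pytesseract's default untouched, so Method 1 (PATH) still applies.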

2. pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \Tesseract-OCR\tessdata/eng.traineddata')

Solution:

Method 1 [Recommended]:

Set the TESSDATA_PREFIX environment variable to the parent directory of the tessdata directory (by default, the tesseract-ocr installation directory).

For example: C:\Program Files (x86)\Tesseract-OCR

This matches the hint printed with the error: Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
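
Before setting the variable, it can help to confirm that the language file really is where Tesseract will look. The sketch below follows the parent-directory convention described here; the helper name has_language_data and the install path are assumptions, and later Tesseract releases may expect the variable to point at the tessdata directory itself, so check the message printed by your version.

```python
import os

def has_language_data(prefix, lang="eng"):
    """Check that <prefix>/tessdata/<lang>.traineddata exists."""
    return os.path.isfile(os.path.join(prefix, "tessdata", lang + ".traineddata"))

if __name__ == "__main__":
    # Install directory from the example in this article; an assumption about your machine.
    prefix = r"C:\Program Files (x86)\Tesseract-OCR"
    if has_language_data(prefix):
        # Affects only this Python process and the Tesseract processes it starts.
        os.environ["TESSDATA_PREFIX"] = prefix
    else:
        print("eng.traineddata not found under", os.path.join(prefix, "tessdata"))
```

Setting os.environ this way does not change the system-wide variable, which makes it convenient for testing a path before making it permanent.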

Method 2: Pass --tessdata-dir through the config argument in your .py file

tessdata_dir_config = '--tessdata-dir "D:\\Tesseract-OCR\\tessdata"'
# tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
pytesseract.image_to_string(image, config=tessdata_dir_config)
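
Because the default install path contains spaces, the double quotes around the directory matter. A small helper can keep the quoting in one place; build_tessdata_config is a hypothetical name used only for this sketch.

```python
def build_tessdata_config(tessdata_dir):
    """Build a config string that passes --tessdata-dir with the path in double quotes."""
    return '--tessdata-dir "{}"'.format(tessdata_dir)

if __name__ == "__main__":
    config = build_tessdata_config(r"C:\Program Files (x86)\Tesseract-OCR\tessdata")
    print(config)
    # The string is then passed on, for example:
    # pytesseract.image_to_string(image, config=config)
```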

Example:

# -*- coding: utf-8 -*-
from PIL import Image
import pytesseract
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://192.168.24.189/system/code?0.6824490785056669'
driver = webdriver.Firefox()
driver.maximize_window()  # maximize the browser window
driver.get(url)
imgelement = driver.find_element(By.ID, 'codeImg')  # locate the captcha image
location = imgelement.location  # x, y coordinates of the captcha
size = imgelement.size  # width and height of the captcha
# (left, upper, right, lower) box of the region to crop
rangle = (int(location['x']), int(location['y']),
          int(location['x'] + size['width']), int(location['y'] + size['height']))
name = "code.png"  # screenshots are written as PNG, so use a .png name
driver.find_element(By.ID, 'codeImg').click()
driver.save_screenshot(name)  # screenshot of the whole page, which contains the captcha
driver.quit()
aa = Image.open(name)  # open the screenshot
frame4 = aa.crop(rangle)  # crop the captcha region out of the screenshot
frame4.save(name)
im = Image.open(name)
# convert to grayscale
imgry = im.convert('L')
# save the grayscale image
imgry.save('g' + name)
# binarize with a fixed threshold: pixels below it become 0, the rest 1
threshold = 140
table = [0 if j < threshold else 1 for j in range(256)]
out = imgry.point(table, '1')
out.save('b' + name)
# recognize the binarized image
text = pytesseract.image_to_string(out)
# normalize the result: strip whitespace and convert to upper case
text = text.strip()
text = text.upper()
print(text)
# for comparison, recognize the cropped image without preprocessing
text = pytesseract.image_to_string(Image.open(name), lang="eng")
print(text)
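
The cropping box and the binarization table in the example are plain arithmetic, so they can be checked without a browser or Tesseract. The sketch below factors them into two helpers; the names crop_box and threshold_table and the sample geometry are made up for illustration.

```python
def crop_box(location, size):
    """Convert an element's location and size into a (left, upper, right, lower) box."""
    left = int(location["x"])
    upper = int(location["y"])
    right = int(location["x"] + size["width"])
    lower = int(location["y"] + size["height"])
    return (left, upper, right, lower)

def threshold_table(threshold):
    """Build a 256-entry lookup table: 0 below the threshold, 1 at or above it."""
    return [0 if value < threshold else 1 for value in range(256)]

if __name__ == "__main__":
    # Made-up element geometry, for illustration only.
    print(crop_box({"x": 10, "y": 20}, {"width": 100, "height": 40}))  # (10, 20, 110, 60)
    table = threshold_table(140)
    print(table[139], table[140])  # 0 1
```

The table can be passed to Pillow's point() on a grayscale image exactly as in the example, and the threshold of 140 is a starting value worth tuning for each captcha style.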


Origin blog.csdn.net/haoxun09/article/details/104806661