Captcha recognition with python + pillow + pytesseract + Tesseract-OCR [repost]

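Install pillow and pytesseract. After installing these modules, you also need to install tesseract-ocr itself, because pytesseract is only a wrapper around the Tesseract program.

(PS: if pip is installed, you can open a command prompt (cmd) in Python's Scripts folder and run pip install pillow to get the latest Pillow. To install a specific version instead, pin it, for example pip install pillow==<version>. Other third-party libraries can be installed the same way.)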
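
To confirm that both packages ended up in the interpreter you are actually running, a quick check along these lines may help. This is only a sketch: `missing_modules` is a hypothetical helper name, not part of either library. It relies on Pillow being imported as `PIL` and pytesseract as `pytesseract`.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Pillow installs the top-level package "PIL"; pytesseract installs "pytesseract".
    missing = missing_modules(["PIL", "pytesseract"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("Pillow and pytesseract are importable.")
```

Note that this only checks the Python packages. It says nothing about whether the Tesseract program itself is installed.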

tesseract-ocr download page: https://digi.bib.uni-mannheim.de/tesseract/

This test used tesseract-ocr-setup-4.00.00dev.exe, and several problems came up along the way:

FileNotFoundError: [WinError 2] The system cannot find the file specified.

pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')

pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \Program Files (x86)\Tesseract-OCR\eng.traineddata')

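These problems mostly come down to Tesseract-OCR not being installed and configured. The fix:

  1. Download and install tesseract-ocr.

  2. Add an environment variable: TESSDATA_PREFIX = C:\Program Files (x86)\Tesseract-OCR (PS: create a new environment variable named TESSDATA_PREFIX whose value is the install path, here C:\Program Files (x86)\Tesseract-OCR.)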

  3. Edit the file D:\Python35\Lib\site-packages\pytesseract\pytesseract.py

tesseract_cmd = 'tesseract'
change to:
tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
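
The errors listed above roughly correspond to two things that can be checked before calling pytesseract: whether the executable can be found, and whether the language data file can be found. The following is a minimal sketch of such a check. `check_tesseract_install` is a hypothetical helper, and the file name `tesseract.exe` and the `tessdata` subfolder are assumptions about a typical Windows install layout.

```python
import os
import shutil


def check_tesseract_install(install_dir, exe_name="tesseract.exe", lang="eng"):
    """Report where the executable and the language data are found, if at all."""
    exe_path = os.path.join(install_dir, exe_name)
    if not os.path.isfile(exe_path):
        # Fall back to whatever is on PATH (None when nothing is found).
        exe_path = shutil.which("tesseract")

    data_name = lang + ".traineddata"
    data_dir = None
    # Assumption: the data file sits in <install_dir>/tessdata or directly in <install_dir>.
    for candidate in (os.path.join(install_dir, "tessdata"), install_dir):
        if os.path.isfile(os.path.join(candidate, data_name)):
            data_dir = candidate
            break

    return {"executable": exe_path, "data_dir": data_dir}


if __name__ == "__main__":
    report = check_tesseract_install(r"C:\Program Files (x86)\Tesseract-OCR")
    print(report)
```

If `executable` is not None, it can be assigned to `pytesseract.pytesseract.tesseract_cmd` at the top of your own script, which has the same effect as step 3 without modifying a file under site-packages. If `data_dir` is None, the "Error opening data file" message is the likely result, and the TESSDATA_PREFIX setting is worth rechecking.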

https://blog.csdn.net/qq_33472658/article/details/78760135

# coding=utf-8
import requests
import pytesseract
from PIL import Image
from io import BytesIO


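# Optional: download the captcha instead of reading a local file.
# (requests and BytesIO are only needed if this block is uncommented.)
# captcha_url = 'https://www.'
# captcha_content = requests.get(url=captcha_url)
# captcha_content = captcha_content.content
# # Read the image from the downloaded bytes
# image = Image.open(BytesIO(captcha_content))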

img_path = r'1351_5243.png'
image = Image.open(img_path)
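# Convert to grayscale
imgry = image.convert('L')
# Lookup table for binarization: levels below 140 map to 0, the rest to 1
table = [0 if i < 140 else 1 for i in range(256)]
# Binarize so the characters stand out more clearly
out = imgry.point(table, '1')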
# out.show()
captcha = pytesseract.image_to_string(out)
captcha = captcha.strip()
captcha = captcha.upper()
print(captcha)
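
The threshold table and the final clean-up step can be pulled out into small functions so that the threshold is easier to tune per captcha style. This is a sketch, not part of the original script: `build_threshold_table` and `normalize_captcha` are hypothetical names, the value 140 is taken from the script above, and the filtering step assumes the captcha contains only ASCII letters and digits, which the original does not state.

```python
import string

# Assumption: captchas contain only upper-case ASCII letters and digits.
ALLOWED_CHARS = set(string.ascii_uppercase + string.digits)


def build_threshold_table(threshold=140):
    """Return a 256-entry lookup table: 0 below the threshold, 1 at or above it."""
    return [0 if level < threshold else 1 for level in range(256)]


def normalize_captcha(raw_text):
    """Upper-case the OCR output and drop everything outside ALLOWED_CHARS."""
    return "".join(ch for ch in raw_text.upper() if ch in ALLOWED_CHARS)


if __name__ == "__main__":
    table = build_threshold_table(140)
    print(len(table), table[139], table[140])
    print(normalize_captcha(" a7 kq\n"))
```

In the script above, the table would be used as `imgry.point(build_threshold_table(140), '1')`, and `normalize_captcha` would replace the separate `strip()` and `upper()` calls. Dropping stray punctuation can hide OCR mistakes as easily as fix them, so it is worth comparing the raw and normalized output on a few samples.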


Reposted from blog.csdn.net/weixin_42486685/article/details/84570779