Install pillow and pytesseract. pytesseract is only a wrapper, so after installing these modules you also need to install tesseract-ocr itself.
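(PS: if pip is installed, you can open cmd in the Scripts folder of your Python installation and run pip install pillow to get the latest version of pillow. To install a specific version, either download and install it yourself or request it with pip install pillow==<version>. Other third-party libraries can be installed the same way.)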
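Before going further, it can help to confirm that both packages are importable from the interpreter you plan to run. A minimal sketch using only the standard library; the helper name missing_packages is made up for this illustration, and note that pillow is imported under the name PIL:

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be imported by this interpreter."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # pillow installs the module "PIL"; pytesseract installs "pytesseract"
    print(missing_packages(["PIL", "pytesseract"]))
```

An empty list means both packages were found; any name that is printed still needs a pip install.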
tesseract-ocr download page: https://digi.bib.uni-mannheim.de/tesseract/
This test used tesseract-ocr-setup-4.00.00dev.exe, and several errors came up along the way:
FileNotFoundError: [WinError 2] 系统找不到指定的文件。 (The system cannot find the file specified.)
pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \Program Files (x86)\Tesseract-OCR\eng.traineddata')
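The first error typically means the tesseract executable itself could not be launched, which is a separate problem from the Python packages. A rough way to check, using only the standard library; find_tesseract is a hypothetical helper, and the extra path in the example is the install location assumed in this post:

```python
import os
import shutil


def find_tesseract(extra_paths=()):
    """Return a path to a tesseract executable, or None if none is found."""
    # First ask the OS the same way a shell would: search PATH
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Then try explicit locations supplied by the caller
    for candidate in extra_paths:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Assumed default install location on Windows; adjust to your machine
    guess = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
    print(find_tesseract([guess]))
```

If this prints None, Tesseract is either not installed or installed somewhere the script does not know about.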
These errors all come down to Tesseract-OCR not being installed and configured correctly. The fix:
- Download and install tesseract-ocr.
- Add an environment variable: TESSDATA_PREFIX = C:\Program Files (x86)\Tesseract-OCR (PS: create a new variable named TESSDATA_PREFIX whose value is the install path, here C:\Program Files (x86)\Tesseract-OCR).
- Edit the file D:\Python35\Lib\site-packages\pytesseract\pytesseract.py and change
tesseract_cmd = 'tesseract'
to:
tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
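Editing a file inside site-packages works, but the change is lost whenever pytesseract is reinstalled or upgraded. As an alternative sketch, the same two settings can be applied from your own script: pytesseract exposes the module-level variable pytesseract.pytesseract.tesseract_cmd, and TESSDATA_PREFIX can be set for the current process through os.environ. The helper name configure_tesseract and the install path are assumptions for illustration:

```python
import os

# Assumed install location, taken from the steps above; adjust to your machine
TESSERACT_DIR = r"C:\Program Files (x86)\Tesseract-OCR"


def configure_tesseract(install_dir=TESSERACT_DIR):
    """Point this process at a Tesseract install without editing pytesseract.py."""
    # Same variable as the system-wide setting, but only for the current process
    os.environ["TESSDATA_PREFIX"] = install_dir
    exe = os.path.join(install_dir, "tesseract.exe")
    try:
        import pytesseract
    except ImportError:
        # pytesseract is not installed, so there is nothing more to configure
        return exe
    # Tell pytesseract which executable to launch
    pytesseract.pytesseract.tesseract_cmd = exe
    return exe
```

Call configure_tesseract() once before the first image_to_string call. Depending on the Tesseract version, TESSDATA_PREFIX may need to point at the tessdata subfolder rather than the install folder, so treat the value as something to adjust if the eng.traineddata error persists.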
Test script:
# coding=utf-8
import requests
import pytesseract
from PIL import Image
from io import BytesIO
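# Alternative: download the captcha over HTTP instead of reading a local file
# captcha_url = 'https://www.'
# captcha_content = requests.get(url=captcha_url)
# captcha_content = captcha_content.content
# # Read the image from the downloaded bytes
# image = Image.open(BytesIO(captcha_content))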
img_path = r'1351_5243.png'
image = Image.open(img_path)
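# Convert to grayscale
imgry = image.convert('L')
# Lookup table for binarization: gray levels below 140 become 0, the rest 1
table = [0 if i < 140 else 1 for i in range(256)]
# Binarize so the characters stand out more clearly
out = imgry.point(table, '1')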
# out.show()
# Run OCR, then trim whitespace and uppercase the result
captcha = pytesseract.image_to_string(out)
captcha = captcha.strip().upper()
print(captcha)
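The binarization step maps each of the 256 possible gray levels to black or white through a lookup table, and the threshold of 140 is specific to this captcha image. As a small sketch, the table and the final text cleanup can be pulled into helpers so that other thresholds are easy to try; the function names are hypothetical and the code needs only the standard library:

```python
def build_threshold_table(threshold=140):
    """Lookup table for Image.point: 0 below the threshold, 1 at or above it."""
    return [0 if level < threshold else 1 for level in range(256)]


def normalize_captcha(text):
    """Trim surrounding whitespace and uppercase the OCR result."""
    return text.strip().upper()
```

In the script above these would be used as out = imgry.point(build_threshold_table(140), '1') and captcha = normalize_captcha(pytesseract.image_to_string(out)).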