python+pillow+pytesseract+Tesseract-OCR captcha recognition [repost]

Other 2018-12-10 14:20:14

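Install pillow and pytesseract. Once pytesseract is installed, tesseract-ocr itself must be installed as well, because pytesseract is only a wrapper that calls the tesseract executable.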

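(PS: if pip is installed, open a command prompt in Python's Scripts folder and run pip install pillow to get the latest Pillow. To install a specific version, pin it, for example pip install pillow==<version>. Other third-party libraries can be installed the same way.)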
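To confirm that the two Python packages actually landed in the interpreter you are using, a small check like the following can help. It is only a sketch: `installed_version` is a hypothetical helper name, and it relies on the standard-library `importlib.metadata` module (Python 3.8+).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages this article depends on.
for name in ("Pillow", "pytesseract"):
    print(name, installed_version(name) or "not installed")
```

Note that this only covers the Python side; the tesseract-ocr program is a separate installation.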

tesseract-ocr download: https://digi.bib.uni-mannheim.de/tesseract/

This test used tesseract-ocr-setup-4.00.00dev.exe. Several problems came up at this stage:

FileNotFoundError: [WinError 2] 系统找不到指定的文件。 (The system cannot find the file specified.)

pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')

pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \Program Files (x86)\Tesseract-OCR\eng.traineddata')
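Before changing anything, it can be useful to check which part of the setup is missing. The sketch below uses only the standard library; `check_tesseract_setup` is a hypothetical helper, not part of pytesseract. It assumes that the FileNotFoundError comes from the tesseract executable not being found and that the "Error opening data file" message relates to where Tesseract looks for its language data. On some installations an unset TESSDATA_PREFIX is perfectly fine, so treat the output as hints rather than a verdict.

```python
import os
import shutil


def check_tesseract_setup(cmd="tesseract", env=None):
    """Return a list of human-readable findings about the local Tesseract setup."""
    env = os.environ if env is None else env
    findings = []
    # pytesseract launches the tesseract executable as a subprocess; if it
    # cannot be found, Windows reports FileNotFoundError / WinError 2.
    if shutil.which(cmd) is None:
        findings.append(f"executable {cmd!r} not found on PATH")
    # "Error opening data file ...eng.traineddata" suggests Tesseract is
    # looking for its language data in the wrong place.
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        findings.append("TESSDATA_PREFIX is not set")
    elif not os.path.isdir(prefix):
        findings.append(f"TESSDATA_PREFIX points to a missing folder: {prefix}")
    return findings


for finding in check_tesseract_setup():
    print(finding)
```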

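The errors above all come from Tesseract-OCR not being installed and configured correctly. The fix:

Download and install tesseract-ocr.
Add an environment variable: TESSDATA_PREFIX = C:\Program Files (x86)\Tesseract-OCR (PS: create a new environment variable named TESSDATA_PREFIX whose value is the installation path, C:\Program Files (x86)\Tesseract-OCR).
Edit the file D:\Python35\Lib\site-packages\pytesseract\pytesseract.py and change

tesseract_cmd = 'tesseract'
to:
tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'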
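Editing a file under site-packages works, but the change is lost whenever pytesseract is reinstalled or upgraded. An alternative is to set the same `tesseract_cmd` attribute from your own script at runtime. The following is a minimal sketch under stated assumptions: the install folder is the one used above, `tesseract_settings` is a hypothetical helper, and setting TESSDATA_PREFIX through `os.environ` is assumed to reach Tesseract because child processes inherit the environment of the Python process.

```python
import os
from pathlib import PureWindowsPath


def tesseract_settings(install_dir):
    """Derive the executable path and TESSDATA_PREFIX value from the install folder."""
    root = PureWindowsPath(install_dir)
    return {
        "tesseract_cmd": str(root / "tesseract.exe"),
        "TESSDATA_PREFIX": str(root),
    }


# Assumed install location, taken from the steps above; adjust to your machine.
settings = tesseract_settings(r"C:\Program Files (x86)\Tesseract-OCR")

# Child processes started later inherit this process's environment.
os.environ["TESSDATA_PREFIX"] = settings["TESSDATA_PREFIX"]

try:
    import pytesseract
except ImportError:
    print("pytesseract is not installed; run: pip install pytesseract")
else:
    # Same effect as editing tesseract_cmd in pytesseract.py, but kept in your own script.
    pytesseract.pytesseract.tesseract_cmd = settings["tesseract_cmd"]
```

Run this once at the top of the script, before the first call to `pytesseract.image_to_string`.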

https://blog.csdn.net/qq_33472658/article/details/78760135

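# coding=utf-8
import requests
import pytesseract
from PIL import Image
from io import BytesIO


# Alternative: download the captcha instead of reading it from disk.
# captcha_url = 'https://www.'
# captcha_content = requests.get(url=captcha_url)
# captcha_content = captcha_content.content
# # Read the image from the downloaded bytes
# image = Image.open(BytesIO(captcha_content))

img_path = r'1351_5243.png'
image = Image.open(img_path)
# Convert to grayscale
imgry = image.convert('L')
# Lookup table for binarization: gray levels below 140 map to 0 (black), the rest to 1 (white)
table = [0 if i < 140 else 1 for i in range(256)]
# Binarize so the characters stand out more clearly
out = imgry.point(table, '1')
# out.show()
# Run OCR, then strip surrounding whitespace and normalize to upper case
captcha = pytesseract.image_to_string(out)
captcha = captcha.strip()
captcha = captcha.upper()
print(captcha)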
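The two ideas in the script, the threshold lookup table and the cleanup of the OCR result, can be tried out without Pillow or Tesseract. The sketch below is pure Python. `threshold_table` and `clean_captcha` are hypothetical helper names, and `clean_captcha` goes slightly beyond the original `strip()` and `upper()` by assuming the captcha contains only the letters A-Z and the digits 0-9, so stray spaces and punctuation produced by OCR are dropped.

```python
import string


def threshold_table(threshold=140):
    """Lookup table mapping each gray level 0..255 to 0 (dark) or 1 (light)."""
    return [0 if level < threshold else 1 for level in range(256)]


def clean_captcha(raw_text, allowed=string.ascii_uppercase + string.digits):
    """Upper-case OCR output and drop every character outside `allowed`."""
    return "".join(ch for ch in raw_text.upper() if ch in allowed)


table = threshold_table(140)
print([table[p] for p in (0, 139, 140, 255)])  # [0, 0, 1, 1]
print(clean_captcha(" a7 k-9\n"))               # A7K9
```

If the characters in a particular captcha come out too thin or too noisy, the threshold of 140 is the first value worth adjusting; the table returned by `threshold_table` can be passed to `imgry.point(table, '1')` exactly as in the script above.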

Reposted from blog.csdn.net/weixin_42486685/article/details/84570779
