Extracting text from images with Python

I. Environment: Arch Linux

1. Install tesseract and the English and Simplified Chinese language packs

Installing on Arch is very simple; pacman resolves all the dependencies automatically:

sudo pacman -S tesseract tesseract-data-eng tesseract-data-chi_sim
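To confirm that both language packs were picked up, you can ask tesseract which languages it sees. The sketch below runs tesseract --list-langs and parses the result; the helper names parse_lang_list and installed_languages are hypothetical, and the parser assumes the usual output shape of a "List of available languages ..." header followed by one language code per line.

```python
from __future__ import annotations

import shutil
import subprocess

HEADER = "List of available languages"


def parse_lang_list(output: str) -> list[str]:
    """Turn the text printed by `tesseract --list-langs` into language codes.

    Assumes a header line starting with HEADER, then one code per line.
    """
    langs = []
    for line in output.splitlines():
        line = line.strip()
        if line and not line.startswith(HEADER):
            langs.append(line)
    return langs


def installed_languages() -> list[str]:
    """Ask the tesseract binary for its languages; [] if it is not on PATH."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Depending on the tesseract version the list may go to stdout or
    # stderr, so both streams are parsed.
    return parse_lang_list(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    available = installed_languages()
    for code in ("eng", "chi_sim"):
        print(code, "OK" if code in available else "missing")
```

If chi_sim is reported as missing, the tesseract-data-chi_sim package did not install correctly.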

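2. Install the required third-party Python libraries

Pillow loads the images, and pytesseract is a wrapper that calls the tesseract binary installed above. On current Arch the system Python is marked as externally managed (PEP 668), so a system-wide sudo pip install is refused; installing into a virtual environment avoids this:

python -m venv venv
source venv/bin/activate
pip install pillow pytesseract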
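A quick way to check that both libraries are visible to the interpreter you will run the script with is to look them up via importlib.util.find_spec. Note that Pillow is imported under the name PIL; the helper name missing_modules is only for illustration.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Pillow's import name is "PIL", not "pillow".
    missing = missing_modules(["PIL", "pytesseract"])
    if missing:
        print("Missing:", ", ".join(missing), "for", sys.executable)
    else:
        print("All OCR dependencies are importable")
```

Printing sys.executable helps spot the common mistake of installing into one environment and running the script with another.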

II. Code

The script below recognizes a Chinese image, an English image, and an image of numbers, respectively.

The test images are kept in an img directory located in the same directory as the script.
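The same layout can also be expressed with pathlib, resolving paths relative to the script file rather than the current working directory. This is a sketch: the file names are the ones used in this article, while the helpers image_paths and missing_images are hypothetical additions that report absent files before any OCR is attempted.

```python
from __future__ import annotations

from pathlib import Path

# File names used in this article's example.
IMAGE_NAMES = ("zh_demo.png", "en_demo.png", "num_demo.png")


def image_paths(base_dir: Path) -> dict[str, Path]:
    """Map each image name to its path inside base_dir/img."""
    img_dir = Path(base_dir) / "img"
    return {name: img_dir / name for name in IMAGE_NAMES}


def missing_images(base_dir: Path) -> list[Path]:
    """Return the expected image paths that are not regular files."""
    return [p for p in image_paths(base_dir).values() if not p.is_file()]


if __name__ == "__main__":
    # Directory that contains this script.
    here = Path(__file__).resolve().parent
    for path in missing_images(here):
        print("missing:", path)
```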

import os
import pytesseract
from PIL import Image

BASE_DIR = os.path.dirname(__file__)

zh_img = os.path.join(BASE_DIR, "img/zh_demo.png")
en_img = os.path.join(BASE_DIR, "img/en_demo.png")
num_img = os.path.join(BASE_DIR, "img/num_demo.png")

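zh = pytesseract.image_to_string(Image.open(zh_img), lang="chi_sim").replace(" ", "")    # Chinese recognition is not always accurate, and the result has spaces between characters, so remove them
en = pytesseract.image_to_string(Image.open(en_img))    # Only neatly printed English and digits are recognized reliably; good enough for cracking simple CAPTCHAs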
num = pytesseract.image_to_string(Image.open(num_img))

print(zh)   # 山重水覆疑无路,柳暗花明又一村
print(en)   # kainhuck
print(num)  # 0771-5785703
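The .replace(" ", "") call above removes every space, which would also glue words together in mixed Chinese and English text. A more targeted alternative, sketched below, only deletes whitespace that sits between two CJK characters and trims leading and trailing whitespace (tesseract output typically ends with a newline, often followed by a form feed). The function name clean_ocr_text and the chosen character ranges (CJK Unified Ideographs, CJK symbols and punctuation, full-width forms) are assumptions; extend the ranges to suit your text.

```python
import re

# CJK symbols/punctuation, CJK Unified Ideographs, and full-width forms.
_CJK = r"[\u3000-\u303f\u4e00-\u9fff\uff00-\uffef]"
# Whitespace with a CJK character on both sides.
_GAP = re.compile(rf"(?<={_CJK})\s+(?={_CJK})")


def clean_ocr_text(text: str) -> str:
    """Drop whitespace between CJK characters and trim both ends."""
    return _GAP.sub("", text).strip()
```

For example, zh = clean_ocr_text(pytesseract.image_to_string(Image.open(zh_img), lang="chi_sim")) keeps any English words in the image separated while joining the Chinese characters.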


Source: www.cnblogs.com/kainhuck/p/12482993.html