OCR study notes (3) tesseract study

Introduction to Tesseract

Tesseract is an open source text recognition (OCR) engine. It was originally developed at HP and, after HP released it as open source, its development has been sponsored by Google. Starting from Tesseract v4, it includes a recognition engine based on an LSTM deep neural network.

Tesseract installation under Windows 10

(0) My Python version is 3.6.5.
(1) Download link: https://digi.bib.uni-mannheim.de/tesseract/
The version I chose is: [image: the selected installer version]
The version chosen here must correspond to the tesserocr or pytesseract package installed later.
Do not tick the additional download options during installation, because without a VPN those downloads will be slow or fail.
(2) You can download the language packs from GitHub: https://github.com/tesseract-ocr/tessdata
I chose the Chinese language pack: [image: the selected language file]
Then copy the downloaded file to the tessdata folder under the Tesseract-OCR directory, and copy the tessdata folder to the Python installation directory.
(3) Add the environment variables.
Here I followed a reference blog; its author explains the environment variable setup very clearly.
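Before moving on, it can help to check from Python that the language file really is where Tesseract will look for it. The sketch below is only an illustration under some assumptions: the file name chi_sim.traineddata (simplified Chinese in the tessdata repository) is my guess at the pack chosen above, the helper names are made up for this example, and whether TESSDATA_PREFIX should point at the tessdata folder itself or at its parent depends on the Tesseract version, so adjust as needed.

```python
import os
from pathlib import Path


def find_traineddata(tessdata_dir, lang):
    """Return the path to <lang>.traineddata inside tessdata_dir, or None if missing."""
    candidate = Path(tessdata_dir) / f"{lang}.traineddata"
    return candidate if candidate.is_file() else None


def configure_tessdata(tessdata_dir, lang="chi_sim"):
    """Set TESSDATA_PREFIX to tessdata_dir if the language file is present.

    Assumption: "chi_sim" is the language pack that was downloaded.
    """
    found = find_traineddata(tessdata_dir, lang)
    if found is None:
        raise FileNotFoundError(f"{lang}.traineddata not found in {tessdata_dir}")
    # This only affects the current Python process and its child processes,
    # not the system-wide environment variables set in step (3).
    os.environ["TESSDATA_PREFIX"] = str(tessdata_dir)
    return found
```

For example, calling configure_tessdata(r"C:\Program Files\Tesseract-OCR\tessdata") would raise an error if the Chinese file was never copied; that path is hypothetical and should be replaced with the real installation directory.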

pytesseract or tesserocr installation

(1) tesserocr package. The installation process is: download tesserocr-2.2.2-cp36-cp36m-win_amd64.whl from GitHub and install it from cmd with pip.
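The wheel file name encodes which interpreter it was built for: cp36 means CPython 3.6, which matches the Python 3.6.5 mentioned in step (0). A minimal sketch of checking this before installing is shown below; the function names are hypothetical, and the check only compares the Python tag, not the platform tag.

```python
import sys


def parse_wheel_tags(filename):
    """Split a wheel filename into (name, version, python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    # A wheel name has 5 parts, or 6 when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    return name, version, python_tag, abi_tag, platform_tag


def matches_interpreter(filename, version_info=sys.version_info):
    """Check whether the wheel's CPython tag matches the given interpreter version."""
    python_tag = parse_wheel_tags(filename)[2]
    expected = "cp{}{}".format(version_info[0], version_info[1])
    return python_tag == expected
```

If matches_interpreter("tesserocr-2.2.2-cp36-cp36m-win_amd64.whl") returns False on your machine, pip will most likely refuse the wheel, and a build for your own Python version is needed instead.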
Code:

import tesserocr
from PIL import Image

# Open the test image (path on my machine)
image = Image.open(r'F:\download\blueman00-text-detection-ctpn-master\text-detection-ctpn\ctpn\data\demo\010.png')
# Recognize the text with the default settings
text = tesserocr.image_to_text(image)
print(text)

The input is: [image: input picture]
The output is: [image: recognized text]
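Since a Chinese language pack was installed above, the same call can be pointed at it through the lang argument of tesserocr.image_to_text. The following is only a sketch: chi_sim is an assumed language code, pick_language and ocr_with_tesserocr are names invented for this example, and the fallback to eng is my own choice rather than anything tesserocr does by itself.

```python
def pick_language(available, preferred="chi_sim", fallback="eng"):
    """Use the preferred language if its traineddata is installed, else the fallback."""
    return preferred if preferred in available else fallback


def ocr_with_tesserocr(image_path, preferred="chi_sim"):
    """Recognize text in one image, preferring the given language."""
    # Imported here so pick_language can be used without tesserocr installed.
    import tesserocr
    from PIL import Image

    # get_languages() returns the tessdata path and the list of installed languages.
    _, available = tesserocr.get_languages()
    lang = pick_language(available, preferred)
    with Image.open(image_path) as image:
        return tesserocr.image_to_text(image, lang=lang)
```

Calling ocr_with_tesserocr with the path of the test image would then use the Chinese model when it is found and English otherwise.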
(2) pytesseract installation
I installed it directly in PyCharm.
[image: installing pytesseract in PyCharm]
Code:

import pytesseract
from PIL import Image

# Open the same test image (path on my machine)
image = Image.open(r'F:\download\blueman00-text-detection-ctpn-master\text-detection-ctpn\ctpn\data\demo\010.png')
# Recognize the text with the default settings
text = pytesseract.image_to_string(image)
print(text)
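pytesseract passes options through to the tesseract command line, so the language pack and the LSTM engine mentioned in the introduction can be selected explicitly. Below is a rough sketch under a few assumptions: chi_sim is the language file that was installed, "--oem 1" asks Tesseract 4 for the LSTM engine only, "--psm 3" is the automatic page segmentation mode, and build_config and ocr_image are helper names made up for this example.

```python
def build_config(oem=1, psm=3):
    """Build a Tesseract command-line option string for pytesseract."""
    return "--oem {} --psm {}".format(oem, psm)


def ocr_image(image_path, lang="chi_sim", tesseract_cmd=None):
    """Run OCR on one image file and return the recognized text."""
    # Imported here so build_config can be used without the OCR stack installed.
    import pytesseract
    from PIL import Image

    if tesseract_cmd is not None:
        # Tell pytesseract where tesseract.exe is if it is not on PATH.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang, config=build_config())
```

If the environment variables from step (3) are not picked up, passing the full path of tesseract.exe as tesseract_cmd is one way around it; the exact path depends on where Tesseract-OCR was installed.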

Origin blog.csdn.net/dbdxwyl/article/details/108330700