OCR study notes (3): Tesseract study
Introduction to Tesseract
Tesseract is an open-source text recognition (OCR) engine that was originally developed at HP and has been maintained with Google's sponsorship since HP released it as open source. Starting with Tesseract v4, it supports text recognition with an LSTM-based deep neural network.
Tesseract installation on Windows 10
(0) My Python version is 3.6.5.
(1) Download link: https://digi.bib.uni-mannheim.de/tesseract/
The version I chose is:
The version installed here needs to correspond to the tesserocr or pytesseract package that is installed later.
Do not tick the additional download items during installation, because without a VPN the downloads will be slow or fail.
(2) You can download the language packs from GitHub: https://github.com/tesseract-ocr/tessdata
I chose the Chinese language pack, copied the downloaded files into the tessdata folder under the Tesseract-OCR directory, and then also copied the tessdata folder to the Python installation directory.
(3) Add the environment variable
For this step I followed a reference blog post, whose author explains the environment variable setup very clearly.
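The three setup steps above can be sanity-checked from Python before going further. The following is only a minimal sketch: the helper name check_tesseract_setup is hypothetical, and it assumes that the environment variables in question are PATH (so that the tesseract executable can be found) and TESSDATA_PREFIX (pointing at the tessdata folder itself), which the notes above do not spell out. It also assumes the Chinese pack is the Simplified Chinese file chi_sim.traineddata.

```python
import os
import shutil
from pathlib import Path


def check_tesseract_setup(lang="chi_sim", tessdata_dir=None):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper for illustration only, not part of Tesseract.
    """
    problems = []

    # shutil.which searches PATH the same way the command prompt would.
    if shutil.which("tesseract") is None:
        problems.append("tesseract executable not found on PATH")

    # Assumption: TESSDATA_PREFIX points at the tessdata folder itself.
    if tessdata_dir is None:
        tessdata_dir = os.environ.get("TESSDATA_PREFIX")
    if not tessdata_dir:
        problems.append("TESSDATA_PREFIX is not set")
        return problems

    # Each language pack is a single <lang>.traineddata file.
    traineddata = Path(tessdata_dir) / f"{lang}.traineddata"
    if not traineddata.is_file():
        problems.append(f"missing language file: {traineddata}")

    return problems


if __name__ == "__main__":
    for problem in check_tesseract_setup():
        print(problem)
```

If the script prints nothing, the executable and the language file were both found. The tessdata folder can also be passed explicitly, for example check_tesseract_setup("chi_sim", r"C:\path\to\tessdata"), where the path is a placeholder for your own installation.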
pytesseract or tesserocr installation
(1) tesserocr package. To install it, download tesserocr-2.2.2-cp36-cp36m-win_amd64.whl from GitHub and install it from cmd (for example, with pip install followed by the path to the .whl file).
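The wheel has to match the local interpreter: in the file name above, cp36 stands for CPython 3.6 and win_amd64 for 64-bit Windows. As a rough sketch of that matching rule, the hypothetical helpers below split a wheel file name into its tags and compare them with the running Python. This is a simplified check that assumes a CPython wheel for Windows; pip's real compatibility logic covers many more cases.

```python
import struct
import sys


def parse_wheel_tags(filename):
    """Split a wheel file name into (name, version, python, abi, platform)."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # An optional build tag may sit after the version, so the three
    # compatibility tags are always taken from the end.
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    return name, version, python_tag, abi_tag, platform_tag


def wheel_matches_interpreter(filename):
    """Simplified check: same CPython major.minor and same pointer width."""
    _, _, python_tag, _, platform_tag = parse_wheel_tags(filename)
    expected_python = "cp{}{}".format(*sys.version_info[:2])
    is_64bit = struct.calcsize("P") * 8 == 64
    wants_64bit = platform_tag.endswith("amd64")
    return python_tag == expected_python and is_64bit == wants_64bit


if __name__ == "__main__":
    wheel = "tesserocr-2.2.2-cp36-cp36m-win_amd64.whl"
    print(parse_wheel_tags(wheel))
    print(wheel_matches_interpreter(wheel))
```

On the Python 3.6.5 setup described above, the second print is expected to report a match only if the interpreter is the 64-bit build.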
Code:
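import tesserocr
from PIL import Image

# Load the test image (the path is from my machine; change it to your own file)
image = Image.open(r'F:\download\blueman00-text-detection-ctpn-master\text-detection-ctpn\ctpn\data\demo\010.png')
# Run OCR on the PIL image and get the recognized text as a string
image_vert = tesserocr.image_to_text(image)
print(image_vert)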
The input is:
The output is:
(2) pytesseract installation
I installed it directly from within PyCharm.
Code:
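import pytesseract
from PIL import Image

# Load the same test image (the path is from my machine; change it to your own file)
image = Image.open(r'F:\download\blueman00-text-detection-ctpn-master\text-detection-ctpn\ctpn\data\demo\010.png')
# Run OCR through the tesseract executable and get the recognized text as a string
image_vert = pytesseract.image_to_string(image)
print(image_vert)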
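Both calls above use the default recognition language. Since a Chinese language pack was installed earlier, the language can also be passed explicitly through the lang argument of pytesseract.image_to_string, and several languages can be combined with a plus sign. The sketch below assumes the pack is the Simplified Chinese one (chi_sim); the helper names build_lang_arg and ocr_file are hypothetical.

```python
def build_lang_arg(languages):
    """Join language codes the way Tesseract expects, e.g. 'chi_sim+eng'."""
    return "+".join(languages)


def ocr_file(image_path, languages=("chi_sim", "eng")):
    """Recognize text in an image file using the given language packs."""
    # Imported here so that this sketch can be loaded even on a machine
    # where pytesseract and Pillow are not installed yet.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=build_lang_arg(languages))


if __name__ == "__main__":
    print(build_lang_arg(["chi_sim", "eng"]))
```

Calling ocr_file with the path of your own image then runs recognition with both packs. Each code in the list needs a matching .traineddata file in the tessdata folder.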