Python OCR image recognition library: tesserocr

1. Introduction

OCR (Optical Character Recognition) is the process of scanning characters and translating them into electronic text based on their shapes. Graphical verification codes (CAPTCHAs) are made of irregular characters, which are in fact ordinary characters that have been slightly distorted.

2. Prerequisites

Tesseract must be installed before installing tesserocr.

Related reading:

  1. Examples for the Python tesserocr module
  2. tesserocr fails to install on Win10 (resolved; see the end of that article)

3. Installation problems

To diagnose installation problems with tesseract-ocr and tesserocr, first check which wheel tags the local Python installation supports:

import pip._internal.pep425tags  # private pip module, present only in older pip releases
print(pip._internal.pep425tags.get_supported())

[('cp37', 'cp37m', 'win32'), ('cp37', 'none', 'win32'), ('py3', 'none', 'win32'),
 ('cp37', 'none', 'any'), ('cp3', 'none', 'any'), ('py37', 'none', 'any'),
 ('py3', 'none', 'any'), ('py36', 'none', 'any'), ('py35', 'none', 'any'),
 ('py34', 'none', 'any'), ('py33', 'none', 'any'), ('py32', 'none', 'any'),
 ('py31', 'none', 'any'), ('py30', 'none', 'any')]

So this environment is py37 on win32 (Python 3.7, 32-bit Windows).

The wheel tesserocr-2.2.2-cp36-cp36m-win32.whl is tagged cp36, which is not in this list, so running pip3 install on it in a command window reports an error.
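The tag comparison described above can be sketched in a few lines of standard-library Python. This is only an illustration of how the three trailing fields of a wheel filename relate to the supported-tag list; the helper names `parse_wheel_tags` and `is_installable` are made up for this example, the supported list is a shortened copy of the output above, and the cp37 filename is hypothetical (it is not confirmed to exist as a release).

```python
def parse_wheel_tags(filename):
    """Return the set of (python, abi, platform) tags encoded in a wheel filename."""
    if not filename.endswith('.whl'):
        raise ValueError('not a wheel file: ' + filename)
    # Layout: name-version[-build]-python-abi-platform.whl
    parts = filename[:-len('.whl')].split('-')
    if len(parts) < 5:
        raise ValueError('unexpected wheel filename: ' + filename)
    pythons, abis, platforms = parts[-3:]
    # Each field may hold several dot-separated alternatives, e.g. "py2.py3".
    return {
        (py, abi, plat)
        for py in pythons.split('.')
        for abi in abis.split('.')
        for plat in platforms.split('.')
    }


def is_installable(filename, supported):
    """True if any tag of the wheel appears in the supported-tag list."""
    return bool(parse_wheel_tags(filename) & set(supported))


# Shortened copy of the supported tags printed for the py37/win32 environment.
supported = [
    ('cp37', 'cp37m', 'win32'),
    ('cp37', 'none', 'win32'),
    ('py3', 'none', 'win32'),
    ('cp37', 'none', 'any'),
    ('py3', 'none', 'any'),
]

print(is_installable('tesserocr-2.2.2-cp36-cp36m-win32.whl', supported))  # False
# Hypothetical filename, used only to show a matching tag.
print(is_installable('tesserocr-2.2.2-cp37-cp37m-win32.whl', supported))  # True
```

In other words, the wheel to download is the one whose last three filename fields match an entry in the list printed for your own interpreter.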

4. Solving the tesseract installation problem

  • Official recommendation:
Installation
Windows
The proposed downloads consist of stand-alone packages containing all the Windows libraries needed for execution. This means that no additional installation of tesseract is required on your system.
pip

Download the wheel file corresponding to your Windows platform and Python installation from [simonflueckiger/tesserocr-windows_build/releases](https://github.com/simonflueckiger/tesserocr-windows_build/releases) and install them via:
pip install <package_name>.whl

Here, package_name is the wheel downloaded from (https://github.com/sirfz/tesserocr/releases) that matches your own Python version and environment.

After downloading and installing the wheel, running the code still raised an error:

File "tesserocr.pyx", line 2401, in tesserocr._tesserocr.image_to_text
RuntimeError: Failed to init API, possibly an invalid tessdata path: C:\\

Reason: although the stand-alone packages contain all the libraries required on Windows, they do not contain the language data files. The data files must all be placed in a tessdata\ folder, and that folder placed inside C:\Python36.

  • In practice
    There is no need to install tesseract. Clone the main branch of the tesseract repository and copy its tessdata\ folder into Python36\. Then download the eng.traineddata language file from the tessdata_fast repository and place it inside tessdata\.
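Before calling tesserocr it can help to confirm that the language file really is where the steps above put it, since a missing file produces the "Failed to init API" error. The following is a minimal sketch, not part of tesserocr itself: `find_traineddata` and `ocr_with_tessdata` are hypothetical helper names, and the sketch assumes that `tesserocr.image_to_text` accepts `lang` and `path` keyword arguments, as its documentation describes.

```python
from pathlib import Path


def find_traineddata(tessdata_dir, lang='eng'):
    """Return the path of <lang>.traineddata inside tessdata_dir, or raise."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        raise FileNotFoundError('tessdata folder not found: %s' % folder)
    data_file = folder / (lang + '.traineddata')
    if not data_file.is_file():
        raise FileNotFoundError('missing language file: %s' % data_file)
    return data_file


def ocr_with_tessdata(image_path, tessdata_dir, lang='eng'):
    """Run OCR only after confirming the language data is in place."""
    find_traineddata(tessdata_dir, lang)
    # Imported here so the check above works even without these packages.
    from PIL import Image
    import tesserocr
    image = Image.open(image_path)
    # `path` points tesserocr at the tessdata folder explicitly.
    return tesserocr.image_to_text(image, lang=lang, path=str(tessdata_dir))
```

With the layout described above, a call might look like `ocr_with_tessdata('./photo/image.jpg', r'C:\Python36\tessdata')`. Passing the folder explicitly also avoids depending on whichever default path the wheel was built with.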
  • Code
from PIL import Image
import tesserocr

image = Image.open('./photo/image.jpg')
result = tesserocr.image_to_text(image)
print(result)

# Some images cannot be read correctly; they need binarization to remove noise first
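The binarization mentioned in the note above could look roughly like the sketch below: convert the image to grayscale, then map every pixel to pure black or pure white through a lookup table. The threshold of 127 is an arbitrary assumption that usually needs tuning per image, and `build_threshold_table`, `binarize` and `ocr_binarized` are illustrative names, not part of Pillow or tesserocr.

```python
def build_threshold_table(threshold=127):
    """Lookup table mapping gray levels 0..255 to 0 (black) or 255 (white)."""
    return [0 if level < threshold else 255 for level in range(256)]


def binarize(image, threshold=127):
    """Convert a PIL image to grayscale, then to pure black and white."""
    gray = image.convert('L')
    # Image.point applies the 256-entry table to every pixel of an 'L' image.
    return gray.point(build_threshold_table(threshold))


def ocr_binarized(image_path, threshold=127):
    """Binarize the image at image_path and pass the result to tesserocr."""
    # Imported lazily so the table helper can be used without these packages.
    from PIL import Image
    import tesserocr
    image = Image.open(image_path)
    return tesserocr.image_to_text(binarize(image, threshold))
```

For example, `print(ocr_binarized('./photo/image.jpg'))` would replace the plain call in the code above. If the text comes out too thin or the background noise survives, try a lower or higher threshold.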


Reproduced from: https://www.jianshu.com/p/e542e532a001


Origin blog.csdn.net/weixin_34082789/article/details/91244014