1. Introduction
OCR (Optical Character Recognition) is the process of scanning characters and translating them, by their shape, into electronic text. Graphic verification codes consist of irregular characters, which are in fact ordinary characters that have been slightly distorted.
2. Prerequisites
Tesseract must be installed before installing tesserocr.
Related Links:
- tesserocr GitHub: https://github.com/sirfz/tesserocr
- tesserocr PyPI: https://pypi.python.org/pypi/tesserocr
- tesseract download: http://digi.bib.uni-mannheim.de/tesseract
- tesseract GitHub: https://github.com/tesseract-ocr/tesseract
- tesseract language packs: https://github.com/tesseract-ocr/tessdata
- tesseract documentation: https://github.com/tesseract-ocr/tesseract/wiki/Documentation
Related learning material:
- Example usage of the Python tesserocr module
- tesserocr installation failure on Windows 10 (the problem has been resolved, see the end of this article)
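3. Installation problems
These are the problems that came up while installing tesseract-ocr and tesserocr. First, check which wheel tags the local pip supports:
import pip._internal.pep425tags  # private module, only present in older pip releases
print(pip._internal.pep425tags.get_supported())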
[('cp37', 'cp37m', 'win32'), ('cp37', 'none', 'win32'), ('py3', 'none', 'win32'),
 ('cp37', 'none', 'any'), ('cp3', 'none', 'any'), ('py37', 'none', 'any'),
 ('py3', 'none', 'any'), ('py36', 'none', 'any'), ('py35', 'none', 'any'),
 ('py34', 'none', 'any'), ('py33', 'none', 'any'), ('py32', 'none', 'any'),
 ('py31', 'none', 'any'), ('py30', 'none', 'any')]
So the environment turned out to be py37 on win32.
The wheel tesserocr-2.2.2-cp36-cp36m-win32.whl is built for cp36, so installing it reported an error, whether from a command window or through pip3 install.
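The mismatch can also be checked in code. The following is only a minimal sketch: the helper names `parse_wheel_tags` and `wheel_is_supported` are hypothetical, the cp37 wheel name is made up for illustration, and the check ignores compressed tag sets such as `py2.py3`. It relies only on the wheel file naming convention, where the last three fields are the Python, ABI and platform tags.

```python
def parse_wheel_tags(filename):
    """Split a wheel file name into its (python, abi, platform) tags.

    Wheel names have the form
    {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel name: " + filename)
    # The last three fields are always the python, abi and platform tags.
    return tuple(parts[-3:])


def wheel_is_supported(filename, supported_tags):
    """Return True if the wheel's tag triple is among the supported tags."""
    return parse_wheel_tags(filename) in set(supported_tags)


# A few of the tags from the output above (Python 3.7, 32-bit Windows).
supported = [
    ("cp37", "cp37m", "win32"),
    ("cp37", "none", "win32"),
    ("py3", "none", "any"),
]

# The wheel from the text: built for cp36, so it does not match.
print(wheel_is_supported("tesserocr-2.2.2-cp36-cp36m-win32.whl", supported))
# Hypothetical file name, used only to show a matching case.
print(wheel_is_supported("tesserocr-2.2.2-cp37-cp37m-win32.whl", supported))
```

The first call prints False and the second prints True, which matches the failure described above: a cp36 wheel cannot be installed into a cp37 interpreter.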
4. Solving the tesseract installation problems
- Officially recommended:
Installation
Windows
The proposed downloads consist of stand-alone packages containing all the Windows libraries needed for execution. This means that no additional installation of tesseract is required on your system.
#### pip
Download the wheel file corresponding to your Windows platform and Python installation from [simonflueckiger/tesserocr-windows_build/releases](https://github.com/simonflueckiger/tesserocr-windows_build/releases) and install them via:
pip install <package_name>.whl
Here, package_name is the file downloaded from (https://github.com/sirfz/tesserocr/releases) that matches your own Python version and environment.
After downloading and installing, running the code still raised an error:
File "tesserocr.pyx", line 2401, in tesserocr._tesserocr.image_to_text
RuntimeError: Failed to init API, possibly an invalid tessdata path: C:\\
NO
Reason: although the stand-alone packages contain all the libraries required on Windows, they do not contain the language data files. These data files need to be collected in a tessdata\ folder, and that folder placed inside C:\Python36.
- In practice
There is no need to install tesseract. Just clone the main branch of the tesseract repository and copy its tessdata\ folder into Python36\. Next, download the eng.traineddata language file from the tessdata_fast repository and place it inside tessdata\.
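Before running any OCR code, it may help to confirm that the language file really ended up where the text says it should. The sketch below uses only the standard library. The helper names `list_traineddata` and `check_tessdata` are hypothetical, and the folder `C:\Python36\tessdata` is assumed from the description above, so adjust it to your own Python installation.

```python
import os


def list_traineddata(tessdata_dir):
    """Return the language codes that have a .traineddata file in tessdata_dir."""
    if not os.path.isdir(tessdata_dir):
        return []
    suffix = ".traineddata"
    return sorted(
        name[:-len(suffix)]
        for name in os.listdir(tessdata_dir)
        if name.endswith(suffix)
    )


def check_tessdata(tessdata_dir, lang="eng"):
    """Report whether the language file for `lang` is present."""
    return lang in list_traineddata(tessdata_dir)


# Assumed location, taken from the text above; adjust to your own Python folder.
TESSDATA_DIR = r"C:\Python36\tessdata"

if check_tessdata(TESSDATA_DIR):
    print("eng.traineddata found in", TESSDATA_DIR)
else:
    print("eng.traineddata is missing from", TESSDATA_DIR)
```

If the file is reported missing, the "Failed to init API" error shown earlier is the likely result when tesserocr is called.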
- Code
from PIL import Image
import tesserocr
image = Image.open('./photo/image.jpg')
result = tesserocr.image_to_text(image)
print(result)
# Some images cannot be read; they need to be binarized to remove noise
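The binarization mentioned in the comment can be sketched as follows. This is one common approach and not necessarily the one the original author used: convert the image to grayscale, then map every pixel to pure black or pure white through a lookup table. The helper names and the default threshold of 127 are assumptions, and the threshold usually needs tuning per image.

```python
def build_threshold_table(threshold):
    """Build a 256-entry lookup table mapping gray levels to black (0) or white (255)."""
    if not 0 <= threshold <= 255:
        raise ValueError("threshold must be between 0 and 255")
    return [0 if level < threshold else 255 for level in range(256)]


def binarize(image, threshold=127):
    """Convert a PIL image to grayscale, then force each pixel to black or white."""
    gray = image.convert("L")  # "L" is Pillow's 8-bit grayscale mode
    return gray.point(build_threshold_table(threshold))


# Usage with the image from the code above (requires Pillow and tesserocr):
#   image = Image.open('./photo/image.jpg')
#   print(tesserocr.image_to_text(binarize(image)))
```

A lower threshold keeps only the darkest strokes, while a higher one keeps more of the background noise, so it is worth trying a few values on a sample image.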
Reproduced in: https://www.jianshu.com/p/e542e532a001