Open-source Tesseract OCR software: download and getting started

Download

tesseract-ocr-setup-3.05.01.exe
Note: during installation, select the language data for the languages you want to recognize.

tesseract-4.0-with-LSTM#400-alpha-for-windows

Running tesseract on Windows

1. tesseract is a command-line OCR program. Open a terminal (for example, press Win + R and run cmd), then enter:

tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]

imagename
The name of the input image. Most image file formats (anything readable by 
Leptonica) are supported.

outputbase
The basename of the output file (to which the appropriate extension will be 
appended). By default the output is a text file named outputbase.txt; hOCR and 
PDF output can also be specified.

2. For example, to recognize the image myscan.png and store the result in out.txt, the command line is:

tesseract myscan.png out

3. To specify the recognition language, use -l. For example, to recognize Simplified Chinese and English, pass -l chi_sim+eng:

tesseract myscan.png out -l chi_sim+eng
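
When the same command has to be run from a script, it helps to build the argument list in one place. The following is a minimal Python sketch under stated assumptions, not part of the original article: the helper names build_tesseract_command and run_ocr are hypothetical, and the tesseract executable is assumed to be on PATH.

```python
import shutil
import subprocess


def build_tesseract_command(image, outputbase, langs=None):
    """Build the argument list for: tesseract imagename outputbase [-l lang].

    Hypothetical helper for illustration; it only assembles arguments.
    """
    cmd = ["tesseract", image, outputbase]
    if langs:
        # Several languages are joined with "+", e.g. chi_sim+eng.
        cmd += ["-l", "+".join(langs)]
    return cmd


def run_ocr(image, outputbase, langs=None):
    """Run tesseract if it is installed and return the command that was used."""
    cmd = build_tesseract_command(image, outputbase, langs)
    if shutil.which("tesseract") is None:
        # Assumption: tesseract must be reachable through PATH.
        raise FileNotFoundError("tesseract executable not found on PATH")
    subprocess.run(cmd, check=True)
    return cmd
```

Calling build_tesseract_command("myscan.png", "out", ["chi_sim", "eng"]) produces the same arguments as the command line above; run_ocr additionally invokes the program and raises an error if it is not installed.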

4. To specify the output file format, append the format name to the command line. hocr and pdf can be specified. hOCR is
an HTML file that records a number of parameters for each recognized item. PDF output is supported only in tesseract 3.03
and above. The command lines are:

hOCR: tesseract myscan.png out hocr
pdf: tesseract myscan.png out pdf
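
To keep track of which file each variant produces, the format name can be paired with the expected output file. This is a rough Python sketch, not the article's own method: the function name output_command is hypothetical, and the extension mapping (in particular .hocr for hOCR output) is an assumption about recent tesseract versions that should be checked against the installed one.

```python
# Assumed mapping from the trailing format name to the output extension.
# None means the default plain-text output.
EXTENSIONS = {None: ".txt", "hocr": ".hocr", "pdf": ".pdf"}


def output_command(image, outputbase, output_format=None):
    """Return (argument list, expected output file name) for one OCR run.

    Hypothetical helper for illustration; it does not run tesseract.
    """
    if output_format not in EXTENSIONS:
        raise ValueError("unsupported output format: %r" % (output_format,))
    cmd = ["tesseract", image, outputbase]
    if output_format is not None:
        # The format name is appended after outputbase, as in the commands above.
        cmd.append(output_format)
    return cmd, outputbase + EXTENSIONS[output_format]
```

For example, output_command("myscan.png", "out", "pdf") returns the arguments of the pdf command above together with the file name out.pdf.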

GitHub references

Run tesseract
Parameter Description

Reposted from: https://blog.csdn.net/cylj102908/article/details/78760777

Origin blog.csdn.net/qq_36266449/article/details/81664587