Download
tesseract-ocr-setup-3.05.01.exe
Note: during installation, select the language data for the languages you want to recognize.
tesseract-4.0-with-LSTM#400-alpha-for-windows
Running tesseract on Windows
1. tesseract is a command-line OCR program. Open a terminal (for example, press Win + R and run cmd) and enter:
tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]
imagename
The name of the input image. Most image file formats (anything readable by
Leptonica) are supported.
outputbase
The basename of the output file (to which the appropriate extension will be
appended). By default the output will be named outputbase.txt; hOCR and PDF
output can also be specified.
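When tesseract is driven from a script, the synopsis above can be mirrored by a small helper that assembles the argument list. This is only a sketch: `build_command` and its parameter names are hypothetical, and the `-psm` spelling is copied from the synopsis above (check `tesseract --help` for the spelling your installed version expects).

```python
def build_command(imagename, outputbase, lang=None, psm=None, configs=()):
    """Assemble a tesseract argument list in the order given by the synopsis.

    Hypothetical helper for illustration; it only builds the list and
    does not run anything.
    """
    cmd = ["tesseract", str(imagename), str(outputbase)]
    if lang is not None:
        cmd += ["-l", lang]          # e.g. "eng" or "chi_sim+eng"
    if psm is not None:
        cmd += ["-psm", str(psm)]    # flag spelling taken from the synopsis
    cmd += list(configs)             # trailing config names such as "hocr" or "pdf"
    return cmd


print(build_command("myscan.png", "out"))
# ['tesseract', 'myscan.png', 'out']
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting issues for file names that contain spaces.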
2. For example, to recognize the image myscan.png and store the recognition result in out.txt, use the command line:
tesseract myscan.png out
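The same call can be made from Python and the recognized text read back from the generated file. A minimal sketch, assuming tesseract is on the PATH and the text output is UTF-8; `run_ocr` and its `runner` parameter are hypothetical names introduced here (the `runner` hook only exists so the function can be exercised without a real tesseract install).

```python
import subprocess
from pathlib import Path


def run_ocr(image, outputbase, runner=subprocess.run):
    """Run `tesseract image outputbase` and return the text it wrote.

    Hypothetical helper: tesseract appends ".txt" to outputbase itself,
    so the result is read from outputbase + ".txt".
    """
    # check=True raises CalledProcessError if tesseract exits with an error.
    runner(["tesseract", str(image), str(outputbase)], check=True)
    return Path(f"{outputbase}.txt").read_text(encoding="utf-8")
```

With tesseract installed, `run_ocr("myscan.png", "out")` would run the command shown above and return the contents of out.txt.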
3. To specify particular languages, for example to recognize Simplified Chinese and English, pass -l chi_sim+eng on the command line:
tesseract myscan.png out -l chi_sim+eng
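Because `-l chi_sim+eng` only works when both language data files were installed, it can help to compare the requested codes against what `tesseract --list-langs` reports. The sketch below is an illustration under assumptions: the function names are hypothetical, and the parser assumes the listing is a header line ending in a colon followed by one language code per line.

```python
import subprocess


def parse_list_langs(text):
    """Extract language codes from `tesseract --list-langs` output.

    Assumed layout: a header line ending in ':' and then one code per line.
    """
    langs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.endswith(":"):
            continue  # skip blank lines and the header
        langs.append(line)
    return langs


def missing_languages(lang_arg, available):
    """Return the codes in an -l value (e.g. 'chi_sim+eng') that are not available."""
    return [code for code in lang_arg.split("+") if code not in available]


def installed_languages(runner=subprocess.run):
    """Ask tesseract which languages are installed (requires tesseract on PATH)."""
    result = runner(["tesseract", "--list-langs"], capture_output=True, text=True)
    # Both streams are parsed in case a version prints the list to stderr.
    return parse_list_langs(result.stdout + "\n" + result.stderr)


sample = "List of available languages (2):\neng\nosd\n"
print(missing_languages("chi_sim+eng", parse_list_langs(sample)))
# ['chi_sim']
```

A non-empty result means the installer needs to be run again with the missing language selected before the command above will succeed.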
4. To specify the output file format, append the format name to the command line; hocr and pdf can be specified. hOCR is
an HTML file that records a number of parameters for each recognized item. Only tesseract 3.03 and above
support pdf output. Command lines:
hOCR: tesseract myscan.png out hocr
pdf: tesseract myscan.png out pdf
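A script that switches between these formats needs to know which file to look for afterwards. The following sketch pairs each command with the file it is expected to produce. The function name and the extension table are assumptions made for illustration: in particular, `.hocr` is assumed for hOCR output, and older releases may use a different extension, so verify against your installed version.

```python
# Assumed mapping from output config name to the extension tesseract appends.
OUTPUT_EXTENSIONS = {None: ".txt", "hocr": ".hocr", "pdf": ".pdf"}


def plan_output(imagename, outputbase, config=None):
    """Return (command, expected_output_file) for a given output format.

    Hypothetical helper; config is None for plain text, or "hocr" / "pdf".
    """
    if config not in OUTPUT_EXTENSIONS:
        raise ValueError(f"unsupported output config: {config!r}")
    cmd = ["tesseract", imagename, outputbase]
    if config is not None:
        cmd.append(config)  # the format name goes at the end of the command line
    return cmd, outputbase + OUTPUT_EXTENSIONS[config]


for fmt in (None, "hocr", "pdf"):
    command, expected = plan_output("myscan.png", "out", fmt)
    print(" ".join(command), "->", expected)
```

Keeping the mapping in one table makes it easy to correct an extension if a particular tesseract version behaves differently.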
GitHub references
Run tesseract
Parameter Description
Reprinted from: https://blog.csdn.net/cylj102908/article/details/78760777