Installing tesseract-ocr offline on Linux

Reposted from: Installing tesseract-ocr offline on Linux

tesseract-ocr is an engine that recognizes text and letters in images; for a detailed description, search Baidu.

Because the Linux machine sits on the company's internal network and cannot download and install packages from the internet, an offline installation is required. The source code has to be compiled, so first install the gcc compiler toolchain; look up how to install gcc yourself. For tesseract-ocr, first download the following source packages (install them in the order listed; if compilation reports a missing dependency, download and install that as well):

1.autoconf-2.69.tar.gz

2.automake-1.15.tar.gz

3.libtool-2.4.2.tar.gz

4.leptonica-1.73.tar.gz

5.libpng-1.5.8.tar.gz

6.tesseract-ocr3.02.02.tar.gz

7.eng.traineddata.gz

Other versions of the packages above can be chosen, but take particular care to pair tesseract-ocr 3.02.02 with a compatible leptonica version (1.69 is the version noted for it). Compatibility between versions of the other packages has not been tested.

Decompress each of the packages listed above: tar zxvf xxxx.tar.gz

Then enter each decompressed directory and run ./configure && make && make install to configure, compile, and install it. For the eng.traineddata.gz language pack, after decompression copy the files in tesseract-ocr/tessdata to /usr/local/share/tessdata (it has not been verified whether this step is required).
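The decompress-and-build steps can be sketched as a small script. This is only an illustration under some assumptions: the tarballs sit in the current directory under the file names listed above, each one unpacks into a single top-level directory, and the script is run by a user allowed to run make install. The function names top_dir_of, build_pkg and install_traineddata are made up for this sketch.

```shell
#!/bin/sh
# Sketch: unpack and build the source packages in the listed order.
# Assumes each tarball unpacks into one top-level directory.

# Print the top-level directory name stored inside a .tar.gz archive.
top_dir_of() {
    tar tzf "$1" | head -n 1 | cut -d/ -f1
}

# Unpack one tarball, then configure, compile and install it.
# A missing tarball is skipped with a message instead of failing.
build_pkg() {
    tarball=$1
    if [ ! -f "$tarball" ]; then
        echo "skipping $tarball: not found" >&2
        return 0
    fi
    tar zxvf "$tarball" || return 1
    ( cd "$(top_dir_of "$tarball")" && ./configure && make && make install )
}

# Copy the language data files into the tessdata directory.
install_traineddata() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    cp "$src"/*.traineddata "$dest"/
}

for pkg in autoconf-2.69.tar.gz automake-1.15.tar.gz libtool-2.4.2.tar.gz \
           leptonica-1.73.tar.gz libpng-1.5.8.tar.gz \
           tesseract-ocr3.02.02.tar.gz; do
    build_pkg "$pkg" || { echo "build failed: $pkg" >&2; break; }
done

# Only copy the language data if the unpacked directory is present.
if [ -d tesseract-ocr/tessdata ]; then
    install_traineddata tesseract-ocr/tessdata /usr/local/share/tessdata
fi
```

Stopping at the first failed package makes it easier to see which dependency is missing before continuing.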

Because only the libpng dependency is installed here, only PNG images can be parsed; other file formats require downloading and installing additional packages, such as libjpeg.
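To get a rough idea of which image libraries are available on the machine, one option is to look for their development headers. The sketch below checks for png.h (libpng), jpeglib.h (libjpeg) and tiffio.h (libtiff). The directories searched are an assumption and should be adjusted to the install prefix actually used; header_status is a name made up for this example.

```shell
#!/bin/sh
# Rough check for the development headers of common image libraries.
# The directory list below is an assumption, not a fixed rule.

# Print "found" or "missing" for one header across the given directories.
header_status() {
    header=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$header" ]; then
            echo "found"
            return 0
        fi
    done
    echo "missing"
}

for h in png.h jpeglib.h tiffio.h; do
    echo "$h: $(header_status "$h" /usr/local/include /usr/include)"
done
```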

If all goes well and tesseract-ocr installs successfully, generate a PNG image containing letters, upload it to the server, change to the tesseract-ocr directory, and run:

tesseract test.png test -l eng

If it succeeds, a test.txt file is generated containing the text recognized from the image.
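The test run can be wrapped in a small check that confirms the output file was actually written. This is a minimal sketch: run_ocr is a helper name invented here, and it assumes tesseract is on the PATH and that test.png is in the current directory.

```shell
#!/bin/sh
# Sketch: run tesseract on one image and confirm that <base>.txt was produced.

run_ocr() {
    image=$1
    base=$2
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract not found on PATH" >&2
        return 1
    fi
    # Same invocation as in the text: image, output base name, language.
    tesseract "$image" "$base" -l eng || return 1
    if [ -s "$base.txt" ]; then
        echo "recognized text written to $base.txt"
    else
        echo "no text produced in $base.txt" >&2
        return 1
    fi
}

# Only attempt the run when both tesseract and the test image are present.
if command -v tesseract >/dev/null 2>&1 && [ -f test.png ]; then
    run_ocr test.png test
fi
```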

If the following error is reported:

Tesseract Open Source OCR Engine v3.02.02 with Leptonica
Error in findTiffCompression: function not present
Error in pixReadStreamTiff: function not present
Error in pixReadStream: tiff: no pix returned
Error in pixRead: pix not read
Unsupported image type.

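First check that the leptonica version is one supported by this tesseract-ocr version, then check that the libpng-1.5.8.tar.gz package (the dependency that provides image support) was installed correctly. If it was installed correctly and the same error is still reported, uninstall leptonica and reinstall it (this is how the author resolved it). A likely reason this helps is that leptonica detects libraries such as libpng when it is configured, so it needs to be rebuilt once libpng is in place.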
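One way to script this check and reinstall is sketched below. It rests on assumptions that should be verified against the leptonica version in use: that its configure step writes a config_auto.h header in the source root containing a HAVE_LIBPNG define, and that its Makefile provides the usual uninstall and distclean targets. The directory name and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: check whether a configured leptonica source tree picked up libpng,
# and rebuild it if not. The config_auto.h location is an assumption.

# Succeeds if the given config header enables libpng.
has_png_support() {
    grep -q '^#define HAVE_LIBPNG 1' "$1" 2>/dev/null
}

# Uninstall, reconfigure and reinstall leptonica from its source directory.
reinstall_leptonica() {
    ( cd "$1" && make uninstall && make distclean \
        && ./configure && make && make install )
}

LEPT_DIR=${LEPT_DIR:-leptonica-1.73}   # hypothetical source directory
if [ -d "$LEPT_DIR" ] && ! has_png_support "$LEPT_DIR/config_auto.h"; then
    echo "leptonica was configured without libpng; rebuilding" >&2
    reinstall_leptonica "$LEPT_DIR"
fi
```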


Original: Installing tesseract-ocr offline on Linux


If libjpeg and similar packages cannot be installed normally, the following can be used:

rpm -ivh --nodeps xxxx.rpm 


Origin blog.csdn.net/ptianfeng/article/details/72817968