Repost: Installing tesseract-ocr offline on Linux
tesseract-ocr is an engine for recognizing text in images; for a detailed description, search online (for example on Baidu).
Because the Linux machine sits inside the company network and cannot download packages from the internet, tesseract-ocr has to be installed offline. Since it must be compiled from source, first install the gcc compiler toolchain (search online for how to install gcc). Then download the following source packages and install them in the numbered order; if compilation reports that something else is missing, download and install that as well:
1.autoconf-2.69.tar.gz
2.automake-1.15.tar.gz
3.libtool-2.4.2.tar.gz
4.leptonica-1.73.tar.gz
5.libpng-1.5.8.tar.gz
6.tesseract-ocr3.02.02.tar.gz
7.eng.traineddata.gz
Other versions of the above packages can be chosen, but pay attention to the pairing between tesseract-ocr and leptonica: tesseract-ocr 3.02.02 goes with leptonica 1.69. The version relationships among the other packages have not been tested.
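Before starting the builds, it can help to confirm that the required tools are actually on the machine. The following is a minimal sketch, not part of the original procedure: the function name missing_tools is made up for illustration, and the tool list assumes gcc (named in the text) plus make and tar (implied by the build commands).

```shell
# Print every listed tool that is not found on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# gcc comes from the text; make and tar are assumed from the build steps.
missing_tools gcc make tar || echo "install the tools listed above before building"
```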
Decompress each of the packages listed above: tar zxvf xxxx.tar.gz
Then enter each decompressed directory and run ./configure && make && make install to configure, compile, and install it. For the eng.traineddata.gz language pack, after decompressing it, copy the files in tesseract-ocr/tessdata to /usr/local/share/tessdata (whether this step is required has not been verified).
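The decompress-and-build steps can be sketched as a small loop. This is only an illustration under stated assumptions: build_one and DRY_RUN are hypothetical names, the tarballs are assumed to sit in the current directory, and each tarball is assumed to unpack into a directory named after itself unless a second argument says otherwise. By default the script only prints the commands; set DRY_RUN=0 to actually run them (make install usually needs root).

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints the commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

# build_one TARBALL [DIR]
# DIR defaults to the tarball name without .tar.gz (an assumption;
# verify with: tar tzf TARBALL | head -n 1).
build_one() {
    tarball="$1"
    dir="${2:-${tarball%.tar.gz}}"
    if [ "$DRY_RUN" = "1" ]; then
        echo "tar zxvf $tarball"
        echo "(cd $dir && ./configure && make && make install)"
    else
        tar zxvf "$tarball" || return 1
        (cd "$dir" && ./configure && make && make install) || return 1
    fi
}

# Same order as the numbered list in the post.
for t in autoconf-2.69.tar.gz automake-1.15.tar.gz libtool-2.4.2.tar.gz \
         leptonica-1.73.tar.gz libpng-1.5.8.tar.gz; do
    build_one "$t" || { echo "build failed: $t" >&2; exit 1; }
done

# The post refers to the unpacked tesseract source as "tesseract-ocr",
# so that directory name is passed explicitly.
build_one tesseract-ocr3.02.02.tar.gz tesseract-ocr
```

Stopping at the first failed build keeps a later package from being configured against a dependency that was never installed.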
Because only the libpng dependency is installed here, only PNG images can be processed; other file formats require downloading and installing additional packages, such as libjpeg.
If all goes well and tesseract-ocr is installed successfully, create a PNG image containing some letters, upload it to the server, change to the tesseract-ocr directory, and run:
tesseract test.png test -l eng
On success, a test.txt file is generated; it contains the text recognized from the image.
If the following error is reported:
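A quick way to confirm the result is to check that the output file exists and is not empty. The helper below is a sketch; check_ocr_output is a hypothetical name, and the file name test.txt follows from the command in the post.

```shell
# Succeeds and prints the file if it exists and is non-empty;
# otherwise reports the problem on stderr and returns 1.
check_ocr_output() {
    txt="$1"
    if [ -s "$txt" ]; then
        echo "recognized text in $txt:"
        cat "$txt"
    else
        echo "no text produced: $txt is missing or empty" >&2
        return 1
    fi
}

# Usage on the server, after the tesseract command from the post:
#   tesseract test.png test -l eng && check_ocr_output test.txt
```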
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
Error in findTiffCompression: function not present
Error in pixReadStreamTiff: function not present
Error in pixReadStream: tiff: no pix returned
Error in pixRead: pix not read
Unsupported image type.
Check that the leptonica version is one supported by this tesseract-ocr version, then check that the libpng-1.5.8.tar.gz package (the dependency that provides image support) was installed correctly. If it was installed correctly and the same error is still reported, uninstall leptonica and install it again (this is how the author resolved it).
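One way to narrow this down is to look at which image libraries the installed leptonica library is linked against. leptonica typically detects libraries such as libpng when its configure script runs, which may be why reinstalling it helps once libpng is in place. The sketch below makes several assumptions: missing_image_libs is a hypothetical helper, leptonica is assumed to be built as a shared library, and /usr/local/lib/liblept.so is only the usual default location.

```shell
# Reads `ldd` output on stdin and prints each image library
# (png, jpeg, tiff) that does not appear in it.
missing_image_libs() {
    deps=$(cat)
    for lib in png jpeg tiff; do
        case "$deps" in
            *lib"$lib"*) ;;        # linked: nothing to report
            *) echo "$lib" ;;      # not linked: report it
        esac
    done
}

# Usage (the path is an assumption; adjust to where liblept was installed):
#   ldd /usr/local/lib/liblept.so | missing_image_libs
```

If png appears in the output even though libpng is installed, rebuilding leptonica is the step the post recommends.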
Original: Installing tesseract-ocr offline on Linux
When installing libjpeg and similar packages, if the installation fails, you can use:
rpm -ivh --nodeps xxxx.rpm