Installing tesseract-ocr 4.00 on Windows and configuring the environment variables

Recently I needed to do text recognition, and I was not allowed to call third-party interfaces directly, so my only option was an open-source library. tesseract-ocr is an open-source text recognition engine originally developed at Hewlett-Packard. It lets you quickly build an OCR system that recognizes text in pictures. Because I develop on Windows, I also had to install it on Windows.

Step 1: Download the installation package

Following https://github.com/tesseract-ocr/tesseract/wiki, I looked for the unofficial Windows packages. I only seemed to find a 64-bit package: http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-4.00.00dev.exe. You can install it directly after downloading, but remember the installation directory, because we will need it later when configuring the environment variables.

If you need to recognize text in languages other than English, you also need to download the corresponding language packages from https://github.com/tesseract-ocr/tesseract/wiki/Data-Files.

Simplified Chinese character recognition package: https://raw.githubusercontent.com/tesseract-ocr/tessdata/4.00/chi_sim.traineddata

Traditional Chinese character recognition package: https://github.com/tesseract-ocr/tessdata/raw/4.0/chi_tra.traineddata
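
If you prefer to script the download instead of using a browser, the following is a minimal Python sketch. It uses only the standard library and the two URLs quoted above; the function names target_path and download_packs are my own, and I have not checked whether the URLs still resolve.

```python
import os
import urllib.request
from urllib.parse import urlparse

# URLs copied from the text above; they are not verified here.
LANGUAGE_PACKS = [
    "https://raw.githubusercontent.com/tesseract-ocr/tessdata/4.00/chi_sim.traineddata",
    "https://github.com/tesseract-ocr/tessdata/raw/4.0/chi_tra.traineddata",
]


def target_path(url, dest_dir):
    """Return the local path for a URL: its last path segment inside dest_dir."""
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(dest_dir, filename)


def download_packs(urls, dest_dir):
    """Download each URL into dest_dir, skipping files that already exist."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = target_path(url, dest_dir)
        if not os.path.exists(path):
            # Network access happens only here.
            urllib.request.urlretrieve(url, path)
        saved.append(path)
    return saved
```

Calling download_packs(LANGUAGE_PACKS, "downloads") would place both files in a folder named downloads (a hypothetical name) under the current directory.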

 

Step 2: Installation

Run the downloaded tesseract-ocr-setup-4.00.00dev.exe and click Next through the wizard until the installation finishes.

Step 3: Configure environment variables

Note: My system is Windows 7; other versions should be similar. The process is just like configuring the Java environment variables.

Copy your installation path; mine is C:\Program Files (x86)\Tesseract-OCR. Go to "Control Panel\System and Security\System" and click "System Protection" to open the System Properties dialog.

On the Advanced tab of that dialog, click Environment Variables to open the configuration dialog.

Add the installation path "C:\Program Files (x86)\Tesseract-OCR" to the PATH and Path variables. When adding it, put a ";" in front to separate it from the existing entries, and end it with ";". Below is a sample of my configuration:

C:\Users\Administrator\AppData\Roaming\Composer\vendor\bin;C:\Users\Administrator\AppData\Roaming\npm;C:\Program Files (x86)\Tesseract-OCR;
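
The separator rule above can be sketched in a few lines of Python. This is only an illustration of how the new value is built: the helper name append_to_path is my own, and it returns a string without touching the system settings.

```python
def append_to_path(path_value, new_dir, sep=";"):
    """Return path_value with new_dir appended, unless it is already listed."""
    # Split on the separator and drop empty entries left by stray ";".
    entries = [e for e in path_value.split(sep) if e.strip()]

    def norm(p):
        # Windows paths are case-insensitive; ignore trailing slashes too.
        return p.strip().rstrip("\\/").lower()

    if norm(new_dir) not in [norm(e) for e in entries]:
        entries.append(new_dir)
    # End with the separator, as in the sample configuration.
    return sep.join(entries) + sep


existing = (
    r"C:\Users\Administrator\AppData\Roaming\Composer\vendor\bin;"
    r"C:\Users\Administrator\AppData\Roaming\npm"
)
print(append_to_path(existing, r"C:\Program Files (x86)\Tesseract-OCR"))
```

Running it a second time on its own output leaves the value unchanged, which avoids piling up duplicate entries.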

After the configuration is complete, click OK to save.

 

Open a new command prompt and enter: tesseract -v. You should see the version information.

If an error occurs, the environment variables are probably not configured correctly.
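
The same check can be done from a script. Below is a rough sketch using Python's standard shutil.which and subprocess.run; the function name tool_version is my own. It tells apart "not on PATH" from "found, and here is what it printed".

```python
import shutil
import subprocess


def tool_version(name="tesseract"):
    """Return the first line printed by `<name> -v`, or None if not on PATH."""
    exe = shutil.which(name)
    if exe is None:
        return None
    # Merge stderr into stdout in case the build prints its version there.
    result = subprocess.run(
        [exe, "-v"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


version = tool_version()
print("tesseract not found on PATH" if version is None else version)
```

If this prints the "not found" message, recheck the PATH entry and make sure the script runs in a session started after the variables were saved.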

At this point the installation is complete, but the system still cannot recognize Chinese. Download the Simplified Chinese and Traditional Chinese language packs (the addresses are given above) and put them in the tessdata directory under the installation directory.
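
Copying the downloaded packs can also be scripted. The sketch below assumes the language data belongs in a tessdata subfolder of the installation directory (the folder name is a parameter, so adjust it if your layout differs). The function name install_language_packs is my own, and writing under Program Files will likely require an administrator prompt.

```python
import os
import shutil


def install_language_packs(files, install_dir, data_subdir="tessdata"):
    """Copy .traineddata files into install_dir/data_subdir; return new paths."""
    target = os.path.join(install_dir, data_subdir)
    os.makedirs(target, exist_ok=True)
    copied = []
    for src in files:
        if not src.endswith(".traineddata"):
            raise ValueError("not a language pack: " + src)
        dst = os.path.join(target, os.path.basename(src))
        shutil.copy2(src, dst)  # overwrites an older copy if present
        copied.append(dst)
    return copied
```

For example, install_language_packs(["chi_sim.traineddata", "chi_tra.traineddata"], r"C:\Program Files (x86)\Tesseract-OCR") would copy both files from the current directory.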

Supplement: Without a global variable that tells tesseract where its data lives, recognition fails when it is run from another drive. To fix this, add one more environment variable.

System Variables --> New:

Add a variable named TESSDATA_PREFIX. The value is again the installation path, in my case C:\Program Files (x86)\Tesseract-OCR
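
To confirm that the variable points somewhere useful, a small Python sketch like the one below can list the language packs it finds. It checks both the directory itself and a tessdata subfolder, because which of the two tesseract expects appears to differ between builds; that is my assumption, not something tested here. The function name find_language_data is my own.

```python
import os


def find_language_data(env=None):
    """Return sorted language codes found via TESSDATA_PREFIX, or []."""
    env = os.environ if env is None else env
    # Tolerate a trailing ";" in the stored value.
    prefix = env.get("TESSDATA_PREFIX", "").strip().rstrip(";")
    if not prefix:
        return []
    # Look in the prefix itself first, then in its tessdata subfolder.
    for candidate in (prefix, os.path.join(prefix, "tessdata")):
        if not os.path.isdir(candidate):
            continue
        names = [n for n in os.listdir(candidate) if n.endswith(".traineddata")]
        if names:
            return sorted(os.path.splitext(n)[0] for n in names)
    return []


print(find_language_data())
```

An empty list means either the variable is not visible to the current session or no .traineddata files were found in either location.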
