Recognizing text in images (handwritten Chinese) with OCR in Java: tess4j

Copyright: this article was compiled by the blogger from search results. https://blog.csdn.net/weixin_37794901/article/details/83343092

 

Recently I had a requirement: a mini program generates an image of handwritten Chinese, and the back end then needs to recognize the text in that image. At first I thought of paid third-party APIs and tried the general text recognition API of the Baidu AI open platform; later I found Tesseract-OCR. This post pulls together notes from several references.

Preparation:

1. Download Tesseract-OCR 3.0 or later: https://download.csdn.net/download/qq_26161693/10646074

2. Select the chi_sim.traineddata language data during installation; after installation, the program must load the tessdata directory that contains the Chinese package (chi_sim.traineddata);
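
A missing language file is an easy thing to overlook in step 2, so it can help to check for it before any OCR call is made. The following is a minimal sketch using only the JDK; the class name `TessdataCheck`, the method `hasLanguage`, and the default directory `tessdata` are illustrative assumptions, not part of tess4j.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TessdataCheck {

    /** Returns true if <tessdataDir>/<language>.traineddata is a readable regular file. */
    public static boolean hasLanguage(Path tessdataDir, String language) {
        Path trained = tessdataDir.resolve(language + ".traineddata");
        return Files.isRegularFile(trained) && Files.isReadable(trained);
    }

    public static void main(String[] args) {
        // Hypothetical default; pass the real tessdata directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "tessdata");
        if (hasLanguage(dir, "chi_sim")) {
            System.out.println("chi_sim.traineddata found in " + dir.toAbsolutePath());
        } else {
            System.out.println("chi_sim.traineddata is missing from " + dir.toAbsolutePath());
        }
    }
}
```

Running it with the installation's tessdata directory as the argument reports whether the Chinese package is where the program expects it.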

Maven dependency:

        <dependency>
            <groupId>net.sourceforge.tess4j</groupId>
            <artifactId>tess4j</artifactId>
            <version>3.2.1</version>
        </dependency>

 

Demo:

    import java.awt.image.BufferedImage;
    import java.io.File;
    import javax.imageio.ImageIO;
    import net.sourceforge.tess4j.Tesseract;

    /**
     * Recognizes the text in an image.
     *
     * @param imagePath path of the image to recognize
     * @return the recognition result, or null if recognition failed
     */
    public static String discernWord(String imagePath) {
        try {
            File imageFile = new File(imagePath);
            BufferedImage textImage = ImageIO.read(imageFile);
            Tesseract instance = new Tesseract(); // default constructor instead of the deprecated getInstance()
            instance.setDatapath("C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"); // directory holding the language data
            instance.setLanguage("chi_sim"); // recognize Simplified Chinese
            return instance.doOCR(textImage);
        } catch (Exception e) {
            e.printStackTrace();
            return null; // recognition failed
        }
    }
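
The demo hard-codes a Windows data path, which will not exist once the same code is deployed on Linux. One way to avoid that is to resolve the directory at startup. The sketch below is only an illustration: the class `TessdataPath`, the method `resolve`, and the system property `tessdata.dir` are hypothetical names. `TESSDATA_PREFIX` is an environment variable that Tesseract itself reads, but whether it should point at the tessdata directory or at its parent depends on the Tesseract version, so verify the value against your installation.

```java
public class TessdataPath {
    // Windows path from the demo, used only as a last-resort fallback.
    public static final String WINDOWS_DEFAULT =
            "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata";

    /** Picks the first non-blank value: explicit override, then environment value, then the fallback. */
    public static String resolve(String override, String envValue) {
        if (override != null && !override.trim().isEmpty()) {
            return override.trim();
        }
        if (envValue != null && !envValue.trim().isEmpty()) {
            return envValue.trim();
        }
        return WINDOWS_DEFAULT;
    }

    public static void main(String[] args) {
        // "tessdata.dir" is a hypothetical property name, e.g. -Dtessdata.dir=/some/dir
        String path = resolve(System.getProperty("tessdata.dir"), System.getenv("TESSDATA_PREFIX"));
        System.out.println("tessdata directory: " + path);
    }
}
```

The resolved string would then be passed to `instance.setDatapath(...)` in place of the literal.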

Test:

    public static void main(String[] args) throws Exception {
        String words = discernWord("F:/test_used_url/ocr/originalPic/hotkidclub.jpg"); // path of the image to recognize
        System.out.println(words);
    }

P.S.:

I have personally verified that this works with Tesseract installed on Windows in the development environment, but I have not tried loading only the language pack without installing the exe.

Deploying and running it under Linux, however, hits all sorts of pitfalls.

Solution: 1) After installing Tesseract-OCR on Linux, copy the related .so files to the /usr/lib directory

         2) In the root directory of the project (for Maven, that is under resources), add a linux-x86-64 folder

         3) Configure the Linux locale environment variables

         4) Under heavy traffic Tomcat also crashes easily, so the number of threads or the level of concurrency needs to be limited;
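
For point 4, one possible way to cap concurrency inside the application (rather than in the Tomcat configuration) is a semaphore around the OCR call. This is a minimal sketch with JDK classes only; `OcrGate` and its parameters are hypothetical names, and the right limit and timeout depend on the server.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class OcrGate {
    private final Semaphore permits;
    private final long timeoutMillis;

    public OcrGate(int maxConcurrent, long timeoutMillis) {
        this.permits = new Semaphore(maxConcurrent);
        this.timeoutMillis = timeoutMillis;
    }

    /** Runs the task if a permit becomes free within the timeout; otherwise throws IllegalStateException. */
    public <T> T run(Callable<T> task) throws Exception {
        if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException("too many OCR requests in flight");
        }
        try {
            return task.call();
        } finally {
            permits.release(); // always hand the permit back, even if the task fails
        }
    }

    public int availablePermits() {
        return permits.availablePermits();
    }
}
```

A request handler could then share one gate and call something like `gate.run(() -> discernWord(path))`, returning an error response when the gate rejects the request instead of letting native OCR calls pile up.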

 

For details, see: http://www.cnblogs.com/zlAurora/p/9266039.html
