Using the open-source Tesseract library from Java for image text recognition (OCR)

Tesseract-OCR supports Chinese recognition, is open source, and provides a full set of training tools, which makes it a low-cost choice for rapid development.

Tess4J is a Java (JNA) wrapper that lets you call Tesseract from Java programs on the PC.

The Tesseract OCR engine was developed at HP Labs starting in 1985, and by 1995 it had become one of the three most accurate OCR engines in the industry. However, HP soon decided to abandon the OCR business, and Tesseract was left gathering dust.

Years later, rather than leave Tesseract shelved, HP decided to contribute it to the open-source community and give it a new life: in 2005 Tesseract was handed over to the Information Science Research Institute at the University of Nevada, Las Vegas, and Google was brought in to improve it, fix bugs and optimize it.

Tesseract is currently released by Google as an open-source project; see its project home page.

Add the Tess4J dependency to your Maven pom.xml:

<!-- https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j -->
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>3.4.0</version>
</dependency>

Sample code:

import java.io.File;

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class OcrDemo {
    public static void main(String[] args) {
        // Image to recognize
        File imageFile = new File("inputdir/shuzi.png");
        Tesseract tesseract = new Tesseract();
        // Point Tess4J at the trained data (tessdata); download it from https://github.com/tesseract-ocr/tessdata
        tesseract.setDatapath("E:\\itcast\\tess4j\\env\\tessdata");
        // English is recognized by default; Chinese must be set explicitly
        tesseract.setLanguage("chi_sim");
        try {
            String result = tesseract.doOCR(imageFile);
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }
}
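The snippet above depends on the right traineddata file being present in the tessdata directory passed to setDatapath, and on the language string set with setLanguage. Below is a minimal, stdlib-only sketch of two small helpers you might run around that call. It is an illustration under assumptions, not part of Tess4J: the class TessdataCheck and its methods are hypothetical names. missingTrainedData assumes the usual Tesseract naming convention, where language "chi_sim" maps to chi_sim.traineddata and several languages are joined with "+" (e.g. "chi_sim+eng"). stripSpacesBetweenCjk assumes that the recognized Chinese text may contain stray spaces between characters, which you may or may not see depending on the Tesseract version and image.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers, not part of the Tess4J API.
public class TessdataCheck {

    // Returns the traineddata file names that are missing from tessdataDir for a
    // Tesseract-style language string such as "chi_sim" or "chi_sim+eng".
    public static List<String> missingTrainedData(Path tessdataDir, String language) {
        List<String> missing = new ArrayList<>();
        for (String lang : language.split("\\+")) {
            String trimmed = lang.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate inputs like "chi_sim++eng"
            }
            String fileName = trimmed + ".traineddata";
            if (!Files.isRegularFile(tessdataDir.resolve(fileName))) {
                missing.add(fileName);
            }
        }
        return missing;
    }

    // Removes horizontal whitespace (spaces, tabs, full-width spaces) that sits between
    // two Han characters, e.g. "中 文" -> "中文". Line breaks and spaces next to
    // Latin letters or digits are kept.
    public static String stripSpacesBetweenCjk(String text) {
        return text.replaceAll("(?<=\\p{IsHan})\\h+(?=\\p{IsHan})", "");
    }

    public static void main(String[] args) {
        // Assumed default location; pass the real tessdata path as the first argument.
        Path tessdata = Paths.get(args.length > 0 ? args[0] : "tessdata");
        List<String> missing = missingTrainedData(tessdata, "chi_sim+eng");
        if (!missing.isEmpty()) {
            System.err.println("Missing in " + tessdata.toAbsolutePath() + ": " + missing);
        }
        System.out.println(stripSpacesBetweenCjk("中 文 识 别 OCR test"));
    }
}
```

Checking the files up front gives a clearer message than a failed doOCR call when the data path or language name is wrong. The whitespace cleanup is deliberately narrow (only between two Han characters) so that English words and numbers in mixed output keep their spacing.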

Original post: www.cnblogs.com/alexzhang92/p/11488679.html