Reading PDF Text and Images in Java

This article shows how to read the text and images of a PDF document from a Java program, by calling the extractText() and extractImages() methods respectively.

 

Tool used: Free Spire.PDF for Java (free version)

Importing the JAR file:

Method 1: Download the JAR package from the official website. After downloading, unzip the file and add the Spire.Pdf.jar file from the lib folder to your Java project.

 

Method 2: Install it through a Maven repository.

 

Java code example

import com.spire.pdf.*;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class ExtractText {
    public static void main(String[] args) throws Exception {
        // Load the test document
        PdfDocument pdf = new PdfDocument("sample.pdf");

        // Create a StringBuilder to collect the extracted text
        StringBuilder sb = new StringBuilder();
        // Counter used to number the output images
        int index = 0;

        // Iterate over the pages of the PDF document
        PdfPageBase page;
        for (int i = 0; i < pdf.getPages().getCount(); i++) {
            page = pdf.getPages().get(i);
            // Call extractText() to extract the text of the page
            sb.append(page.extractText(true));

            // Call extractImages() to get the images on the page
            for (BufferedImage image : page.extractImages()) {
                // Specify the output file name and the image format
                File output = new File(String.format("Image_%d.png", index++));
                ImageIO.write(image, "PNG", output);
            }
        }

        // Write the text collected in the StringBuilder to a TXT file;
        // try-with-resources closes the writer automatically
        try (FileWriter writer = new FileWriter("ExtractText.txt")) {
            writer.write(sb.toString());
        } catch (IOException e) {
            e.printStackTrace();
        }

        pdf.close();
    }
}
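
The output side of this example (one text file plus numbered PNG files) does not depend on the PDF library, so it can be tried on its own. The following is only a minimal sketch using JDK classes; the class name ExtractionOutput and its methods writeText and writeImages are hypothetical helpers made up for illustration and are not part of Spire.PDF.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Spire.PDF): handles only the output side.
final class ExtractionOutput {

    // Writes the collected text to the target file as UTF-8 in a single call.
    static Path writeText(CharSequence text, Path target) throws IOException {
        return Files.write(target, text.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Saves each image as Image_<n>.png inside dir, numbering from startIndex,
    // and returns the paths that were written.
    static List<Path> writeImages(List<BufferedImage> images, Path dir, int startIndex)
            throws IOException {
        List<Path> written = new ArrayList<>();
        int index = startIndex;
        for (BufferedImage image : images) {
            Path out = dir.resolve(String.format("Image_%d.png", index++));
            // ImageIO.write returns false when no suitable writer is found
            if (!ImageIO.write(image, "PNG", out.toFile())) {
                throw new IOException("No PNG writer available for " + out);
            }
            written.add(out);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Demo with placeholder data instead of real PDF content
        Path dir = Files.createTempDirectory("extract-demo");
        writeText("page text", dir.resolve("ExtractText.txt"));
        BufferedImage blank = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        System.out.println(writeImages(Arrays.asList(blank), dir, 0));
    }
}
```

In a real program, the text passed to writeText would be the StringBuilder filled by extractText(), and the list passed to writeImages would be built from the images returned by extractImages().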

Result of reading the text and images:

 

(End of article)

 


Origin www.cnblogs.com/Yesi/p/11206330.html