Java: Converting PDF to Word

This post shows a simple way to convert a PDF document to a Word document using a Java API (Apache PDFBox).
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

import java.io.*;

public class PdfToWord {

    public static void main(String[] args) {
        String pdfPath = "D:\\Tools\\Spring.pdf";
        // Output path: same name with the .pdf extension replaced by .doc
        String docPath = pdfPath.substring(0, pdfPath.lastIndexOf(".")) + ".doc";

        PDDocument doc = null;
        try {
            doc = PDDocument.load(new File(pdfPath));
            int pageNum = doc.getNumberOfPages();

            // PDFTextStripper extracts plain text only; layout, fonts and images are lost
            PDFTextStripper textStripper = new PDFTextStripper();
            textStripper.setSortByPosition(true);
            textStripper.setStartPage(1);
            textStripper.setEndPage(pageNum);

            // FileOutputStream creates the file if needed; try-with-resources closes the writer
            try (Writer writer = new OutputStreamWriter(new FileOutputStream(docPath), "UTF-8")) {
                textStripper.writeText(doc, writer);
            }

            System.out.println("Conversion succeeded");
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Close the document even if extraction fails
            if (doc != null) {
                try {
                    doc.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

}
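
The program builds the output name by cutting the path at its last dot, which throws if the path has no dot and cuts in the wrong place if only a parent directory contains one. Below is a minimal sketch of that step as a standalone helper. The class name OutputPathSketch and the method replaceExtension are hypothetical names chosen for illustration, not part of PDFBox or the original program.

```java
class OutputPathSketch {

    // Hypothetical helper: replaces the extension of the file name with newExt.
    // If the file name has no extension, newExt is simply appended.
    static String replaceExtension(String path, String newExt) {
        // Position of the last path separator ('/' or '\'), or -1 if there is none
        int sep = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        int dot = path.lastIndexOf('.');
        // Only a dot inside the file name (and not as its first character)
        // is treated as the start of an extension
        String base = (dot > sep + 1) ? path.substring(0, dot) : path;
        return base + newExt;
    }

    public static void main(String[] args) {
        System.out.println(replaceExtension("D:\\Tools\\Spring.pdf", ".doc"));
        System.out.println(replaceExtension("D:\\my.dir\\notes", ".doc"));
    }
}
```

With this helper, the path used in the program above maps to D:\Tools\Spring.doc, and a file without an extension inside a dotted directory keeps its full name and only gains the new suffix.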

This article uses pdfbox-1.8.2.jar with JDK 1.8. The project is built with Gradle, using the dependency compile("org.apache.pdfbox:pdfbox:1.8.2"); you can also download the jar yourself.

I originally had a PDF document of more than 90 pages with no tags, which made it troublesome to read. As a programmer, I decided to solve this with a program. However, the converted file had no formatting and did not achieve the expected result: PDFTextStripper extracts only the text, so the resulting .doc file is effectively plain text without the original layout.

Origin blog.csdn.net/wzs535131/article/details/108911424