Use Java and Apache PDFBox to convert a PDF document into a Word document in a simple way.
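```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper; // PDFBox 1.8.x package; in 2.x it is org.apache.pdfbox.text

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class PdfToWord {
    public static void main(String[] args) {
        String pdfPath = "D:\\Tools\\Spring.pdf";
        // Output file: same directory and name as the PDF, with a .doc extension
        String fileName = pdfPath.substring(0, pdfPath.lastIndexOf(".")) + ".doc";

        PDDocument doc = null;
        try {
            doc = PDDocument.load(new File(pdfPath));
            int pageNum = doc.getNumberOfPages();
            // FileOutputStream creates the file if it does not exist;
            // try-with-resources closes the writer even if extraction fails
            try (Writer writer = new OutputStreamWriter(new FileOutputStream(fileName), "UTF-8")) {
                PDFTextStripper textStripper = new PDFTextStripper();
                textStripper.setSortByPosition(true); // order text by its position on the page
                textStripper.setStartPage(1);
                textStripper.setEndPage(pageNum);
                // Writes plain text only: fonts, images, tables and layout are not kept
                textStripper.writeText(doc, writer);
            }
            System.out.println("Conversion succeeded");
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Always release the PDF, even when loading or writing failed
            if (doc != null) {
                try {
                    doc.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
```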
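The output name is built by cutting the path at its last "." (pdfPath.lastIndexOf(".")). That works for D:\Tools\Spring.pdf, but substring throws a StringIndexOutOfBoundsException when the path contains no dot at all, and a path such as D:\my.docs\Spring would be cut inside the directory name, giving D:\my.doc. A slightly more defensive variant is sketched below; the class OutputPaths and its method replaceExtension are hypothetical helpers, not part of PDFBox. It only looks for a dot that comes after the last path separator.

```java
/** Hypothetical helper: derives an output path by swapping the file extension. */
final class OutputPaths {

    private OutputPaths() {}

    /**
     * Replaces the extension of the file name (never of a directory) with newExt.
     * If the file name has no extension, newExt is simply appended.
     */
    static String replaceExtension(String path, String newExt) {
        // Position of the last separator, so a dot in a directory name is ignored
        int sep = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        int dot = path.lastIndexOf('.');
        String base = (dot > sep) ? path.substring(0, dot) : path;
        return base + newExt;
    }

    public static void main(String[] args) {
        System.out.println(replaceExtension("D:\\Tools\\Spring.pdf", ".doc")); // D:\Tools\Spring.doc
        System.out.println(replaceExtension("D:\\my.docs\\Spring", ".doc"));   // D:\my.docs\Spring.doc
    }
}
```

In PdfToWord, the fileName line could then become OutputPaths.replaceExtension(pdfPath, ".doc").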
This article uses pdfbox-1.8.2.jar with JDK 1.8. The project is built with Gradle using the dependency compile("org.apache.pdfbox:pdfbox:1.8.2"); you can also download the jar yourself.
I originally had a PDF document of more than 90 pages with no bookmarks, which made it tedious to read. Being a programmer, I decided to solve the problem with a program. However, the converted file lost all of its formatting and did not turn out as expected: PDFTextStripper extracts only the text, so the resulting .doc is really a plain-text file.
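Because PDFTextStripper writes nothing but UTF-8 text, the file produced above is a text file that merely carries a .doc extension, which is why Word shows it without any formatting. One way to check this is to look at the file's first bytes: a genuine Word 97-2003 .doc is an OLE2 compound file beginning with D0 CF 11 E0 A1 B1 1A E1, and a .docx is a ZIP archive beginning with "PK" 03 04. The sketch below, using a hypothetical class DocSniffer (not part of PDFBox), checks only these two signatures and treats everything else as "plain text or other".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper: guesses what a ".doc" file really is from its first bytes. */
final class DocSniffer {

    private DocSniffer() {}

    // Word 97-2003 .doc files are OLE2 compound documents with this 8-byte signature
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // .docx files are ZIP archives, which start with "PK" 03 04
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    static String kind(byte[] head) {
        if (startsWith(head, OLE2)) return "OLE2 (binary Word .doc)";
        if (startsWith(head, ZIP)) return "ZIP (e.g. .docx)";
        return "plain text or other";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // e.g. java DocSniffer D:\Tools\Spring.doc (reads the whole file for simplicity)
            System.out.println(kind(Files.readAllBytes(Paths.get(args[0]))));
        } else {
            // Sample bytes shaped like PDFTextStripper output: UTF-8 text, no Word structure
            byte[] sample = "Sample extracted text\n".getBytes(StandardCharsets.UTF_8);
            System.out.println(kind(sample)); // plain text or other
        }
    }
}
```

Getting real Word formatting would require a library that writes the Word file format itself; renaming a text file to .doc only changes the extension.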