Apache PDFBox Quick Development Guide
Author: chszs. Please credit the source when reprinting. Blog homepage: http://blog.csdn.net/chszs
1. Introduction
Apache PDFBox is an open-source Java library for working with PDF documents. It can create new PDF documents, modify existing ones, and extract content from them. Apache PDFBox also ships with several command-line tools.
The latest release at the time of writing is version 1.8.2.
2. Features
Apache PDFBox mainly has the following features:
1) Text extraction: Extract text from PDF documents.
2) Merge & Split: You can merge multiple PDF documents into a single one, or split a single PDF into multiple PDF documents.
3) Form filling: You can extract data from PDF forms, or fill PDF forms.
4) PDF/A validation: Check whether a PDF document conforms to the PDF/A ISO standard.
5) PDF printing: Print a PDF document using the Java printing API.
6) PDF conversion: PDF documents can be converted into image files.
7) PDF Creation: A new PDF document can be created from scratch.
8) Lucene integration: PDF documents can be indexed with the Lucene search engine.
3. Development practice
Since Apache PDFBox is a PDF library, the most representative example is using it to create a PDF document. The steps are as follows.
1. Create a Java project
In Eclipse, create a Java project named PDFboxDemo.
2. Download the PDFBox packages
1) pdfbox-1.8.2.jar
Address: http://archive.apache.org/dist/pdfbox/1.8.2/pdfbox-1.8.2.jar
Description: The core library; covers general PDF operations.
2) pdfbox-app-1.8.2.jar
Address: http://archive.apache.org/dist/pdfbox/1.8.2/pdfbox-app-1.8.2.jar
Description: Standalone bundle that includes the PDFBox command-line tools.
3) fontbox-1.8.2.jar
Address: http://archive.apache.org/dist/pdfbox/1.8.2/fontbox-1.8.2.jar
Description: The font library used by PDFBox.
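This example only needs items 1) and 3). Note that PDFBox 1.8.x also depends on Apache Commons Logging at run time, so the commons-logging jar may need to be added to the build path as well.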
3. Create the class file
First create the chszs.pdf package, then create the class file CreatePDF.java in it.

    package chszs.pdf;

    // import java.io.File;
    import java.io.IOException;

    import org.apache.pdfbox.exceptions.COSVisitorException;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
    import org.apache.pdfbox.pdmodel.font.PDFont;
    // import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;
    import org.apache.pdfbox.pdmodel.font.PDType1Font;

    public class CreatePDF {
        public static void main(String[] args) throws IOException {
            // Create an empty document.
            PDDocument document = new PDDocument();
            try {
                // Add one blank page.
                PDPage page = new PDPage();
                document.addPage(page);

                // Alternative: load a TrueType font from a file (left disabled by the author):
                // PDFont font = PDTrueTypeFont.loadTTF(document, new File("SIMSUN.TTC"));
                PDFont font = PDType1Font.HELVETICA_BOLD;

                // Write one line of text. Coordinates are in points,
                // measured from the bottom-left corner of the page.
                PDPageContentStream contentStream = new PDPageContentStream(document, page);
                contentStream.beginText();
                contentStream.setFont(font, 14);
                contentStream.moveTextPositionByAmount(100, 700);
                contentStream.drawString("Hello World");
                // contentStream.drawString("中文");
                contentStream.endText();
                contentStream.close();

                // Write the document to disk.
                document.save("E:/test.pdf");
            } catch (COSVisitorException e) {
                e.printStackTrace();
            } finally {
                // Always release the document's resources.
                document.close();
            }
        }
    }
Run the program; it creates the file test.pdf on drive E:.
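The call moveTextPositionByAmount(100, 700) in the listing positions the text in PDF user-space units (points, 1/72 inch), measured from the bottom-left corner of the page. As a rough illustration, the following plain-Java sketch converts those coordinates into distances from the page edges. It assumes a US Letter page height of 792 points, which is taken here to be the default for new PDPage() in PDFBox 1.8. The class and method names are hypothetical and not part of PDFBox.

```java
// Hypothetical helper, not part of PDFBox: plain arithmetic on PDF user-space units.
final class PageCoordinates {
    // One PDF point is 1/72 of an inch.
    static final double POINTS_PER_INCH = 72.0;
    // Assumed page height: US Letter, 11 inches = 792 points.
    static final double LETTER_HEIGHT = 792.0;

    // Converts a length in points to inches.
    static double pointsToInches(double points) {
        return points / POINTS_PER_INCH;
    }

    // PDF measures y upward from the bottom edge; this returns the distance from the top edge.
    static double distanceFromTop(double y, double pageHeight) {
        return pageHeight - y;
    }

    public static void main(String[] args) {
        double x = 100;
        double y = 700; // the values used in CreatePDF
        System.out.printf("From left edge: %.2f in%n", pointsToInches(x));
        System.out.printf("From top edge: %.2f in%n",
                pointsToInches(distanceFromTop(y, LETTER_HEIGHT)));
    }
}
```

Under these assumptions the text baseline starts about 1.39 inches from the left edge and about 1.28 inches below the top edge, which is why "Hello World" appears near the top-left of the page.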
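Summary: As of version 1.8.2, Apache PDFBox still does not support creating PDFs with Chinese text, which makes it considerably weaker than iText in this respect.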
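One plausible contributing factor is that the standard Type 1 fonts such as Helvetica use single-byte encodings, so characters outside that range have no code they can be written with. A minimal plain-Java sketch of a pre-check to run before calling drawString might look like the following. It uses ISO-8859-1 as a stand-in for the font's encoding, which is an assumption (the actual font encoding differs in details), and the class and method names are hypothetical rather than part of PDFBox.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
final class TextEncodingCheck {
    // Returns true if every character of the text can be encoded in ISO-8859-1,
    // used here as an assumed approximation of a single-byte font encoding.
    static boolean fitsSingleByteEncoding(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // The Latin string from the listing.
        System.out.println(fitsSingleByteEncoding("Hello World")); // prints true
        // The Chinese string from the listing, written as Unicode escapes.
        System.out.println(fitsSingleByteEncoding("\u4e2d\u6587")); // prints false
    }
}
```

A check like this only flags text that a single-byte font cannot represent; it does not make Chinese output possible.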