Removing metadata from PDFs generated with iText

Foreword:

iText (itextpdf) is commonly used to generate PDFs in Java. However, this third-party jar adds the tool name and version number to every PDF it generates, and there is currently no method for modifying that value. When this metadata needs to be cleared before a file is sent out, the approaches below can be used.

The test code that generates a PDF is as follows:

package org.example;

import com.itextpdf.text.*;
import com.itextpdf.text.pdf.*;

import java.io.FileOutputStream;
// Used by the reflection snippets later in this post.
import java.lang.reflect.Field;
import java.util.ArrayList;

public class Main {
    public static void main(String[] args) {
        String FILE_DIR = "./";
        //Step 1—Create a Document.
        Document document = new Document();
        try {
            //Step 2—Get a PdfWriter instance.
            PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(FILE_DIR + "createSamplePDF2.pdf"));
            //Step 3—Open the Document.
            document.open();
            //Step 4—Add content.
            document.add(new Paragraph("Hello World"));
            //Step 5—Close the Document.
            document.close();

        } catch (Exception ex) {
            System.out.print(ex);
        }
    }

}

Reflection modification (failed):

Option One:

The version string that itextpdf writes comes mainly from the Version class in com.itextpdf.text, so my first idea was to modify its iText and release fields through reflection. I overlooked how the class is instantiated, and the attempt failed.

Although it failed, I am recording it here to explain why. The reflection code is as follows:

// Load the Version class and create a separate instance of it.
Class<?> clazz = Class.forName("com.itextpdf.text.Version");
Object object = clazz.getDeclaredConstructor().newInstance();
Field field = clazz.getDeclaredField("iText");
field.setAccessible(true);

// Change the private iText field on that instance only.
System.out.println("field.get:" + field.get(object));
field.set(object, "aaaaaa");
System.out.println("field.get change:" + field.get(object));

// Ask iText itself for the version string.
String string = Version.getInstance().getVersion();
System.out.print(string);

The output shows that the reflective modification itself succeeds. However, Version.getInstance() instantiates its own object rather than using the one modified above, so the modification has no effect on getVersion() and this method fails.
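The failure can be reproduced without iText. The sketch below uses VersionLike, a hypothetical stand-in class written for this illustration (it is not iText's actual Version class), and assumes only that the library obtains its object through getInstance() rather than from the caller:

```java
import java.lang.reflect.Field;

class ReflectionMissDemo {

    // Hypothetical stand-in for a version holder; NOT iText's actual class.
    static final class VersionLike {
        private static VersionLike instance;
        private String product = "SomeTool";

        static VersionLike getInstance() {
            if (instance == null) {
                instance = new VersionLike();
            }
            return instance;
        }

        String getVersion() {
            return product + " 1.0";
        }
    }

    /** Changes the private "product" field on the given object via reflection. */
    static void patchProduct(Object target, String value) {
        try {
            Field field = VersionLike.class.getDeclaredField("product");
            field.setAccessible(true);
            field.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Patch a freshly created object, as in the failed attempt above.
        VersionLike fresh = new VersionLike();
        patchProduct(fresh, "aaaaaa");

        System.out.println(fresh.getVersion());                     // aaaaaa 1.0
        System.out.println(VersionLike.getInstance().getVersion()); // SomeTool 1.0
    }
}
```

The patched object and the object returned by getInstance() are two different instances, so the second line still prints the original value.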

Option Two:

During debugging, I found that the PDF information is kept in a single object.

It is stored in the hashMap field of the PdfDictionary class.

PdfDictionary also has a remove method, so I can call remove to delete the entry containing the sensitive information.

The reflection code is as follows:

// Document keeps its listeners (including the PdfDocument) in the "listeners" field.
Class<?> documentClass = document.getClass();
Field field = documentClass.getDeclaredField("listeners");
field.setAccessible(true);
@SuppressWarnings("unchecked")
ArrayList<DocListener> listeners = (ArrayList<DocListener>) field.get(document);

for (DocListener listener : listeners) {
    // Only the PdfDocument listener holds the info dictionary.
    if (!(listener instanceof PdfDocument)) {
        continue;
    }
    PdfDocument pdfDocument = (PdfDocument) listener;
    Field infoField = PdfDocument.class.getDeclaredField("info");
    infoField.setAccessible(true);
    PdfDictionary info = (PdfDictionary) infoField.get(pdfDocument);

    // Remove the Producer entry from the in-memory dictionary and print the result.
    info.remove(PdfName.PRODUCER);
    System.out.print(info);
}

After execution, the output shows that the Producer entry has been deleted from the dictionary.

However, the producer information was still present in the final generated document. I found the cause after stepping through the source code.
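One quick way to check whether the generated file still carries a producer entry is to search its raw bytes. The helper below is a minimal sketch, not part of the original test code: ProducerScan and hasProducerKey are names made up for this example, and it assumes the info dictionary is written uncompressed and unencrypted. Otherwise a real PDF parser is needed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ProducerScan {

    /** Returns true if the raw PDF bytes contain a "/Producer" key in plain form. */
    static boolean hasProducerKey(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data survives the conversion.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains("/Producer");
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name used by the test code; pass another path as the first argument.
        Path pdf = Paths.get(args.length > 0 ? args[0] : "createSamplePDF2.pdf");
        if (!Files.exists(pdf)) {
            System.out.println("File not found: " + pdf);
            return;
        }
        System.out.println("/Producer present: " + hasProducerKey(Files.readAllBytes(pdf)));
    }
}
```

On Java 11 or later it can be run directly with java ProducerScan.java createSamplePDF2.pdf.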

Principle analysis:

The code that finally generates the PDF is in the close method of the PdfDocument class. Most of that method writes out the various streams. The sensitive information is produced in the writer.close call at the end.

The most important part of the close method of PdfWriter works as follows. It first calls Version.getInstance().getVersion() to generate the version information and stores it, and then calls addToBody to write it.

It then calls the PdfDictionary method toPdf to write to the file.

After following the whole process, I found no point at which reflection can help. There is no condition that a reflectively modified value could steer into different logic, and the call to Version.getInstance().getVersion() instantiates a new object that we cannot modify through reflection. Modifying through reflection therefore does not work here, and the only remaining option is to modify the jar package:
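The effect described above, where the value is set again at close time, can be sketched with a hypothetical stand-in. StampingWriter below is invented for this illustration and only mimics the behaviour described above. It is not iText's PdfWriter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OverwriteOnCloseDemo {

    // Hypothetical stand-in for a writer that stamps its own producer at close time.
    static final class StampingWriter {
        final Map<String, String> info = new LinkedHashMap<>();

        StampingWriter() {
            info.put("Producer", "SomeTool 1.0");
        }

        /** Returns what would be written: the producer is set again just before output. */
        Map<String, String> close() {
            info.put("Producer", "SomeTool 1.0");
            return info;
        }
    }

    public static void main(String[] args) {
        StampingWriter writer = new StampingWriter();
        writer.info.remove("Producer");      // the in-memory removal succeeds
        System.out.println(writer.info);     // {}
        System.out.println(writer.close());  // {Producer=SomeTool 1.0}
    }
}
```

Removing the entry beforehand changes nothing in the output, because the value is regenerated inside close itself.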

Modify jar package:

Source code download address:

https://github.com/itext/itextpdf

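There are two ways to make the modification. The first is to unpack the jar, replace the Version class, and repackage it.

Find Version.java in the source code and modify it. The commands below assume the modified Version.java has been copied into the directory where the jar is unpacked. They compile it over the original class and then repackage:

jar -xf itextpdf-5.5.12.jar
javac -cp . -d . Version.java
jar -cvfm0 itextpdf-5.5.12.jar META-INF/MANIFEST.MF ./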

Compile the source code:

The second way is to build from source. After downloading the code and modifying the source, execute the following command:

mvn clean install -Dmaven.test.skip=true

When the build finishes, it reports three compilation failures. Fixing them requires setting up an additional environment, but that is not necessary because the main module, iText Core, has compiled successfully. The compiled files can be found directly in the folder.

Postscript:

After looking at the latest version of itextpdf, I found that the version has become a static variable. In that case the approach from Option One can be used to modify it through reflection, without the trouble of compiling the source code.
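A static field belongs to the class rather than to an instance, so a reflective change is seen by every caller. The sketch below shows this with StaticVersionLike, a hypothetical stand-in (not iText's actual class), and assumes the field is not declared final. For a static final field, Field.set throws IllegalAccessException even after setAccessible(true).

```java
import java.lang.reflect.Field;

class StaticFieldPatchDemo {

    // Hypothetical stand-in with a static, non-final field; NOT iText's actual class.
    static final class StaticVersionLike {
        private static String product = "SomeTool";

        static String getVersion() {
            return product + " 1.0";
        }
    }

    /** Sets the static "product" field; a static field needs no instance, so null is passed. */
    static void patchStaticProduct(String value) {
        try {
            Field field = StaticVersionLike.class.getDeclaredField("product");
            field.setAccessible(true);
            field.set(null, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        patchStaticProduct("aaaaaa");
        System.out.println(StaticVersionLike.getVersion()); // aaaaaa 1.0
    }
}
```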

Origin blog.csdn.net/GalaxySpaceX/article/details/132696500