java.lang.OutOfMemoryError when validating PDF with PDFBox Preflight 2.0.13

Dana Shaw :

Issue details: PDFBOX-4450

I'm not sure if anyone has encountered this issue, but I am getting an OutOfMemoryError when validating PDFs. I'm posting here for visibility; if anyone could help, that would be awesome.

If anyone has any ideas, please share. At this point I can't really move forward.

Stuff I've tried

  • Followed the suggestions in the PDFBox FAQ (wiki), without success

  • Increased max heap size from 2GB to 4GB

  • Removed the JVM arg -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider

  • Tried using JDK 1.7

  • Used a scratch file (from wiki)
  • Disabled the cache for PDImageXObject (from wiki)

My Environment

  • Linux 64-bit (Arch Linux)
  • Java 8
  • PDFBox/Preflight ver. 2.0.13
  • jbig imageio ver. 3.0.2

Java info

java -version

java version "1.8.0_131"

Java(TM) SE Runtime Environment (build 1.8.0_131-b11)

Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

JVM Args used

java -Xmx2048m -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
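To confirm which heap limit the JVM actually picked up (for example after raising -Xmx from 2 GB to 4 GB), a small check like the following can be run with the same JVM arguments. HeapCheck is just an illustrative class name; it uses only Runtime.maxMemory() from the JDK.

```java
public class HeapCheck {
    // Returns the maximum heap the running JVM will try to use, in MiB.
    // Runtime.maxMemory() returns Long.MAX_VALUE if there is no inherent limit.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // With -Xmx2048m this is typically close to, and often somewhat below, 2048,
        // because HotSpot may exclude one survivor space from maxMemory().
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```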

Example PDF

The PDF attached to PDFBOX-4450

Console Output

Jan 30, 2019 10:25:58 AM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNING: Using fallback font ArialMT for base font Symbol
Jan 30, 2019 10:25:58 AM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNING: Using fallback font ArialMT for base font ZapfDingbats
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1531)
at org.apache.pdfbox.preflight.xobject.XObjFormValidator.checkGroup(XObjFormValidator.java:138)
at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:73)
at org.apache.pdfbox.preflight.process.reflect.GraphicObjectPageValidationProcess.validate(GraphicObjectPageValidationProcess.java:74)
at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:57)
at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateXObjects(ResourcesValidationProcess.java:224)
at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:81)
at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
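The trace shows the memory being exhausted inside COSDictionary.toString(), which recursively calls getDictionaryString() while rendering the group dictionary of a form XObject (called from XObjFormValidator.checkGroup). The plain-Java sketch below does not reproduce PDFBox's code; it only illustrates, with a hypothetical Node class, how a recursive dump that expands every reference it meets can produce output far larger than the object graph itself when the same sub-object is reachable through several paths.

```java
import java.util.ArrayList;
import java.util.List;

public class RecursiveDumpDemo {
    // Hypothetical node standing in for a nested dictionary; NOT a PDFBox class.
    static final class Node {
        final List<Node> children = new ArrayList<Node>();
    }

    // Builds a chain of depth + 1 distinct nodes in which every node
    // references the next one twice.
    static Node buildSharedChain(int depth) {
        Node current = new Node(); // leaf
        for (int i = 0; i < depth; i++) {
            Node parent = new Node();
            parent.children.add(current);
            parent.children.add(current); // same child reached via a second path
            current = parent;
        }
        return current;
    }

    // Naive recursive dump that expands every reference, without remembering
    // which nodes it has already printed.
    static void dump(Node node, StringBuilder out) {
        out.append('{');
        for (Node child : node.children) {
            dump(child, out);
        }
        out.append('}');
    }

    public static void main(String[] args) {
        for (int depth = 5; depth <= 20; depth += 5) {
            StringBuilder sb = new StringBuilder();
            dump(buildSharedChain(depth), sb);
            System.out.println("depth " + depth + ": " + (depth + 1)
                + " nodes, dump length " + sb.length());
        }
    }
}
```

The dump length is 2^(depth+2) - 2, so at depth 20 the graph has only 21 nodes but the dump is 4,194,302 characters long.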

Sample code

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.preflight.PreflightDocument;
import org.apache.pdfbox.preflight.ValidationResult;
import org.apache.pdfbox.preflight.ValidationResult.ValidationError;
import org.apache.pdfbox.preflight.parser.PreflightParser;

public class Validator {
  private final File file;
  private List<ValidationError> errorList = new ArrayList<ValidationError>();

  public Validator(File file) {
    this.file = file;
  }

  public List<ValidationError> getErrors() {
    return errorList;
  }

  // Parses and validates the file; returns true if Preflight reported
  // at least one validation error.
  public boolean validate() throws Exception {
    PreflightDocument document = null;
    try {
      PreflightParser parser = new PreflightParser(file);
      parser.parse();
      document = parser.getPreflightDocument();
      document.validate();
      ValidationResult result = document.getResult();
      errorList = result.getErrorsList();
    } finally {
      // Always release the document, even if parsing or validation failed.
      if (document != null) {
        try {
          document.close();
        } catch (Exception ignored) {
          // nothing useful to do if closing fails
        }
      }
    }
    return !errorList.isEmpty();
  }
}

flycash :

I ran it with these options:

-XX:+HeapDumpOnOutOfMemoryError -Xmx3550m -Xms3550m -Xmn2g 

It still failed, so I used VisualVM to analyze the heap dump file, and I found something interesting.

(screenshot: heap dump file) Most of the char[] instances have this content:

(screenshot: char[] content) I then found this code:

//org.apache.pdfbox.preflight.process.reflect.SinglePageValidationProcess#validateGroupTransparency
    protected void validateGroupTransparency(PreflightContext context, PDPage page) throws ValidationException
    {
        COSBase baseGroup = page.getCOSObject().getItem(XOBJECT_DICTIONARY_KEY_GROUP);
        COSDictionary groupDictionary = COSUtils.getAsDictionary(baseGroup, context.getDocument().getDocument());
        if (groupDictionary != null)
        {
            String sVal = groupDictionary.getNameAsString(COSName.S);
            if (XOBJECT_DICTIONARY_VALUE_S_TRANSPARENCY.equals(sVal))
            {
                context.addValidationError(new ValidationError(ERROR_GRAPHIC_TRANSPARENCY_GROUP,
                        "Group has a transparency S entry or the S entry is null"));
            }
        }
    }

It creates a ValidationError object, whose constructor is:

public ValidationError(String errorCode, String details, Throwable cause)
        {
            this(errorCode);
            if (details != null)
            {
                StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
                sb.append(this.details).append(", ").append(details);
                this.details = sb.toString();
            }
            this.cause = cause;
            t = new Exception();
        }

As you can see, every time an error is found, a new ValidationError is created, and with it a new StringBuilder and a new details String.
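A stand-alone illustration of that point, mirroring the "<message>, <details>" concatenation in the constructor (DetailAllocationDemo, buildDetails and the message "Transparency error" are placeholders, not PDFBox names): building the same text twice yields two equal but distinct String objects, each with its own character buffer.

```java
public class DetailAllocationDemo {
    // Mirrors the constructor's concatenation: "<code message>, <details>".
    static String buildDetails(String codeMessage, String details) {
        StringBuilder sb = new StringBuilder(codeMessage.length() + details.length() + 2);
        sb.append(codeMessage).append(", ").append(details);
        return sb.toString(); // always allocates a new String
    }

    public static void main(String[] args) {
        String detail = "Group has a transparency S entry or the S entry is null";
        String a = buildDetails("Transparency error", detail);
        String b = buildDetails("Transparency error", detail);
        System.out.println(a.equals(b)); // true: same text
        System.out.println(a == b);      // false: two separate objects
    }
}
```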

So you have three ways to solve the problem:

  1. Increase your heap size. 4 GB is not enough; try 16 GB or more.
  2. Don't use the PDFBox library.
  3. Change the PDFBox source code, for example by caching the detail strings:
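    // requires: import java.util.Map; import java.util.concurrent.ConcurrentHashMap;
    // shared cache of already-built detail strings (thread-safe, but unbounded)
    private static final Map<String, String> commonDetailMap = new ConcurrentHashMap<String, String>();

    public ValidationError(String errorCode, String details, Throwable cause)
    {
        this(errorCode);
        if (details != null)
        {
            // '\0' separator keeps e.g. ("1.1", "0x") and ("1.10", "x") from sharing a key
            String key = errorCode + '\0' + details;
            String cached = commonDetailMap.get(key);
            if (cached == null)
            {
                StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
                sb.append(this.details).append(", ").append(details);
                cached = sb.toString();
                commonDetailMap.put(key, cached);
            }
            this.details = cached;
        }
        this.cause = cause;
        t = new Exception();
    }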

I think using a Map to avoid creating so many StringBuilders would work, but the Map could grow too large if there are many distinct combinations of error code and details.
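If unbounded growth is the worry, one option (not part of the original answer) is to cap the cache. The sketch below is a hypothetical stand-alone helper; BoundedDetailCache and MAX_ENTRIES are illustrative names, not PDFBox API. It uses a LinkedHashMap in access order and overrides removeEldestEntry so that at most 1,000 entries are kept, evicting the least recently used one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedDetailCache {
    private static final int MAX_ENTRIES = 1000; // illustrative cap

    // accessOrder = true makes iteration order least-recently-used first
    private final Map<String, String> cache =
        new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > MAX_ENTRIES; // evict the LRU entry once over the cap
            }
        };

    // Returns a shared "<codeMessage>, <details>" instance, building it on a miss.
    public synchronized String get(String codeMessage, String details) {
        String key = codeMessage + '\0' + details;
        String value = cache.get(key);
        if (value == null) {
            value = codeMessage + ", " + details;
            cache.put(key, value);
        }
        return value;
    }

    public synchronized int size() {
        return cache.size();
    }
}
```

The trade-off is that once the cap is reached, rarely seen details are rebuilt on each miss, but memory use stays bounded.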

So another way to change the source code is to intern the resulting string:

    public ValidationError(String errorCode, String details, Throwable cause)
    {
        this(errorCode);
        if (details != null)
        {
            StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
            sb.append(this.details).append(", ").append(details);
            // intern() returns a canonical instance, so equal detail strings share one copy
            this.details = sb.toString().intern();
        }
        this.cause = cause;
        t = new Exception();
    }

The Javadoc of String.intern() says:

Returns a canonical representation for the string object.

I think that using intern() is better.
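A small demonstration of what intern() buys here, assuming the same "<message>, <details>" concatenation (InternDemo and buildDetails are illustrative names). Note that intern() does not avoid the StringBuilder or the temporary String: each call still allocates them. What changes is that the temporary copy becomes garbage right away and only the canonical instance stays referenced.

```java
public class InternDemo {
    // Same concatenation as in the constructor; returns a fresh String each call.
    static String buildDetails(String codeMessage, String details) {
        return new StringBuilder(codeMessage.length() + details.length() + 2)
            .append(codeMessage).append(", ").append(details).toString();
    }

    public static void main(String[] args) {
        String a = buildDetails("code message", "details").intern();
        String b = buildDetails("code message", "details").intern();
        // Both calls built a new String, but intern() returned the same canonical object.
        System.out.println(a == b); // true
    }
}
```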
