Several pitfalls when generating PDFs in Java

This article was first published on my personal WeChat official account, "andyqian". You are welcome to follow it.

Foreword

  At work, we use Java to generate a large number of electronic documents, namely PDFs. A previous article covered how to generate a PDF with Java, so I won't repeat that here. This article records the mistakes I made and the pitfalls I ran into along the way.

Memory overflow

  Our first solution used the "PD4ML" framework, which is relatively lightweight. Its downside is that it is closed source. When generating PDF files in batches we ran into a memory leak, and every time we analyzed the memory usage, the trail stopped at the following code:

    private void createPdfByHtml(String html, File file) throws Exception{
        FileOutputStream fos = new FileOutputStream(file);  
        ...
        // The memory analysis always stopped at this line
        pd4ml.render(sr, fos, new URL("http://"), "UTF-8");  
    }

Therefore, we had to consider switching to another third-party library. We now use the open source library "iText", which has been running stably so far.

(When choosing a third-party framework, we try to pick an open source one: when a problem occurs, we can analyze the source code and even modify it ourselves.)

Font path

  To support Chinese text in the generated PDFs, we need to load a specific font file by path. The development environment is Windows, while the test and production environments are Linux. The font files are placed under

/resources/fonts/

The problem was that each of the two obvious ways of resolving this path failed on one platform:

  1. On Windows, resolving the path through Thread.currentThread() returned null.

  2. On Linux, resolving the path through the PdfUtils class, with getResource("/").getPath(), returned null.

So we made the path lookup work on both platforms with the following method.

    /**
     * Returns the font path.
     * Background:
     * 1. On Windows, resolving the path via Thread.currentThread() yields a null object, so it cannot be used.
     * 2. On Linux, resolving the path via PdfUtils.class yields null.
     * @return the path of the font file
     */
    private static String getFontPath(){
        // 1. Production environment path, resolved through the context class loader
        ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
        URL url = (classLoader == null) ? null : classLoader.getResource("/");
        String threadCurrentPath = (url == null) ? "" : url.getPath();
        // Prefer the thread-based path whenever it is available
        String path = threadCurrentPath;
        // 2. If the thread-based lookup came back empty, fall back to the location PdfUtils.class was loaded from
        if(StringUtil.isBlank(threadCurrentPath)){
            path = PdfUtils.class.getResource("/").getPath();
        }
        // 3. Append the relative path of the font file
        path = path + "/fonts/SIMKAI.TTF";
        logger.info("getFontPath threadCurrentPath: {}  path: {}", threadCurrentPath, path);
        return path;
    }
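
A different approach, which is not what the method above does, is to avoid file system paths altogether: read the font as a classpath resource stream and copy it to a temporary file whose path can then be handed to the PDF library. The following is only a minimal sketch under that assumption. The class name FontResourceSketch and the method copyToTempFile are hypothetical, and the resource name /fonts/SIMKAI.TTF simply mirrors the layout described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

// Hypothetical helper, not part of the original project.
class FontResourceSketch {

    /**
     * Copies a classpath resource to a temporary file.
     * Returns an empty Optional if the resource cannot be found.
     */
    public static Optional<Path> copyToTempFile(Class<?> anchor, String resourceName) {
        // getResourceAsStream returns null when the resource is missing;
        // try-with-resources tolerates a null resource.
        try (InputStream in = anchor.getResourceAsStream(resourceName)) {
            if (in == null) {
                return Optional.empty();
            }
            Path tmp = Files.createTempFile("font-", ".ttf");
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return Optional.of(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumes the font sits at fonts/SIMKAI.TTF on the classpath.
        Optional<Path> font = copyToTempFile(FontResourceSketch.class, "/fonts/SIMKAI.TTF");
        System.out.println(font.map(Path::toString).orElse("font resource not found"));
    }
}
```

Because the stream lookup goes through the class loader on every platform, the same code path runs on Windows and Linux, and it also keeps working when the application is packaged as a jar.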

This also illustrates a familiar exchange between programmers:

  1. "Why does it work on my computer but not on yours? It must be a problem with your computer."

  2. "No, it works fine on my computer. There is absolutely no problem."

Sound familiar? This is why many programmers develop on a system that matches the production environment.

Character encoding

  This problem was caused by not specifying the encoding when converting the HTML string into a byte stream during PDF generation, as follows:

    public static void createdPdfByItextHtml(String htmlContent,File file){
        ...
        try {
            inputStream= new ByteArrayInputStream(htmlContent.getBytes());
            outputStream = new FileOutputStream(file);
        ...

Part of the code is omitted here. The call to htmlContent.getBytes() does not specify an encoding, so the platform's default character set is used, and that can differ between environments. The details are covered in the article "A Preliminary Study of the Default Character Set of JDK Source Code".
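
A minimal sketch of the fix is to pass the charset explicitly. It assumes the HTML is meant to be handled as UTF-8, which the article does not state, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original project.
class ExplicitCharsetSketch {

    /** Encodes the HTML as UTF-8 regardless of the platform default charset. */
    public static byte[] utf8Bytes(String htmlContent) {
        return htmlContent.getBytes(StandardCharsets.UTF_8);
    }

    /** Wraps the UTF-8 bytes in a stream, as the PDF generation code expects. */
    public static ByteArrayInputStream toUtf8Stream(String htmlContent) {
        return new ByteArrayInputStream(utf8Bytes(htmlContent));
    }

    public static void main(String[] args) {
        String html = "<html><body><span>\u9648</span></body></html>";
        // The byte count no longer depends on the machine the code runs on.
        System.out.println(utf8Bytes(html).length);
    }
}
```

Whatever charset is chosen here must match the one the HTML parser is told to use when it reads the stream back.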

Rare characters

  The official document template specifies the Kaiti_GB2312 font. After using it for a while, we found that Kaiti_GB2312 has poor support for rare characters.

For example, the name "Chen Yao", when rendered with Kaiti_GB2312, is displayed only as "Chen".

The rare character is simply not displayed. This is a very serious problem: if the name on an electronic certificate is wrong, the certificate has no legal effect. After testing, we ended up using the Kaiti font instead of Kaiti_GB2312. (This pitfall is really not easy to spot.)
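
One way to catch such names before a certificate is generated is to scan the text for characters outside the GB2312 character set. This is only a rough sketch: it assumes that Kaiti_GB2312 covers roughly the GB2312 set, which the article does not confirm, and the class and method names are hypothetical. A check against the actual font file, for instance with java.awt.Font.canDisplayUpTo, would be more reliable.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of the original project.
class RareCharacterCheck {

    /**
     * Returns the index of the first char that GB2312 cannot encode,
     * or -1 if every char can be encoded.
     */
    public static int firstCharOutsideGb2312(String text) {
        // Assumption: the font's coverage roughly matches the GB2312 charset.
        CharsetEncoder encoder = Charset.forName("GB2312").newEncoder();
        for (int i = 0; i < text.length(); i++) {
            if (!encoder.canEncode(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // U+9648 is a common surname character; U+9555 is a well-known
        // character that is missing from GB2312.
        System.out.println(firstCharOutsideGb2312("\u9648\u9555"));
    }
}
```

A non-negative result can be used to reject the document or to switch to a font with wider coverage before rendering.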

Enum types as transfer parameters

  If a Dubbo interface takes an enum as a parameter and only one side updates the enum's values, serialization errors occur. That is to say:

  1. After the caller updates the enum, if the receiver does not update its copy of the enum at the same time, the parameter arrives at the receiver as a null object.

Key takeaway:

  1. Between Dubbo services, avoid enum types in call parameters where possible.
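
One common way to follow this advice, which the article itself does not describe, is to send the value as a plain String code and convert it to an enum on the receiving side, treating unknown codes explicitly. The sketch below uses plain Java with no Dubbo dependency, and every name in it (EnumTransferSketch, CertificateType, parse) is hypothetical.

```java
import java.util.Optional;

// Hypothetical example, not part of the original project.
class EnumTransferSketch {

    /** The receiver's own copy of the enum; it may lag behind the caller's. */
    enum CertificateType { CONTRACT, RECEIPT }

    /**
     * Converts a transferred String code into the local enum.
     * Unknown or missing codes yield an empty Optional instead of a failure.
     */
    public static Optional<CertificateType> parse(String code) {
        if (code == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(CertificateType.valueOf(code));
        } catch (IllegalArgumentException e) {
            // The caller sent a value this side does not know yet.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("CONTRACT").isPresent()); // prints "true"
        System.out.println(parse("INVOICE").isPresent());  // prints "false"
    }
}
```

With this shape, a caller that ships a new value first produces a visible "unknown code" case on the receiver rather than a silent null parameter.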

Strict HTML semantic tags

  Anyone who has used iText will know that it requires strictly well-formed HTML: every start tag must have a matching end tag, and if the tags don't match, PDF generation fails. Self-closing tags such as <img src="www.baidu.com"/> are the exception.

Invalid HTML:

<html>
  <head>
     <title></title>
  </head>
  <body>
    <span>Hello World!</span></span>
  </body>
</html>

The HTML above has an extra </span> end tag, so it is not strictly well-formed.

Running it produces the following error:

com.itextpdf.tool.xml.exceptions.RuntimeWorkerException: Invalid nested tag span found, expected closing tag body.
    at com.itextpdf.tool.xml.XMLWorker.endElement(XMLWorker.java:134)
    at com.itextpdf.tool.xml.parser.XMLParser.endElement(XMLParser.java:396)
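
To catch this kind of problem before the HTML reaches iText, one option is a quick well-formedness check with the XML parser that ships with the JDK. This is a rough sketch rather than the article's method: the class and method names are hypothetical, and the check is stricter than HTML in some respects (for instance, it rejects HTML-only entities such as &nbsp;), so a failure means "not well-formed XML", not necessarily "rejected by iText".

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original project.
class HtmlWellFormedCheck {

    /** Returns true if the markup parses as well-formed XML. */
    public static boolean isWellFormed(String html) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(html)), new DefaultHandler());
            return true;
        } catch (Exception e) {
            // SAXException for malformed markup; also covers parser setup and I/O errors.
            return false;
        }
    }

    public static void main(String[] args) {
        String broken = "<html><head><title></title></head><body>"
                + "<span>Hello World!</span></span>"
                + "</body></html>";
        System.out.println(isWellFormed(broken)); // prints "false"
    }
}
```

Running such a check at the point where the HTML is assembled gives an earlier and clearer failure than the RuntimeWorkerException raised during PDF generation.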

Finally

  Some systems look easy but are not easy to build, and all kinds of strange problems can come up during implementation. Yet it is precisely these strange problems that give us experience. I have always felt that I am an "unlucky" person when it comes to writing code, because I keep running into all kinds of strange problems.

Reply [PDF] in the official account to get the project. Remember to change the font!

Personal blog:  http://www.andyqian.com
