Summary of problems when parsing electronic invoices in Java

The Java electronic-invoice parser I wrote earlier went into production. Once it was live, a problem showed up: some invoices could not be parsed. I have summarized the problems here, for reference only!

1. PDFBox Introduction

PDFBox is an open-source Apache tool for working with PDF files. It can read metadata such as the title, convert pages to images, add, delete, and edit pages, and extract text. For basic usage, please refer to the official website. This post does not explain the basics; it only lists some problems I ran into.

Official website: https://pdfbox.apache.org/

2. Dependencies

During conversion you may hit errors caused by missing image decoders. If so, add the matching dependency below:

(1) ERROR: Cannot read JBIG2 image: jbig2-imageio is not installed

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>jbig2-imageio</artifactId>
    <version>3.0.2</version>
</dependency>

(2) JPEG errors (exceptions when reading JPEG images)

<dependency>
    <groupId>com.twelvemonkeys.imageio</groupId>
    <artifactId>imageio-jpeg</artifactId>
    <version>3.4.2</version>
</dependency>

(3) Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed

<dependency>
    <groupId>com.github.jai-imageio</groupId>
    <artifactId>jai-imageio-core</artifactId>
    <version>1.4.0</version>
</dependency>

<dependency>
    <groupId>com.github.jai-imageio</groupId>
    <artifactId>jai-imageio-jpeg2000</artifactId>
    <version>1.3.0</version>
</dependency>
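
To confirm that the dependencies above were actually picked up at runtime, one option is to ask ImageIO which readers are registered. The following is a minimal sketch using only the JDK; the class name ImageReaderCheck is hypothetical, and the format names "JBIG2" and "JPEG2000" are my assumption about what the plugins register, so check them against your own classpath.

```java
import javax.imageio.ImageIO;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports whether an ImageIO reader is registered for
// each image format that the dependencies above are meant to provide.
final class ImageReaderCheck {

    // Assumed format names; adjust if your plugins register different ones.
    static final String[] FORMATS = {"JBIG2", "JPEG", "JPEG2000"};

    // True when at least one registered reader claims the given format name.
    static boolean hasReader(String formatName) {
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    // Builds a format -> "reader present" map in the order of FORMATS.
    static Map<String, Boolean> report() {
        ImageIO.scanForPlugins(); // pick up plugins found on the classpath
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String format : FORMATS) {
            result.put(format, hasReader(format));
        }
        return result;
    }

    public static void main(String[] args) {
        report().forEach((format, present) ->
                System.out.println(format + " reader registered: " + present));
    }
}
```

Running this once at application startup and logging the result makes a missing decoder visible before the first invoice fails to parse.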

3. Frequently Asked Questions

  • (1) Why is my image conversion so slow?
    If the target format is PNG, the image is stored losslessly, so it is not distorted, but encoding it is expensive. If the PDF itself is large, or the image is rich in color, ImageIO.write may take 1~2 seconds (depending on the file), and CPU usage will be very high.

  • (2) How to solve the above problem?
    PDFBox provides two methods, renderImage and renderImageWithDPI. The second parameter of the former is a floating-point scale factor used to enlarge the page. The latter takes a DPI value instead. Our company's quality requirements are relatively high, so we compared the two: enlarging by 5 to 7 times with renderImage gave a worse result than setting the DPI to 350 to 400 with renderImageWithDPI, and the latter was also faster.

  • (3) Why do images converted in production look different from the ones I convert locally?
    The production environment generally runs on a Linux host, which has its own set of fonts, while our development environment generally uses the font files that come with Windows. When a font used by the PDF is missing on the Linux host, a different font is substituted and the rendered image changes.

    Our approach is to copy the font files from Windows (C:\Windows\Fonts) directly to the Linux host. This worked for us.

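#cd /usr/share/fonts/   // enter the system font directory
#mkdir myfonts  // myfonts is a folder name of your own choosing
#Copy the font files into this folder, then run the following commands under /usr/share/fonts/
#mkfontscale   
#mkfontdir
#fc-cache -fv           // refresh the font cache
#source /etc/profile    // run this so the fonts take effect
#fc-list    // list all fonts on the system; useful for checking that the fonts were installed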
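
Besides fc-list, it can help to check from Java that the copied font files are readable by the account the application runs under. Below is a minimal sketch using only the JDK; the class name FontDirCheck is hypothetical, and /usr/share/fonts/myfonts is simply the folder created by the commands above. It only lists files by extension and does not prove that any particular library will select those fonts.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists font files found under a directory tree.
final class FontDirCheck {

    // Recursively collects the names of .ttf, .ttc and .otf files, sorted.
    static List<String> listFontFiles(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(FontDirCheck::isFontFile)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Case-insensitive check on the file extension only.
    static boolean isFontFile(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return lower.endsWith(".ttf") || lower.endsWith(".ttc") || lower.endsWith(".otf");
    }

    public static void main(String[] args) throws IOException {
        // Default path assumes the myfonts folder from the commands above.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/usr/share/fonts/myfonts");
        listFontFiles(dir).forEach(System.out::println);
    }
}
```

If the list comes back empty in production but not on a test host, the usual suspects are a wrong path or file permissions on the copied fonts.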

https://www.jianshu.com/p/c85017f8577a
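
For questions (1) and (2) above, the cost of ImageIO.write and the relation between a scale factor and a DPI value can be explored without any PDF at all. The sketch below uses only the JDK, and the class name RenderCostSketch is hypothetical. It assumes the usual PDF convention of 72 units per inch, which as far as I know is also how PDFBox relates the two parameters (scale = dpi / 72). Under that assumption a scale of 5 corresponds to 360 DPI and a scale of 7 to 504 DPI, so the two ranges we compared do not cover exactly the same resolutions.

```java
import javax.imageio.ImageIO;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical sketch: relates DPI to a scale factor and times ImageIO.write.
final class RenderCostSketch {

    // Assumes 72 units per inch, so 72 DPI corresponds to a scale of 1.0.
    static float scaleForDpi(float dpi) {
        return dpi / 72f;
    }

    // Pixel length of a side given in points when rendered at the given DPI.
    static int pixels(float points, float dpi) {
        return Math.round(points * scaleForDpi(dpi));
    }

    // Encodes the image in memory and returns the number of bytes produced.
    static int encodedSize(BufferedImage image, String format) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, format, out)) {
            throw new IOException("No ImageIO writer for format: " + format);
        }
        return out.size();
    }

    public static void main(String[] args) throws IOException {
        // An A4 page is 595 x 842 points; 400 DPI is the upper end of our range.
        int width = pixels(595f, 400f);
        int height = pixels(842f, 400f);
        // A blank white page stands in for a rendered invoice here;
        // substitute a real rendered page to get meaningful numbers.
        BufferedImage page = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = page.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);
        g.dispose();

        for (String format : new String[] {"png", "jpg"}) {
            long start = System.nanoTime();
            int bytes = encodedSize(page, format);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(format + ": " + bytes + " bytes in " + millis + " ms");
        }
    }
}
```

Timing both formats on one of your own invoices is a quick way to decide whether lossless PNG output is worth the extra CPU in your case.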

Origin blog.csdn.net/Alex_81D/article/details/130090077