PDF text recognition: reading by line

Requirement

pdfbox is used to recognize (extract) the text of a PDF. Because a PDF is unstructured, the recognized content can come out in a disordered sequence. If you need the text, recognize it line by line, which makes the content easy to compare.
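
To illustrate why line-oriented text is convenient to compare, here is a minimal sketch in plain Java. It assumes the two texts have already been extracted from the PDFs; the class name `LineCompare` and the method `diffByLine` are hypothetical names for this example and are not part of pdfbox.

```java
import java.util.ArrayList;
import java.util.List;

public class LineCompare {

    /** Returns one message for every line position at which the two texts differ. */
    public static List<String> diffByLine(String expected, String actual) {
        // \R matches any line break (\n, \r\n, \r)
        String[] left = expected.split("\\R");
        String[] right = actual.split("\\R");

        List<String> differences = new ArrayList<>();
        int max = Math.max(left.length, right.length);
        for (int i = 0; i < max; i++) {
            // A text that runs out of lines is reported as "<missing>"
            String l = i < left.length ? left[i].trim() : "<missing>";
            String r = i < right.length ? right[i].trim() : "<missing>";
            if (!l.equals(r)) {
                differences.add("line " + (i + 1) + ": \"" + l + "\" vs \"" + r + "\"");
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        // Made-up sample texts standing in for two extracted PDFs
        String first = "Invoice 1001\nTotal: 20.00\nPaid";
        String second = "Invoice 1001\nTotal: 25.00";
        diffByLine(first, second).forEach(System.out::println);
    }
}
```

Comparing by position like this only works when both texts break into lines the same way, which is why a stable line order during recognition matters.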

Add the Maven dependency (the latest version as of 2023):

        <dependency>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox</artifactId>
            <!-- version is cut off in the source; add a <version> element here -->
        </dependency>
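
With the pdfbox dependency on the classpath, reading a PDF by line might look like the sketch below. It is not necessarily the method the original post used. It assumes the PDFBox 2.0.x API (`PDDocument.load`); PDFBox 3.x loads documents with `Loader.loadPDF` instead. The class name `PdfLineReader`, its methods, and the path `sample.pdf` are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfLineReader {

    /** Extracts the text of a PDF and returns its non-blank lines in order. */
    public static List<String> readLines(File pdfFile) throws IOException {
        // Assumption: PDFBox 2.0.x. On 3.x, replace with Loader.loadPDF(pdfFile).
        try (PDDocument document = PDDocument.load(pdfFile)) {
            PDFTextStripper stripper = new PDFTextStripper();
            // Order text by its position on the page instead of its
            // order in the content stream, to reduce disordered output
            stripper.setSortByPosition(true);
            return splitLines(stripper.getText(document));
        }
    }

    /** Splits extracted text on any line break and drops blank lines. */
    public static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "sample.pdf" is a placeholder; pass a real path as the first argument
        String path = args.length > 0 ? args[0] : "sample.pdf";
        List<String> lines = readLines(new File(path));
        for (int i = 0; i < lines.size(); i++) {
            System.out.println((i + 1) + ": " + lines.get(i));
        }
    }
}
```

Sorting by position helps with simple single-column pages; multi-column layouts or tables may still need region-based extraction.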

Origin blog.csdn.net/zhijiesmile/article/details/130815178