Apache Tika 1.24 has been released. Tika is a content extraction toolkit: it integrates libraries such as Apache POI and PDFBox and exposes a unified interface for text extraction. Tika also provides a convenient extension API for adding support for third-party file formats.
The main updates are as follows:
- Updated Drew Noakes' metadata-extractor
- Enabled optional extraction of structure tags in PDFs (alpha level)
- The --extract mode of the Tika application now writes to STDOUT
- Added an optional PDF Preflight parser
- Improved detection of zip-based formats
- Upgraded metadata-extractor to 2.13.0
- Upgraded to POI 4.1.2
- XMP is now extracted from PSD files
- Added XMLProfiler as an optional parser to profile XFA and XMP in PDFs
- Inline images that depend on the DCT filter are now extracted from PDFs
- Upgraded to PDFBox 2.0.19
- Fixed a configuration error in the ASM parser
- Upgraded to java-libpst 0.9.3
- Fixed an XLIFF12Parser failure with ToXMLHandler
Full change log: https://downloads.apache.org/tika/CHANGES-1.24.txt
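
For readers unfamiliar with what detection of zip-based formats involves, the general idea can be sketched as follows. This is not Tika's implementation, only a minimal illustration using the JDK's `java.util.zip` classes: many document formats (DOCX, XLSX, PPTX, JAR) are ordinary zip archives, so a detector has to look at the entry names inside the archive to tell them apart. The class name `ZipContainerSniffer`, its methods, and the small set of marker entries are assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Apache Tika.
public class ZipContainerSniffer {

    /** Guesses a media type from the entry names found in a zip stream. */
    public static String detect(InputStream in) throws IOException {
        Set<String> names = new HashSet<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            // getNextEntry() returns null when no (further) zip entry is found.
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        if (names.isEmpty()) {
            // No readable entries: treat the input as unknown binary data.
            return "application/octet-stream";
        }
        // Well-known marker entries of common zip-based containers.
        if (names.contains("word/document.xml")) {
            return "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
        }
        if (names.contains("xl/workbook.xml")) {
            return "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
        }
        if (names.contains("ppt/presentation.xml")) {
            return "application/vnd.openxmlformats-officedocument.presentationml.presentation";
        }
        if (names.contains("META-INF/MANIFEST.MF")) {
            return "application/java-archive";
        }
        return "application/zip";
    }

    /** Builds a small in-memory zip with the given entry names (for demos and tests). */
    public static byte[] zipOf(String... entryNames) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(new byte[] {0}); // one placeholder byte of content
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }
}
```

Calling `ZipContainerSniffer.detect(new ByteArrayInputStream(ZipContainerSniffer.zipOf("word/document.xml")))` would classify the archive as a DOCX document, while an archive with unrecognized entries falls back to `application/zip`. A production detector also has to cope with large archives, encrypted entries, and many more formats, which is where a toolkit such as Tika comes in.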