Apache POI: opening a Word document throws NullPointerException

Zsolt Bujdosó :

I wrote a program that opens a Microsoft Word document for reading and writing.

This program reads the Word paragraphs and tables and replaces the placeholders. After running, the program saves the document to the same file path it was read from.

If I open the document this way, I get a NullPointerException:

String filePath = "...";
XWPFDocument doc = new XWPFDocument(OPCPackage.open(filePath));
// Replace paragraphs.
doc.write(new FileOutputStream(filePath));
doc.close();

Here is the stack trace:

java.lang.NullPointerException
    at org.apache.poi.POIXMLDocument.getProperties(POIXMLDocument.java:147)
    at org.apache.poi.POIXMLDocument.write(POIXMLDocument.java:225)
Caused by: java.lang.NullPointerException
    at org.apache.poi.openxml4j.util.ZipSecureFile$ThresholdInputStream.read(ZipSecureFile.java:211)
    at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.readAndBuffer(Unknown Source)
    at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
    at org.apache.poi.util.DocumentHelper.readDocument(DocumentHelper.java:140)
    at org.apache.poi.POIXMLTypeLoader.parse(POIXMLTypeLoader.java:163)
    at org.openxmlformats.schemas.officeDocument.x2006.extendedProperties.PropertiesDocument$Factory.parse(Unknown Source)
    at org.apache.poi.POIXMLProperties.<init>(POIXMLProperties.java:78)
    at org.apache.poi.POIXMLDocument.getProperties(POIXMLDocument.java:145)

And if I use this option:

String filePath = "...";
InputStream fis = new FileInputStream(filePath);
XWPFDocument doc = new XWPFDocument(OPCPackage.open(fis));
// Replace paragraphs.
doc.write(new FileOutputStream(filePath));
doc.close();

it works correctly. I also tried saving the document to a different path, and that works correctly as well.

So I don't understand why I get an error when I use the open(String path) method to open the Word document.

What is the difference between the OPCPackage.open(InputStream in) and OPCPackage.open(String path) methods? And why do I get a NullPointerException?

Axel Richter :

The documentation of public static OPCPackage open(java.io.InputStream in) states:

Open a package. Note - uses quite a bit more memory than open(String), which doesn't need to hold the whole zip file in memory, and can take advantage of native methods

So what does that mean? public static OPCPackage open(java.lang.String path) as well as public static OPCPackage open(java.io.File file) open the ZipPackage file system directly from the *.docx file. That uses less memory than public static OPCPackage open(java.io.InputStream in), which first reads the whole ZIP file system into memory through an InputStream. On the other hand, the *.docx file now stays open, and every attempt to write into that opened file must lead to errors, as long as the write does not go only into the opened ZIP file system but into the opened *.docx file itself. The errors differ and are not always an NPE; for me it is java.io.EOFException: Unexpected end of ZLIB input stream using apache poi 4.0.1 [1].

[1]: Just tested, I get exactly your NPE using apache poi 3.17 on Windows 10. Ubuntu Linux simply crashes.

Conclusion:

Opening an OPCPackage (ZipPackage) from a File directly and then writing to another File works. Opening an OPCPackage from a File directly and then writing to the same File does not work.

This is true for all Office Open XML file formats which are handled using ZipPackage in apache poi.
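To illustrate why the in-memory variant tolerates overwriting the source file, here is a minimal sketch using only java.util.zip. It is an analogy and not apache poi's actual code path; the class name InMemoryZipDemo and the helpers writeZip and entryNames are made up for this example. Once the whole archive has been read into a byte array, no handle to the file remains open, so the same path can be rewritten while the in-memory snapshot stays readable.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; plain java.util.zip stands in for the ZIP handling in apache poi.
class InMemoryZipDemo {

    // Lists the entry names of a ZIP archive that is held completely in memory.
    static List<String> entryNames(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = zin.getNextEntry(); e != null; e = zin.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    // Writes a tiny ZIP archive with a single text entry, replacing any existing file content.
    static void writeZip(Path path, String entryName, String text) throws IOException {
        try (OutputStream out = Files.newOutputStream(path);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            zout.putNextEntry(new ZipEntry(entryName));
            zout.write(text.getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("demo", ".zip");
        try {
            writeZip(path, "word/document.xml", "<old/>");
            // The whole archive is now in memory; the file itself is no longer open.
            byte[] snapshot = Files.readAllBytes(path);
            // Overwriting the same path does not disturb the snapshot.
            writeZip(path, "word/document.xml", "<new/>");
            System.out.println(entryNames(snapshot));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

The price is the one the documentation quoted above mentions: the complete file has to fit into memory.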

To get the advantage of using less memory while creating the XWPFDocument from a File instead of an InputStream, and still be able to write to the same file, we could use a temporary copy of the file as follows:

import java.io.FileOutputStream;
import java.io.File;

import java.nio.file.Paths;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.openxml4j.opc.OPCPackage;

public class WordReadAndReWrite {

 public static void main(String[] args) throws Exception {

  String filePath = "WordDocument.docx";
  String tmpFilePath = "~$WordDocument.docx";

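  // Work on a temporary copy, so the original file is not the one held open.
  File file = Files.copy(Paths.get(filePath), Paths.get(tmpFilePath), StandardCopyOption.REPLACE_EXISTING).toFile();

  XWPFDocument doc = new XWPFDocument(OPCPackage.open(file));

  // Replace paragraphs.

  // Write the result to the original path; only the temporary copy is open.
  FileOutputStream out = new FileOutputStream(filePath);
  doc.write(out);
  out.close();
  doc.close();

  // Remove the temporary copy after the document has been closed.
  Files.deleteIfExists(Paths.get(tmpFilePath));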
 }

}

Of course that has the disadvantage of using additional file storage, even if temporary.
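The temporary-copy pattern itself can be sketched without apache poi, using only java.nio.file. This is a hedged illustration, not part of the answer's tested code: the class TempCopyRewrite, the Rewriter callback and rewriteInPlace are hypothetical names. It assumes the callback closes everything it opened on the temporary copy before returning, because on Windows a file that is still open cannot be deleted. The try/finally ensures the copy is removed even if the rewrite throws.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class; all names here are made up for illustration.
class TempCopyRewrite {

    // Callback that reads from the temporary copy and writes the result to the target path.
    // With apache poi, this is where the document would be opened from source and written to target.
    interface Rewriter {
        void rewrite(Path source, Path target) throws IOException;
    }

    static void rewriteInPlace(Path original, Rewriter rewriter) throws IOException {
        // A unique temporary file avoids clashes with other files of the same name.
        Path tmp = Files.createTempFile("rewrite-", ".tmp");
        try {
            Files.copy(original, tmp, StandardCopyOption.REPLACE_EXISTING);
            // Only the copy is read, so the original path can be overwritten safely.
            rewriter.rewrite(tmp, original);
        } finally {
            // Clean up the copy even if the rewrite failed.
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path original = Files.createTempFile("document-", ".txt");
        try {
            Files.write(original, "Hello ${name}".getBytes(StandardCharsets.UTF_8));
            rewriteInPlace(original, (source, target) -> {
                String text = new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
                String replaced = text.replace("${name}", "World");
                Files.write(target, replaced.getBytes(StandardCharsets.UTF_8));
            });
            // Prints the rewritten content of the original file.
            System.out.println(new String(Files.readAllBytes(original), StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(original);
        }
    }
}
```

Using Files.createTempFile instead of a fixed name such as ~$WordDocument.docx is a design choice of this sketch; the fixed name from the answer works as well as long as nothing else uses it.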
