I wrote a program that opens a Microsoft Word document for reading and writing.
The program reads the paragraphs and tables of the document and replaces the placeholders in them. When it has finished, it saves the document to the same file path it was read from.
If I open the document this way, I get a NullPointerException:
String filePath = "...";
XWPFDocument doc = new XWPFDocument(OPCPackage.open(filePath));
// Replace paragraphs.
doc.write(new FileOutputStream(filePath));
doc.close();
Here is the stacktrace:
java.lang.NullPointerException
at org.apache.poi.POIXMLDocument.getProperties(POIXMLDocument.java:147)
at org.apache.poi.POIXMLDocument.write(POIXMLDocument.java:225)
Caused by: java.lang.NullPointerException
at org.apache.poi.openxml4j.util.ZipSecureFile$ThresholdInputStream.read(ZipSecureFile.java:211)
at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.readAndBuffer(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at org.apache.poi.util.DocumentHelper.readDocument(DocumentHelper.java:140)
at org.apache.poi.POIXMLTypeLoader.parse(POIXMLTypeLoader.java:163)
at org.openxmlformats.schemas.officeDocument.x2006.extendedProperties.PropertiesDocument$Factory.parse(Unknown Source)
at org.apache.poi.POIXMLProperties.<init>(POIXMLProperties.java:78)
at org.apache.poi.POIXMLDocument.getProperties(POIXMLDocument.java:145)
And if I use this option:
String filePath = "...";
InputStream fis = new FileInputStream(filePath);
XWPFDocument doc = new XWPFDocument(OPCPackage.open(fis));
// Replace paragraphs.
doc.write(new FileOutputStream(filePath));
doc.close();
it works correctly. I also tried saving the document to a different path, and that works correctly too.
So I don't understand why I get an error when I use the open(String path) method to open the Word document.
What is the difference between the OPCPackage.open(InputStream in) and OPCPackage.open(String path) methods? And why do I get a NullPointerException?
The Javadoc for public static OPCPackage open(java.io.InputStream in) states:
Open a package. Note - uses quite a bit more memory than open(String), which doesn't need to hold the whole zip file in memory, and can take advantage of native methods
So what does that mean? Both public static OPCPackage open(java.lang.String path) and public static OPCPackage open(java.io.File file) open the ZipPackage file system directly from the *.docx file. That uses less memory than public static OPCPackage open(java.io.InputStream in), which first reads the whole ZIP file system into memory through an InputStream. On the other hand, the *.docx file itself now stays open, and any attempt to write into that opened file must lead to errors whenever the write goes to the *.docx file itself rather than only into the opened ZIP file system. The errors vary and are not always an NPE; for me it is java.io.EOFException: Unexpected end of ZLIB input stream using apache poi 4.0.1 [1].
[1]: Just tested, I get exactly your NPE using apache poi 3.17 on Windows 10. Ubuntu Linux simply crashes.
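To make the difference concrete, here is a minimal sketch using only java.util.zip from the JDK. It is only roughly analogous to what open(InputStream) does and is not POI's actual implementation: every ZIP entry is read into memory, the file is closed, and only then is the same path written again. The class InMemoryZipRewrite and its methods loadAll and saveAll are hypothetical names for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration, not part of apache poi.
public class InMemoryZipRewrite {

    /** Reads every entry of the ZIP into memory; the file is closed when this returns. */
    public static Map<String, byte[]> loadAll(Path zip) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zip))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                ByteArrayOutputStream data = new ByteArrayOutputStream();
                int n;
                while ((n = in.read(buffer)) != -1) {
                    data.write(buffer, 0, n);
                }
                entries.put(entry.getName(), data.toByteArray());
            }
        }
        return entries;
    }

    /** Writes the in-memory entries to the given path, which may be the path they were read from. */
    public static void saveAll(Map<String, byte[]> entries, Path zip) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                out.putNextEntry(new ZipEntry(e.getKey()));
                out.write(e.getValue());
                out.closeEntry();
            }
        }
    }
}
```

Because nothing holds the source file open after loadAll returns, saveAll can safely target the same path. The price is the one the Javadoc mentions: the whole archive sits in memory.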
Conclusion:
Opening an OPCPackage (ZipPackage) from a File directly and then writing to another File works. Opening an OPCPackage from a File directly and then writing to the same File does not work.
This is true for all Office Open XML file formats that are handled using ZipPackage in apache poi.
To keep the advantage of using less memory while creating the XWPFDocument from a File instead of an InputStream, and nevertheless be able to write into the same file, we could use a temporary copy of the file as follows:
import java.io.File;
import java.io.FileOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordReadAndReWrite {

    public static void main(String[] args) throws Exception {
        String filePath = "WordDocument.docx";
        String tmpFilePath = "~$WordDocument.docx";

        // Copy the document and open the copy, so the original file is not held open.
        File file = Files.copy(Paths.get(filePath), Paths.get(tmpFilePath), StandardCopyOption.REPLACE_EXISTING).toFile();
        XWPFDocument doc = new XWPFDocument(OPCPackage.open(file));

        // Replace paragraphs.

        // Write the result back to the original path; only the copy is open.
        FileOutputStream out = new FileOutputStream(filePath);
        doc.write(out);
        out.close();
        doc.close();

        // Remove the temporary copy.
        Files.deleteIfExists(Paths.get(tmpFilePath));
    }
}
Of course, that has the disadvantage of using additional file storage, even if only temporarily.
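A variant of the same idea, sketched with JDK classes only: instead of copying the input, write the output to a temporary file next to the original and move it over the original once the document has been closed. The class SafeOverwrite, the interface StreamWriter and the method writeThenReplace are hypothetical names for this illustration and not part of apache poi. The sketch assumes the callback closes the document before it returns, since replacing a file that is still open may fail, for example on Windows.

```java
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of apache poi.
public class SafeOverwrite {

    /** Callback that writes the new file content to the given stream. */
    @FunctionalInterface
    public interface StreamWriter {
        void writeTo(OutputStream out) throws Exception;
    }

    /**
     * Lets the writer produce the new content in a temporary file in the
     * target's directory, then replaces the target with that file.
     */
    public static void writeThenReplace(Path target, StreamWriter writer) throws Exception {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "rewrite-", ".tmp");
        try {
            try (OutputStream out = Files.newOutputStream(tmp)) {
                writer.writeTo(out);
            }
            // Assumption: the writer has released the target file by now.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up if the writer failed.
            Files.deleteIfExists(tmp);
        }
    }
}
```

With apache poi, the callback would open the XWPFDocument from the File, replace the placeholders, call doc.write(out) and close the document, all before returning. This also needs temporary file storage, but the original file is only replaced after the new content has been written completely.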