Converting Word documents, xlsx, and other formats to PDF with documents4j

1. Introduction to documents4j

documents4j is a Java library for converting documents between file formats. It does this by delegating each conversion to a native application that can convert the given file format into the target format.
documents4j ships with adapters for Microsoft Word and Excel, so it can, for example, convert a docx file to PDF. Because Microsoft Office itself performs the conversion, the output avoids the distortions that are common with non-Microsoft converters.

documents4j provides a simple API with two concrete implementations:

Local Strategy
With the local strategy, documents4j delegates the conversion of a file to an application on the same machine that can perform it. The machine therefore needs the conversion software, such as Microsoft Word or Excel, installed beforehand.
documents4j provides a simple mechanism for registering custom converters, and it ships with ready-made converters for Microsoft Word and Excel.

Remote Strategy
With the remote strategy, documents4j sends the conversion work to a remote server through a REST API. The client sends the file and the requested formats to the server, and the converted file is returned in the response.

For documents4j users, the two implementations are fully transparent. You can use the local strategy when developing and testing locally and switch to the remote strategy in production without changing the calling code. This also makes the conversion backend easier to simulate in tests.
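
The idea of a swappable, easy-to-simulate backend can be sketched in plain Java. This is only an illustration: PdfBackend, FakeBackend, and export are hypothetical names made up for this sketch and are not documents4j types. In a real application, one implementation of such an interface would wrap a documents4j converter (local or remote), while tests would use the fake.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class SwappableBackendSketch {

    /** Hypothetical application-side interface; NOT a documents4j type. */
    interface PdfBackend {
        boolean toPdf(InputStream source, OutputStream target) throws IOException;
    }

    /** Test double that simulates a conversion without MS Office or a server. */
    static final class FakeBackend implements PdfBackend {
        @Override
        public boolean toPdf(InputStream source, OutputStream target) throws IOException {
            // Consume the input and write a marker that records its length.
            int length = source.readAllBytes().length;
            target.write(("%PDF-fake " + length).getBytes(StandardCharsets.US_ASCII));
            return true;
        }
    }

    /** Calling code depends only on the interface, so the backend can be swapped. */
    static byte[] export(PdfBackend backend, byte[] document) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!backend.toPdf(new ByteArrayInputStream(document), out)) {
            throw new IOException("conversion failed");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] result = export(new FakeBackend(), "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(result, StandardCharsets.US_ASCII)); // prints "%PDF-fake 5"
    }
}
```

Because export only sees the interface, replacing FakeBackend with a real backend requires no change to the calling code, which is the same property the local and remote strategies offer.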

2. Simple use of documents4j

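Add the following Maven dependencies:

<!-- Convert to PDF -->
<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-local</artifactId>
    <version>1.0.3</version>
</dependency>
<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-transformer-msoffice-word</artifactId>
    <version>1.0.3</version>
</dependency>
<!-- Needed for the xls/xlsx conversions in the method below -->
<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-transformer-msoffice-excel</artifactId>
    <version>1.0.3</version>
</dependency>

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.5.10</version>
</dependency>
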
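The following method converts a doc, docx, xls, or xlsx file to PDF:

import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Locale;

/**
 * Converts a doc, docx, xls, or xlsx file to PDF.
 *
 * @param docPath path of the source document
 * @param pdfPath path of the PDF file to create
 * @return true if the conversion succeeded, false otherwise
 */
public static boolean docTopdf(String docPath, String pdfPath) {
    // Determine the source type from the file extension (case-insensitive).
    int dot = docPath.lastIndexOf('.');
    String fileType = dot < 0 ? "" : docPath.substring(dot).toLowerCase(Locale.ROOT);
    DocumentType sourceType;
    if (".docx".equals(fileType)) {
        sourceType = DocumentType.DOCX;
    } else if (".doc".equals(fileType)) {
        sourceType = DocumentType.DOC;
    } else if (".xls".equals(fileType)) {
        sourceType = DocumentType.XLS;
    } else if (".xlsx".equals(fileType)) {
        sourceType = DocumentType.XLSX;
    } else {
        System.out.println("Unsupported file type: " + docPath);
        return false;
    }

    File inputFile = new File(docPath);
    File outputFile = new File(pdfPath);
    IConverter converter = null;
    boolean converted;
    // try-with-resources closes both streams even if the conversion fails.
    try (InputStream in = new FileInputStream(inputFile);
         OutputStream out = new FileOutputStream(outputFile)) {
        // Building a converter is expensive; a real application should create it once and reuse it.
        converter = LocalConverter.builder().build();
        converted = converter.convert(in).as(sourceType)
                .to(out).as(DocumentType.PDF)
                .execute();
    } catch (Exception e) {
        e.printStackTrace();
        return false;
    } finally {
        if (converter != null) {
            converter.shutDown(); // release the converter's resources
        }
    }
    if (converted) {
        // Note: the source file is deleted once the conversion has succeeded.
        inputFile.delete();
        System.out.println("PDF conversion succeeded");
    }
    return converted;
}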
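
The extension check in docTopdf can also be isolated into a small helper that is easy to test on its own. The sketch below is a minimal version under a few assumptions: the class and method names are made up for illustration, and the returned strings are simply the names of the DocumentType constants used above (DOC, DOCX, XLS, XLSX), so the sketch runs without documents4j on the classpath.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

final class SourceTypeDetector {
    // File extension -> name of the matching DocumentType constant (illustrative mapping).
    private static final Map<String, String> TYPES = Map.of(
            "doc", "DOC",
            "docx", "DOCX",
            "xls", "XLS",
            "xlsx", "XLSX");

    /** Returns the type name for the path's extension, or empty if it is missing or unsupported. */
    static Optional<String> detect(String path) {
        int dot = path.lastIndexOf('.');
        if (dot < 0 || dot == path.length() - 1) {
            return Optional.empty(); // no extension at all
        }
        // Lower-case the extension so that "REPORT.DOCX" is handled like "report.docx".
        String extension = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(TYPES.get(extension));
    }

    public static void main(String[] args) {
        System.out.println(detect("report.DOCX")); // prints "Optional[DOCX]"
        System.out.println(detect("notes.txt"));   // prints "Optional.empty"
    }
}
```

Returning an empty Optional for unknown extensions lets the caller reject the file before opening any streams, instead of silently skipping the conversion.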

3. Usage error: java.util.concurrent.ExecutionException: Could not complete conversion

Preface
When documents4j calls MS Office to convert a file to PDF, the conversion can fail with an error such as:

java.util.concurrent.ExecutionException: Could not complete conversion
    at com.documents4j.job.FailedConversionFuture.get(FailedConversionFuture.java:35)
    …
Caused by: com.documents4j.throwables.ConversionInputException: The input file seems to be corrupt
    at com.documents4j.util.Reaction$ConversionInputExceptionBuilder.make(Reaction.java:159)

This error occurs when the Java program runs as a Windows service. MS Office does not get a desktop context inside a service, so it starts but cannot read the input file that is passed to it.
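
To tell this failure apart from other conversion errors in logs or error handling, the cause chain of the exception can be inspected. The following is a minimal sketch, not part of documents4j: the class and method names are made up, and the check matches by simple class name so that it runs without documents4j on the classpath. The main method uses a stand-in exception instead of a real failed conversion.

```java
import java.util.concurrent.ExecutionException;

final class ConversionErrorInspector {

    /** Returns true if any exception in the cause chain has the given simple class name. */
    static boolean hasCauseNamed(Throwable error, String simpleName) {
        int depth = 0;
        // The depth limit guards against a (rare) cyclic cause chain.
        for (Throwable t = error; t != null && depth < 50; t = t.getCause(), depth++) {
            if (t.getClass().getSimpleName().equals(simpleName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for the failure shown above; real code would catch it around the conversion call.
        Throwable failure = new ExecutionException("Could not complete conversion",
                new IllegalStateException("The input file seems to be corrupt"));
        System.out.println(hasCauseNamed(failure, "IllegalStateException"));     // prints "true"
        System.out.println(hasCauseNamed(failure, "ConversionInputException")); // prints "false"
    }
}
```

With a real failure, a match on "ConversionInputException" for a file that opens fine in Word is a hint that the service context, not the file, is the problem.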

Solution
The solution comes from the official documentation:

documents4j might malfunction when run as a Windows service together
with MS Office conversion. Note that MS Office does not officially
support execution in a service context. When run as a service, MS
Office is always started with MS Window’s local service account which
does not configure a desktop. However, MS Office expects a desktop to
exist in order to run properly. Without such a desktop configuration,
MS Office will start up correctly but fail to read any input file. In
order to allow MS Office to run in a service context, there are two
possible approaches of which the first approach is more recommended:

(1) On a 32-bit system, create the folder
C:\Windows\System32\config\systemprofile\Desktop. On a 64-bit system,
create the folder C:\Windows\SysWOW64\config\systemprofile\Desktop.
Further information can be found on MSDN.

(2) You can manipulate MS Window’s registry such that MS Office
applications are run with another account than the local service
account. This approach is documented on MSDN. Note that this breaks
MS Window’s sandbox model and imposes additional security threats to
the machine that runs MS Office.

In short, when an application runs in a Windows service context, the documents4j conversion can fail, because MS Office does not officially support running in that context.
A typical example: Jenkins deploys automatically and runs java -jar test.jar to start the test application, so the application runs in a Windows service context.
Of the two solutions, I have only tried the first. It works, so only the first is described here.

On a 32-bit Windows system, create the Desktop folder under C:\Windows\System32\config\systemprofile\
On a 64-bit Windows system, create the Desktop folder under C:\Windows\SysWOW64\config\systemprofile\

After the creation is complete, restart the computer.
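
The folder can also be checked and created from Java, for example in a deployment script. The sketch below is a minimal version under stated assumptions: the class and method names are made up, the Windows directory and the 32/64-bit choice are passed in explicitly rather than detected, and writing under C:\Windows will typically require administrator rights.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ServiceDesktopFolder {

    /** Resolves the Desktop folder under the given Windows directory, following the rule above. */
    static Path resolve(Path windowsDir, boolean is64BitSystem) {
        String systemDir = is64BitSystem ? "SysWOW64" : "System32";
        return windowsDir.resolve(systemDir)
                .resolve("config").resolve("systemprofile").resolve("Desktop");
    }

    /** Creates the folder if it is missing; returns true only if this call created it. */
    static boolean ensureExists(Path windowsDir, boolean is64BitSystem) throws IOException {
        Path desktop = resolve(windowsDir, is64BitSystem);
        if (Files.isDirectory(desktop)) {
            return false; // already present, nothing to do
        }
        Files.createDirectories(desktop);
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: ServiceDesktopFolder <windowsDir> <32|64>");
            return;
        }
        boolean created = ensureExists(Paths.get(args[0]), "64".equals(args[1]));
        System.out.println(created ? "Desktop folder created" : "Desktop folder already exists");
    }
}
```

On a 64-bit machine this would be run with the arguments C:\Windows 64. The restart mentioned above is still needed afterwards.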

License

documents4j is released under the Apache License 2.0.

Official website: http://documents4j.com
Open source address: https://github.com/documents4j/documents4j
Other ways to achieve PDF conversion: https://www.jb51.net/article/254043.htm

Origin blog.csdn.net/lijie0213/article/details/127796317