Remembering the pits I fell into with wkhtmltopdf

	Today I won't talk about code. Instead, I'll tell you about a memorable time at work when I had to dig myself out of a pit.

In a previous project, I needed to generate PDF files. I asked my boss how to do it, and he recommended two options:

  • iTextSharp
  • wkhtmltopdf

The PDF content contains three tables with a variable number of rows (the data has to be fetched by the back end) and a variable number of pictures (one photo fills a fixed-size area on its own, two photos split it in half, …). If I manipulated the PDF directly and wrote the content into it from code, there would be far too much layout logic to handle, and whenever the PDF content needed adjusting later, changing the back-end code would be very painful.
The wkhtmltopdf option recommended by the senior developers can convert the page at a URL directly into a PDF. I was overjoyed, and settled on the following approach:
  • Build the PDF content as an HTML page.
  • Have wkhtmltopdf visit that page's URL and convert it to a PDF.
  • Save the PDF on the server.
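The heart of this approach is a single command-line call of the form `wkhtmltopdf <url> <output.pdf>`. Below is a minimal Node.js sketch that builds that argument list. The helper name `buildWkhtmltopdfArgs` and the example URL are hypothetical. `--javascript-delay` is a real wkhtmltopdf option, and it comes up again in Pit 2.

```javascript
// Hypothetical helper: builds the argument list for one wkhtmltopdf call.
// Basic usage is: wkhtmltopdf [options] <input URL or file> <output PDF>
function buildWkhtmltopdfArgs({ input, output, javascriptDelayMs }) {
  if (!input || !output) {
    throw new Error("input and output are required");
  }
  const args = [];
  // --javascript-delay: milliseconds to wait for JavaScript to finish
  // before rendering (wkhtmltopdf's default is 200 ms).
  if (javascriptDelayMs !== undefined) {
    args.push("--javascript-delay", String(javascriptDelayMs));
  }
  // The page to render (URL or local HTML file) comes before the output path.
  args.push(input, output);
  return args;
}

// Example: print the command that would turn the report page into a PDF.
const example = buildWkhtmltopdfArgs({
  input: "http://example.com/report.html",
  output: "report.pdf",
  javascriptDelayMs: 6000,
});
console.log(["wkhtmltopdf", ...example].join(" "));
```

The array form suits `child_process.execFile`/`spawn`, which pass each argument through unchanged and need no shell quoting.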

This seemed like a perfect solution:
  1. The PDF content is separated from the generation of the PDF file.
  2. When the PDF content needs to change, we only modify the corresponding HTML page (I believe programmers will be pleasantly surprised by this).
  3. Later, if an exported PDF is wrong (wrong data, wrong positions, missing pictures…), we can first open the HTML page directly in a browser to check whether the data itself is missing, which quickly tells a content error apart from an export error.
The next step was to roll up my sleeves and get to work.
Coding: green lights all the way. There are plenty of tutorials online, and generating a PDF from a URL was easy.
Testing: both internal testing at the company and testing in the customer's test environment went smoothly.
Launch: all kinds of weird problems started showing up.

Pit 1: The feature did not work at all in production

Because the PDF export feature was only used for official documents at the end of the month, users did not touch it in the early days, so the problem was not exposed in time.
At the end of the month, the customer found that the exported PDF could not be opened: it was reported as damaged.
Once the problem surfaced, the troubleshooting began:

  • Suspected the HTML content: I opened the HTML page and everything was normal.
  • Suspected the code that converts the URL to a PDF file: I pulled this code out on its own, ran it locally, and it worked perfectly without any errors.
  • Dumbfounded: the HTML content was fine and the code was fine, so where was the problem? Only then did I start to consider whether the server itself was at fault.
  • URL-to-PDF test on the server: I wrote the URL-to-PDF code as a console application and ran it on the server. It reported an error: msvcp140.dll does not exist.
  • Found the problem: the production server did not have Visual Studio (VS) installed, while my local development machine and the user's test server both had VS, which brings the necessary files with it, so everything worked there.
  • Solved the problem: the user would not allow us to install VS on the server, so we could only supply whatever was missing. The missing file was a DLL, so I provided the corresponding DLL, and msvcp140.dll fixed it.
  • I only found the missing msvcp140.dll by luck. When my program ran, the DLL was missing and the exe call failed, yet the whole process threw no error. Yes, no error at all, and not because I failed to handle errors. If I had not tested with a console application, where the error was exposed, I would never have found this problem.
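The lesson of this pit is that an external exe can fail without the calling code ever seeing an exception. Below is a minimal Node.js sketch, not the author's code, with a hypothetical function `runConverter`. It treats a failed start, a kill signal, or a non-zero exit code as an explicit error and keeps whatever the process printed on stderr. On Windows a missing DLL typically makes the process exit with a non-zero code before doing any work, and a check like this catches that.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical helper: runs an external converter (e.g. wkhtmltopdf) and
// reports every failure mode explicitly instead of letting it pass silently.
function runConverter(exePath, args, timeoutMs = 60000) {
  const result = spawnSync(exePath, args, { encoding: "utf8", timeout: timeoutMs });
  const stderr = (result.stderr || "").trim();

  if (result.error) {
    // The process could not be started (e.g. exe not found) or hit the timeout.
    return { ok: false, message: `${exePath} failed: ${result.error.message}` };
  }
  if (result.status === null) {
    // The process was terminated by a signal before exiting normally.
    return { ok: false, message: `${exePath} was killed by ${result.signal}: ${stderr}` };
  }
  if (result.status !== 0) {
    // Non-zero exit code: surface it together with whatever was printed on stderr.
    return { ok: false, message: `${exePath} exited with code ${result.status}: ${stderr}` };
  }
  return { ok: true, message: stderr };
}
```

In the original .NET setting, the equivalent would be to start the exe with `Process`, set `RedirectStandardError = true`, and check `Process.ExitCode` after `WaitForExit()`. This is an assumption about that wiring, because the post does not show its calling code.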

2020-7-24 supplement: According to the official website, the root cause of this pit is that wkhtmltopdf uses the Qt WebKit rendering engine to render HTML to PDF. Qt is a C++ GUI application development framework, so running wkhtmltopdf requires a C++ runtime environment.
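The fix therefore comes down to making sure the C++ runtime DLLs that wkhtmltopdf needs are present on the server. msvcp140.dll ships with the Microsoft Visual C++ 2015 (and later) Redistributable, so installing that package is usually enough, without installing Visual Studio itself. Below is a hypothetical Node.js preflight check that lists which named DLLs cannot be found in any of the given directories. The directory list is an assumption, and the check only approximates Windows' real DLL search order.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical preflight check: returns the DLL names that do not exist
// in any of the given directories (e.g. the exe's folder and System32).
function findMissingDlls(dllNames, searchDirs) {
  return dllNames.filter(
    (name) => !searchDirs.some((dir) => fs.existsSync(path.join(dir, name)))
  );
}

// Example on a Windows server (both directories are assumptions):
// const missing = findMissingDlls(["msvcp140.dll"],
//   ["C:\\wkhtmltopdf\\bin", "C:\\Windows\\System32"]);
// if (missing.length > 0) console.error("Missing runtime DLLs:", missing);
```

Running a check like this at application start turns a silent conversion failure into a clear log message.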

Pit 2: Table data and images loaded dynamically via ajax requests do not appear in the PDF, while the rest of the data is normal.

Discovery and process: after Pit 1 was resolved, I ran a PDF export test and everything was normal (although the data exported at the time happened to have no pictures), and all was quiet for a while.
But two months later, the customer suddenly reported that the detail-row data and the pictures were missing from the exported PDF.
(The exported file should contain them, and the missing data was exactly the data loaded by my ajax requests.)

  • Check whether the HTML itself lacks this content: I opened the HTML page on its own, and the data loaded.
  • Was an ajax request taking so long that the PDF was exported before the page had fully loaded? Monitoring the ajax requests on the HTML page showed they took only tens of milliseconds, so in theory no request was too slow.
  • Even so, to be safe, I changed the asynchronous requests to synchronous ones and set the --javascript-delay parameter to 6000 milliseconds, so that wkhtmltopdf waits 6 seconds for the JavaScript to run.
  • It still didn't work. To completely rule out JavaScript loading as the cause, I raised the delay to 20 seconds, and it still didn't work. So I gave up on the idea that JavaScript timing caused the missing content.
  • Finally it came down to troubleshooting the server:
  • Step 1, confirm whether the code is at fault: again using a console application, I ran the same back-end URL-to-PDF code against the same URL (users access the page from the external network, so this was the external URL). Converting locally worked: the detail tables and pictures were loaded.
  • Step 2, check whether an error was again being raised but not caught: I ran the console application on the customer's production server, and the exported content was missing.
  • At this point I suspected that, because VS was not installed in production, some other required component was missing, which would explain why my local environment worked but the production server did not. (I was thinking that if VS was the cause, I should simply persuade the customer to install it.)
  • So I set up a Windows Server 2012 server like the customer's, installed VS2015, and ran the generic handler (.ashx) on it. The exported data was still missing, so the guess that the missing VS installation was to blame turned out to be wrong.
  • By now I was out of options. Installing VS had been my last lifeline, and now it was gone.
  • I asked my manager and senior colleagues for help, and they suggested another approach: convert the HTML to PDF directly on the front end with jsPDF. That meant first displaying the HTML page when the customer wanted to export, and then letting the customer run the export. The frustrating part: I had tested in Google Chrome locally, but the customer only allowed Internet Explorer, and jsPDF did not work in IE. After 8 hours of trying to fix the compatibility problem, I still had not managed it, and finally gave up.
  • Fortunately, I came across a CSDN blog post by someone else who had been caught out by wkhtmltopdf in a similar way, and the solution it described finally fixed my problem. The solution is as follows:
    The core idea: instead of converting the URL to a PDF directly, add a step in the middle. First open the URL page and let it load; once loading is complete, save the loaded page as a static HTML page; then convert the static page to PDF.
    The code is as follows:
    Front end: capture the dynamically loaded page
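document.getElementById("downloadPdf").onclick = function () {
    // Read the fully loaded report page from the iframe that displays it
    var doc = document.getElementById("iftest").contentDocument;
    // Stitch head and body back into one HTML document
    var htmlStr = "<html><head>" + doc.head.innerHTML + "</head><body>" + doc.body.innerHTML + "</body></html>";
    // Send it to the back end
    $.ajax({
        type: "post",
        url: "../../../Report/Service/MaintainReportData.ashx",
        async: false, // synchronous request (deprecated in browsers, kept from the original)
        data: {
            Method: 'ExceclPDF2',
            // The HTML must be encoded, otherwise the Ajax request fails
            // (likely ASP.NET request validation rejecting raw markup).
            // encodeURIComponent also escapes "+", which HttpUtility.UrlDecode
            // on the server would otherwise turn into a space.
            HtmlStr: encodeURIComponent(htmlStr)
        },
        success: function (result) {
            // Parse the response; assumes the handler returns valid JSON (JSON.parse instead of eval)
            var data = typeof result === "string" ? JSON.parse(result) : result;
            if (data.result == 0) {
                // File name and file path of the generated PDF
                var filepath = data.FilePath;
                var filename = data.FileName;
                if (filepath) {
                    // Request the PDF download (the original omitted the file name value)
                    window.location.href = "../../../Report/Service/MaintainReportData.ashx?Method=DownLoadFile"
                        + "&FileName=" + encodeURIComponent(filename)
                        + "&FilePath=" + encodeURIComponent(filepath);
                }
            } else {
                alert(data.msg);
            }
        }
    });
};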
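When the loaded page is saved as a static file, relative image and CSS URLs may resolve against the file's new location instead of the original page. A hedged sketch of one way around this uses a hypothetical helper, `buildStaticSnapshot`. It rebuilds a complete document and adds a `<base>` tag, which is not part of the original solution, so that relative paths keep pointing at the web server.

```javascript
// Hypothetical helper: builds a complete static HTML document from the
// loaded page's head/body markup. The <base> tag is placed first in <head>
// so every later relative URL (images, CSS) resolves against baseHref.
function buildStaticSnapshot(headHtml, bodyHtml, baseHref) {
  // Escape characters that would break the attribute value.
  const escapedBase = baseHref.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  return (
    "<!DOCTYPE html><html><head>" +
    `<base href="${escapedBase}">` +
    headHtml +
    "</head><body>" +
    bodyHtml +
    "</body></html>"
  );
}

// In the browser it would be called with the iframe's document, e.g.:
// var doc = document.getElementById("iftest").contentDocument;
// var html = buildStaticSnapshot(doc.head.innerHTML, doc.body.innerHTML, doc.baseURI);
```

If the page already declares its own `<base>`, only the first one takes effect, so check for that before relying on this.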

Middle service layer: save the received HTML as a static HTML file (this file is what then gets converted to PDF)

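 // Requires: using System; using System.IO; using System.Text; using System.Web;
 // Request.Params has already undone the form encoding once;
 // UrlDecode reverses the URL encoding applied in the browser.
 string encodedHtml = Convert.ToString(context.Request.Params["HtmlStr"]);
 string htmlStr = HttpUtility.UrlDecode(encodedHtml);
 // "维修报告书" means "maintenance report"; the timestamp keeps file names unique
 string htmlfileName = "维修报告书" + DateTime.Now.ToString("yyyyMMddHHmmss") + ".html";
 string htmlSavePath = context.Request.MapPath("~/BXApp/IRep/ExportPDF/MaintainReport/HtmlFile/" + htmlfileName);
 // Generate the static HTML file
 CreateHtml(htmlStr, htmlSavePath);

 // Writes the HTML string to disk as a UTF-8 encoded file
 private void CreateHtml(string htmlStr, string saveHtmlFilePath)
 {
     using (FileStream file = File.Create(saveHtmlFilePath))
     {
         byte[] bytes = Encoding.UTF8.GetBytes(htmlStr);
         file.Write(bytes, 0, bytes.Length);
     }
 }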
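The post does not show the last step, where the saved static file is handed to wkhtmltopdf and the result goes back to the front end. Below is a minimal Node.js sketch of that step under stated assumptions. The function names, the `report` file prefix and the working directory are hypothetical. The response fields (`result`, `FilePath`, `FileName`, `msg`) only mirror what the front-end code reads.

```javascript
const fs = require("fs");
const path = require("path");
const { spawnSync } = require("child_process");

// Formats a Date as yyyyMMddHHmmss in local time, like the C# file name.
function timestamp(d) {
  const p = (n) => String(n).padStart(2, "0");
  return `${d.getFullYear()}${p(d.getMonth() + 1)}${p(d.getDate())}` +
         `${p(d.getHours())}${p(d.getMinutes())}${p(d.getSeconds())}`;
}

// Hypothetical handler: save the snapshot as a static HTML file, convert that
// local file (not the live URL) with wkhtmltopdf, and build the response.
function exportSnapshotToPdf(html, workDir, exePath, now = new Date()) {
  const baseName = "report" + timestamp(now);
  const htmlPath = path.join(workDir, baseName + ".html");
  const pdfPath = path.join(workDir, baseName + ".pdf");
  fs.writeFileSync(htmlPath, html, "utf8");

  // wkhtmltopdf accepts a local file path as its input.
  const run = spawnSync(exePath, [htmlPath, pdfPath], { encoding: "utf8" });
  if (run.error || run.status !== 0 || !fs.existsSync(pdfPath)) {
    const reason = run.error
      ? run.error.message
      : ((run.stderr || "").trim() || `exit code ${run.status}`);
    return { result: 1, msg: `PDF conversion failed: ${reason}`, htmlPath };
  }
  return { result: 0, FilePath: pdfPath, FileName: baseName + ".pdf" };
}
```

Checking that the output file exists, in addition to the exit code, guards against the silent failures described in Pit 1.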

Finally, many thanks to my seniors and to the blogger for their help along this brick-moving road.

2020-7-24 supplement: the data loss happens when running on Windows Server 2012, but not on other Windows versions. The root cause may be that Qt does not support the Windows Server 2012 environment, while it does support Windows Server 2008, Windows 7, Windows 10, etc.

Original post: blog.csdn.net/qq_39541254/article/details/107541497