A roundup of ways to convert HTML to PDF in Python

Converting HTML web pages to PDF is a common requirement. In a browser, a single web page can be saved as PDF directly through the browser's "Print" function.

But when there are many pages to convert, doing this by hand is impractical.

Binary software

There are many programs and tools for converting HTML to PDF. Two of the better-known ones are Calibre and wkhtmltopdf.

wkhtmltopdf

wkhtmltopdf is an excellent HTML-to-PDF tool. It uses Qt's WebKit rendering engine to render HTML documents and export them as PDF documents or images.

Its feature set is very complete, but because the rendering engine is Qt's WebKit, it does not support ES6 JavaScript. HTML pages written with ES6 therefore render without their intended effect, which is why Mr. Zhou eventually gave up on it.

Calibre

Calibre is e-book management software that ships with various document conversion tools, so its e-book conversion tools can be used to convert HTML to PDF.

Both are binary programs for the desktop environment. To use them from Python, you can either invoke their commands with Popen() or use a third-party package such as pdfkit (a wrapper around wkhtmltopdf) or pypandoc (a wrapper around pandoc). These packages call an external binary and wrap it in interfaces that are convenient to use from Python.
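As a minimal sketch of the first option, the snippet below calls the wkhtmltopdf command line through the standard subprocess module (subprocess.run is built on Popen). It assumes wkhtmltopdf is installed and on PATH; the function names and the A4 default are illustrative choices, not part of any library.

```python
import shutil
import subprocess


def build_wkhtmltopdf_command(source, output_pdf, page_size="A4"):
    """Return the argument list for one wkhtmltopdf run.

    `source` may be a URL or a path to a local HTML file.
    """
    return ["wkhtmltopdf", "--quiet", "--page-size", page_size, source, output_pdf]


def convert_with_wkhtmltopdf(source, output_pdf, page_size="A4"):
    """Run wkhtmltopdf as a child process; raise if it is missing or fails."""
    if shutil.which("wkhtmltopdf") is None:
        raise FileNotFoundError("wkhtmltopdf was not found on PATH")
    command = build_wkhtmltopdf_command(source, output_pdf, page_size)
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(command, check=True)
    return output_pdf


# Example call (requires wkhtmltopdf to be installed):
# convert_with_wkhtmltopdf("https://example.com", "example.pdf")
```

Passing the arguments as a list instead of a single shell string avoids quoting problems when paths or URLs contain spaces.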

Pure Python library implementation

Although the third-party Python modules introduced above handle HTML-to-PDF conversion very well, they all require other binary software to be installed on the computer. Many readers dislike this way of calling things.

For implementations that do not depend on binary software, there are the following options:

xhtml2pdf

This is an HTML-to-PDF conversion module built on Python modules such as ReportLab, html5lib, and PyPDF2. It supports HTML5, CSS 2.1, and some CSS3 syntax well.

Because it is built on ReportLab, its Chinese support may be problematic in some environments. And because of changes in its maintainers, some gaps have appeared in the module's functionality. Still, it is a great HTML-to-PDF module.
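A minimal sketch of an xhtml2pdf conversion might look like the following. It assumes the package is installed (pip install xhtml2pdf); the function name is hypothetical, and the import is placed inside the function so the file can still be loaded where the package is missing.

```python
def convert_with_xhtml2pdf(html_text, output_pdf):
    """Render an HTML string to a PDF file; return True if no errors were reported."""
    # Third-party dependency, imported lazily: pip install xhtml2pdf
    from xhtml2pdf import pisa

    with open(output_pdf, "wb") as pdf_file:
        # CreatePDF returns a context object whose `err` field counts errors.
        status = pisa.CreatePDF(html_text, dest=pdf_file)
    return status.err == 0


# Example call:
# convert_with_xhtml2pdf("<h1>Hello</h1><p>Converted with xhtml2pdf.</p>", "hello.pdf")
```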

weasyprint

This is a visual rendering engine for HTML and CSS that can export HTML documents as print-quality PDF files.

The xhtml2pdf project itself has recommended this module for converting HTML to PDF.

This module is powerful and effective, but it has many dependencies.

Mr. Zhou has not yet managed to install it on a Windows computer!
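Where WeasyPrint and its native dependencies do install, a conversion can be sketched in a few lines. The function name below is hypothetical, and the lazy import is only there so the snippet loads on machines without the package.

```python
def convert_with_weasyprint(html_text, output_pdf, base_url=None):
    """Render an HTML string to a PDF file with WeasyPrint."""
    # Third-party dependency, imported lazily: pip install weasyprint
    from weasyprint import HTML

    # base_url is used to resolve relative links to images and stylesheets.
    HTML(string=html_text, base_url=base_url).write_pdf(output_pdf)
    return output_pdf


# Example call:
# convert_with_weasyprint("<h1>Hello</h1><p>Converted with WeasyPrint.</p>", "hello.pdf")
```

To convert a live page instead of a string, HTML(url="https://example.com") can be used in place of HTML(string=...).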

Browser solution

Of the two approaches above, the binary programs are somewhat lacking in controllability, while the pure-Python renderers are unfriendly in terms of features and dependencies.

Besides these two approaches, there is a third way to convert HTML to PDF: using the browser kernel driven by web automation testing tools, or the web engine in Qt for Python.

Browser kernel for web automation

Python users often use Selenium and pyppeteer, two web automation testing modules, for data collection and automated web testing.

Both modules drive a real browser to operate on web pages. Building on this, we can call the browser's printing-related API to convert HTML to PDF.

For example, pyppeteer can open an HTML document and then save it as a PDF document.
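The original example is not reproduced here, so the following is a sketch of how this might look with pyppeteer. It assumes pyppeteer is installed (it downloads a Chromium build on first launch); the function name and the A4 and printBackground settings are illustrative choices.

```python
import asyncio


async def html_to_pdf(url, output_pdf):
    """Open `url` in headless Chromium and print the page to a PDF file."""
    # Third-party dependency, imported lazily: pip install pyppeteer
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        # Wait until network activity settles so scripts can finish rendering.
        await page.goto(url, {"waitUntil": "networkidle0"})
        await page.pdf({"path": output_pdf, "format": "A4", "printBackground": True})
    finally:
        await browser.close()
    return output_pdf


# Example call:
# asyncio.run(html_to_pdf("https://example.com", "example.pdf"))
```

A local file can be converted the same way by passing a file:// URL. Because the coroutine handles one page at a time, several pages can be converted in a loop or gathered concurrently.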

Qt's web engine

In Qt 5, Qt replaced the old WebKit with the newer Chromium kernel as its web rendering engine, which makes it possible to build modern browser functionality in Qt.

With the help of Qt's Python implementation (PyQt5 series and PySide2 series), we can directly call the web engine-related interfaces in Qt.

The QtWebEngineWidgets submodule provides the QWebEngineView class, whose page object (a QWebEnginePage, returned by page()) has a printToPdf method that prints the current page to a PDF document. On this basis, PySide2 or PyQt5 can be used to convert HTML to PDF.
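The original example is missing from this copy, so here is a sketch with PyQt5, assuming the PyQt5 and PyQtWebEngine packages are installed and a display (or an offscreen Qt platform) is available. The function name is hypothetical. In Qt 5, printToPdf is reached through view.page(), and the pdfPrintingFinished signal reports when the file has been written.

```python
def convert_with_qt(url, output_pdf):
    """Load `url` in Qt WebEngine and print it to a PDF; return True on success."""
    import sys

    # Third-party dependencies, imported lazily: pip install PyQt5 PyQtWebEngine
    from PyQt5.QtCore import QUrl
    from PyQt5.QtWebEngineWidgets import QWebEngineView
    from PyQt5.QtWidgets import QApplication

    app = QApplication.instance() or QApplication(sys.argv)
    view = QWebEngineView()
    result = {"ok": False}

    def on_load_finished(ok):
        # Start printing only after the page has loaded successfully.
        if ok:
            view.page().printToPdf(output_pdf)
        else:
            app.quit()

    def on_pdf_finished(file_path, success):
        # Emitted by QWebEnginePage once the PDF has been written (or has failed).
        result["ok"] = success
        app.quit()

    view.loadFinished.connect(on_load_finished)
    view.page().pdfPrintingFinished.connect(on_pdf_finished)
    view.load(QUrl(url))
    app.exec_()  # run the event loop until one of the callbacks quits it
    return result["ok"]


# Example call:
# convert_with_qt("https://example.com", "example.pdf")
```

The PySide2 version should differ mainly in the import paths (PySide2.QtCore, PySide2.QtWebEngineWidgets, PySide2.QtWidgets).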

Conclusion

Above, Mr. Zhou introduced three approaches to converting HTML documents to PDF documents in Python. Each has its own advantages and disadvantages. Assessing your needs correctly and choosing the appropriate approach can help make up for its shortcomings.

Origin blog.csdn.net/Python_sn/article/details/111899658