Generating a PDF from a Python String

Original link: https://www.jianshu.com/u/8f2987e2f9fb

At work today I ran into a requirement: generating a PDF from a Python string. For example, the Python string 'This is a test file' should be turned into a PDF that contains the text 'This is a test file'.
After some searching, I decided to use wkhtmltopdf, a tool that converts HTML to PDF. It can be downloaded from https://wkhtmltopdf.org/downloads.html; pick the package that matches your system and install it. Once wkhtmltopdf is installed, install pdfkit, the third-party Python module that wraps it, as follows:

pip install pdfkit
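
pdfkit does not render anything itself; it calls the wkhtmltopdf executable, so a conversion fails if that binary cannot be found. The sketch below is one way to check for it up front. The helper names find_wkhtmltopdf and make_config are made up for this illustration; shutil.which comes from the standard library, and pdfkit.configuration(wkhtmltopdf=...) is how pdfkit accepts an explicit path.

```python
import shutil


def find_wkhtmltopdf(binary_name="wkhtmltopdf"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(binary_name)


def make_config(path=None):
    """Build a pdfkit configuration for an explicit or auto-detected binary.

    Hypothetical helper: pdfkit is imported lazily so that the path check
    above can be used even before pdfkit is installed.
    """
    import pdfkit

    path = path or find_wkhtmltopdf()
    if path is None:
        raise FileNotFoundError("wkhtmltopdf was not found on PATH")
    return pdfkit.configuration(wkhtmltopdf=path)


print(find_wkhtmltopdf() or "wkhtmltopdf is not on PATH")
```

The object returned by make_config() can be passed to pdfkit.from_string through its configuration argument.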

We will then look at the following questions:

  • How to generate a PDF from a Python string;

  • How to generate a table in a PDF;

  • How to deal with slow PDF generation.

How to generate a PDF from a Python string

The solution is to embed the Python string in HTML code and convert that HTML. Note that a line break must be written as a <br> tag. Sample code follows:

import pdfkit

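# Text to include in the PDF ('这是一个测试文件。' means 'This is a test file.')
content = '这是一个测试文件。' + '<br>' + 'Hello from Python!'

# Parentheses let the two adjacent string literals span multiple lines
html = ('<html><head><meta charset="UTF-8"></head>'
        '<body><div align="center"><p>%s</p></div></body></html>' % content)

# Convert to PDF
pdfkit.from_string(html, './test.pdf')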

The output is as follows:

Loading pages (1/6)
Counting pages (2/6)                                              
Resolving links (4/6)                                                      
Loading headers and footers (5/6)                                          
Printing pages (6/6)
Done

The generated test.pdf looks like this:
[Image: screenshot of the generated test.pdf]
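
The example above interpolates the text directly into the HTML, so characters such as < or & in the string would be read as markup, and ordinary newlines would be ignored. A minimal sketch of a helper that handles both is given here; text_to_html is a hypothetical name, and the page template simply mirrors the one used above.

```python
import html


def text_to_html(text):
    """Wrap plain text in a minimal UTF-8 HTML page.

    Special characters are escaped and each newline becomes a <br> tag.
    """
    escaped = html.escape(text)
    body = escaped.replace("\n", "<br>")
    return (
        '<html><head><meta charset="UTF-8"></head>'
        '<body><div align="center"><p>%s</p></div></body></html>' % body
    )


page = text_to_html("1 < 2 & 3 > 2\nHello from Python!")
print(page)
```

The resulting page can be handed to pdfkit.from_string(page, './test.pdf') as before. According to the pdfkit README, passing options={'quiet': ''} suppresses the progress log shown above.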

How to generate a table in a PDF

Next, consider how to turn a CSV file into a table in a PDF. The idea is again to go through HTML. Part of the example file iris.csv is shown below:
[Image: excerpt of iris.csv]
The Python code that converts the CSV file into a PDF table is as follows:

import pdfkit

# Read the CSV file
with open('iris.csv', 'r') as f:
    lines = [line.strip() for line in f]
# Build an HTML table from the CSV lines
td_width = 100
content = '<table width="%s" border="1" cellspacing="0px" style="border-collapse:collapse">' % (td_width*len(lines[0].split(',')))

for line in lines:
    tr = '<tr>'+''.join(['<td width="%d">%s</td>'%(td_width, cell) for cell in line.split(',')])+'</tr>'
    content += tr

content += '</table>'

# Parentheses let the two adjacent string literals span multiple lines
html = ('<html><head><meta charset="UTF-8"></head>'
        '<body><div align="center">%s</div></body></html>' % content)

# Convert to PDF
pdfkit.from_string(html, './iris.pdf')

Part of the resulting PDF file, iris.pdf, is shown below:
[Image: excerpt of the generated iris.pdf]
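
Splitting each line on ',' works for a simple file like iris.csv, but it breaks on quoted fields that contain commas, and cell text containing < or & would be read as markup. As a sketch of a more defensive variant, the code below parses with the standard csv module and escapes each cell with html.escape. The function name rows_to_table and the sample data are made up for this illustration.

```python
import csv
import html
import io


def rows_to_table(rows, td_width=100):
    """Render an iterable of rows (lists of strings) as an HTML table."""
    rows = list(rows)
    n_cols = max((len(r) for r in rows), default=0)
    parts = [
        '<table width="%d" border="1" cellspacing="0px" '
        'style="border-collapse:collapse">' % (td_width * n_cols)
    ]
    for row in rows:
        cells = "".join(
            '<td width="%d">%s</td>' % (td_width, html.escape(cell))
            for cell in row
        )
        parts.append("<tr>%s</tr>" % cells)
    parts.append("</table>")
    return "".join(parts)


# Made-up sample data; for a real file, use open('iris.csv', newline='')
sample = io.StringIO('name,note\n"Smith, J.",a < b\n')
table = rows_to_table(csv.reader(sample))
print(table)
```

The returned string can be dropped into the same page template and passed to pdfkit.from_string.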

How to deal with slow PDF generation

Generating PDF files with pdfkit is convenient, but it has one significant drawback: generation is slow. A simple test shows this: create 100 PDF files, each containing the text 'This is test file No. *!', where * is the file's number. The Python code is as follows:

import pdfkit
import time
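start_time = time.time()

# Note: the ./test directory is assumed to exist already
for i in range(100):
    # Text of the form 'This is test file No. N!'
    content = '这是第%d份测试文件!'%(i+1)
    html = ('<html><head><meta charset="UTF-8"></head>'
            '<body><div align="center">%s</div></body></html>' % content)

    # Convert to PDF
    pdfkit.from_string(html, './test/%s.pdf'%(i+1))

end_time = time.time()

# Prints 'Total time: ... seconds.' in Chinese
print('一共耗时:%s 秒.' %(end_time-start_time))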

Generating the 100 PDF files with this program takes about 192 seconds. Output:

......
Loading pages (1/6)
Counting pages (2/6)                                               
Resolving links (4/6)                                                       
Loading headers and footers (5/6)                                           
Printing pages (6/6)
Done                                                                      
一共耗时:191.9226369857788 秒.

To speed up generation, we can use multiple threads, mainly through the concurrent.futures module. The complete Python code is as follows:

import pdfkit
import time
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

start_time = time.time()

# Function: generate one PDF
def convert_2_pdf(i):
    # Text of the form 'This is test file No. N!'
    content = '这是第%d份测试文件!'%(i+1)
    html = ('<html><head><meta charset="UTF-8"></head>'
            '<body><div align="center">%s</div></body></html>' % content)

    # Convert to PDF
    pdfkit.from_string(html, './test/%s.pdf'%(i+1))


# Generate the PDFs with multiple threads
executor = ThreadPoolExecutor(max_workers=10)  # max_workers is the number of threads; adjust as needed
# submit() takes the function first, followed by the arguments to pass to it (more than one is allowed)
future_tasks = [executor.submit(convert_2_pdf, i) for i in range(100)]
# Wait for all threads to finish before continuing
wait(future_tasks, return_when=ALL_COMPLETED)

end_time = time.time()
# Prints 'Total time: ... seconds.' in Chinese
print('一共耗时:%s 秒.' %(end_time-start_time))

Generating the 100 PDF files with this program takes about 41 seconds, more than four times faster than the sequential version.
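
One caveat with wait(): it only blocks until the futures have finished and does not re-raise exceptions from the workers, so a failed conversion can go unnoticed unless each future's result() is checked. The sketch below shows one way to collect failures with as_completed. The names run_all and fake_job are hypothetical; fake_job stands in for convert_2_pdf so that the example runs without wkhtmltopdf.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(job, items, max_workers=10):
    """Run job(item) for every item in a thread pool.

    Returns (succeeded, failed), where failed maps each item to the
    exception its job raised.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(job, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
                succeeded.append(item)
            except Exception as exc:
                failed[item] = exc
    return succeeded, failed


# Stand-in job for demonstration only; in the article this would be convert_2_pdf
def fake_job(i):
    if i == 3:
        raise OSError("simulated conversion failure")


ok, bad = run_all(fake_job, range(5))
print(sorted(ok), bad)
```

With the real function, run_all(convert_2_pdf, range(100)) would return the indices that converted successfully and a dictionary of the ones that failed, which can then be logged or retried.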

Origin blog.csdn.net/qdPython/article/details/102744743