Comparison and summary of ways to generate PDFs in Python

Background: With limited resources, generating PDFs asynchronously fails with out-of-memory errors.
Purpose: With limited resources, generate PDFs concurrently while consuming as little CPU and memory as possible. Concurrent generation must not be too slow, the generated PDFs should support rich styling, and their content should be complete.
According to my research, the Python options for generating PDFs include reportlab, pdfkit, xhtml2pdf, and django-easy-pdf.

1. reportlab
This library can draw all kinds of charts on its own, and its only dependency is the reportlab package itself. Font problems can be solved by registering a font file.
Simple example:


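from reportlab.lib import colors
from reportlab.lib.styles import ParagraphStyle, getSampleStyleSheet
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.platypus import Paragraph, SimpleDocTemplate

# Font: register a TrueType font file (STSONG.ttf must be on the font search path)
pdfmetrics.registerFont(TTFont('song', 'STSONG.ttf'))

styles = getSampleStyleSheet()

# Derive separate styles from 'Normal'; assigning styles['Normal'] to two
# names would make both point at the same object and share every change.
bt = ParagraphStyle('body', parent=styles['Normal'])  # body text style
# bt.fontName = 'song'    # font to use
bt.fontSize = 14          # font size
bt.wordWrap = 'CJK'       # automatic line wrapping; 'CJK' is the Chinese wrapping mode. With English text it splits words, which hurts readability, so change it to 'Normal' there
bt.firstLineIndent = 32   # indent of the first line
bt.leading = 20           # line spacing

ct = ParagraphStyle('centered', parent=styles['Normal'])
# ct.fontName = 'song'
ct.fontSize = 12
ct.alignment = 1          # centered
ct.textColor = colors.red

t = Paragraph('hello', bt)
pdf = SimpleDocTemplate('ppff.pdf')
pdf.build([t])            # SimpleDocTemplate is built with build()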

2. pdfkit
pdfkit relies on wkhtmltopdf to generate PDF files from URLs, HTML files, and strings. It is very useful, but it also has drawbacks: font problems and style problems. Fonts have to be installed in the environment, and CSS styles that are too complex or too advanced are not supported.
Code sample:

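import logging

import pdfkit

logger = logging.getLogger(__name__)

# pdfkit.from_string()  # convert a string to a PDF file; if the string is HTML code, it is rendered as HTML
# pdfkit.from_file()    # convert a file to a PDF file
# pdfkit.from_url()     # convert the entire content of a URL to a PDF file

# BKAPP_WLS_PATH, instance.task_url and path come from the surrounding project:
# the wkhtmltopdf executable path, the page to render, and the output file path.
path_wk = BKAPP_WLS_PATH
config = pdfkit.configuration(wkhtmltopdf=path_wk)
try:
    result = pdfkit.from_url(instance.task_url,
                             path,
                             options={'encoding': "utf-8"},
                             configuration=config)
except Exception as e:
    logger.error('%s', e)
    result = False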
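
The sample above only covers from_url. As a complement, here is a minimal sketch of from_string with a few wkhtmltopdf options passed through the options dict. The helper names build_options and html_to_pdf, the A4 page size, and the margin values are assumptions made for illustration; they are not part of the original project.

```python
def build_options(page_size="A4", margin="15mm"):
    """Return wkhtmltopdf options in the dict form that pdfkit accepts."""
    return {
        "encoding": "UTF-8",
        "page-size": page_size,
        "margin-top": margin,
        "margin-bottom": margin,
        "margin-left": margin,
        "margin-right": margin,
        "quiet": "",  # flag without a value: suppress wkhtmltopdf console output
    }


def html_to_pdf(html, output_path, wkhtmltopdf_path=None):
    """Render an HTML string to output_path (hypothetical helper)."""
    # Imported here so build_options() can be used without pdfkit installed.
    import pdfkit

    config = None
    if wkhtmltopdf_path:
        # Point pdfkit at a specific wkhtmltopdf binary instead of searching PATH.
        config = pdfkit.configuration(wkhtmltopdf=wkhtmltopdf_path)
    return pdfkit.from_string(html, output_path,
                              options=build_options(),
                              configuration=config)
```

Calling html_to_pdf('<h1>hello</h1>', 'hello.pdf') needs wkhtmltopdf installed. Keeping the options in one function makes it easy to use the same page settings for every PDF in a batch.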

3. xhtml2pdf and django-easy-pdf
Both are template rendering methods, but many CSS styles are not supported, and they have font problems as well.
Code sample:

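from xhtml2pdf import pisa

# pisa.CreatePDF expects HTML source (a string or a file object), not a URL
source_html = '<html><body><p>hello</p></body></html>'
output_filename = "test.pdf"


def convert_html_to_pdf(source_html, output_filename):
    # Open the output file in binary mode; the with-block closes it even on error
    with open(output_filename, "w+b") as result_file:
        pisa_status = pisa.CreatePDF(source_html, dest=result_file)
    # err is 0 when the conversion succeeded
    return pisa_status.err


if __name__ == "__main__":
    pisa.showLogging()
    convert_html_to_pdf(source_html, output_filename)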
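
For the font problem, xhtml2pdf documents an @font-face rule that loads a font file from the template's CSS. The sketch below builds such a template. The font file name STSONG.ttf, the family name song, and the helper names are assumptions for illustration, and how the font path is resolved depends on your setup.

```python
from string import Template

# HTML template whose CSS declares a custom font through @font-face.
PAGE = Template("""<html>
<head>
<style>
@font-face {
    font-family: $family;
    src: url($font_path);
}
body { font-family: $family; font-size: 12pt; }
</style>
</head>
<body>$body</body>
</html>""")


def build_html(body, font_path="STSONG.ttf", family="song"):
    """Fill the template with the body markup and the font declaration."""
    return PAGE.substitute(body=body, font_path=font_path, family=family)


def html_to_pdf(html, output_filename):
    """Render the HTML with xhtml2pdf; returns True when no error was reported."""
    # Imported here so build_html() can be used without xhtml2pdf installed.
    from xhtml2pdf import pisa

    with open(output_filename, "wb") as result_file:
        status = pisa.CreatePDF(html, dest=result_file)
    return status.err == 0
```

A call such as html_to_pdf(build_html('<p>hello</p>'), 'test.pdf') then renders the body with the declared font, provided the font file can be found.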

Personal summary and comparison:

Method: reportlab
Advantage: Can produce basically everything that reports, inspection reports, and similar documents need.
Disadvantage: Fonts require importing a font file (about 14 MB).
Personal opinion: The font problem is easy to solve. The generated PDF can contain all kinds of data charts and even images, but the library is cumbersome to use: it amounts to drawing with code yourself, including the position of every element in the PDF. CPU and memory consumption when generating a large number of PDFs in a short time has not been tested, but asynchronous or multi-process batch generation is possible.

Method: pdfkit
Advantage: The generated PDF looks good, many styles are supported, and it is easy to use.
Disadvantage: Depends on the lightweight software wkhtmltopdf (this feels like the biggest flaw). If the project is deployed in Docker, fonts have to be set up in the Docker image. CSS styles that are too advanced, such as some CSS3 styles, are not supported.
Personal opinion: If there are no special requirements, it basically meets the need. Batch testing shows that pdfkit can generate PDFs concurrently, but this consumes CPU and memory, and the larger the PDF and the more content it has, the more CPU and memory it consumes. Generating a large number of PDFs in a short period in a production environment is not recommended; the stability of the environment matters more than anything else.

Method: xhtml2pdf / django-easy-pdf
Advantage: Can generate PDFs through template rendering, but the result looks only average.
Disadvantage: Font problems require introducing font files, and only a small subset of styles is supported.
Personal opinion: Fine for simple PDFs. For a complex PDF, the style support is probably not sufficient.

Note:
pdfkit was tested by generating a 1 MB PDF with 11 pages containing a variety of content and charts. In this test, one wkhtmltopdf process consumed about 50-70 MB of memory.
It was also tested by generating a 4 MB PDF with 600 pages of varied content, mostly charts and descriptive text. In this test, one wkhtmltopdf process consumed about 220-270 MB of memory.
Therefore, as the generated PDF grows in size and page count, I think it is better not to generate PDFs concurrently. If you must use concurrency, it is best to limit the number of simultaneous PDF tasks at the code level, based on the measured consumption and the resources of the existing environment.

Origin blog.csdn.net/qq_42631707/article/details/111211318