Python's PyPDF2: a detailed explanation of examples for manipulating PDF documents

PyPDF2 is a pure-Python library for working with PDF documents. It provides functions for reading, modifying and writing PDF files. This article walks through usage examples of PyPDF2 in detail, covering reading document information, extracting text content, merging and splitting documents, and adding watermarks.

First, we need to install the PyPDF2 library. It can be installed using pip with the following command:

pip install PyPDF2

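After the installation is complete, we can start using the library. The examples below use the PyPDF2 3.x API (PdfReader, PdfWriter, PdfMerger, the pages list, extract_text() and so on). The older names found in many tutorials, such as PdfFileReader, numPages, getPage() and extractText(), were deprecated in PyPDF2 2.x and raise an error from 3.0 on. Here is sample code for some commonly used functions: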

1. Read PDF document information:

import PyPDF2

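# Open the PDF file in binary mode
with open('example.pdf', 'rb') as file:
    # Create a PdfReader object (called PdfFileReader before PyPDF2 3.0)
    pdf = PyPDF2.PdfReader(file)

    # Get the number of pages in the PDF
    num_pages = len(pdf.pages)
    print("Number of pages:", num_pages)

    # Get the document metadata; it is None if the file has no info dictionary
    metadata = pdf.metadata
    if metadata is not None:
        print("Title:", metadata.title)
        print("Author:", metadata.author)
        # The creation date is a raw PDF date string such as D:20230101120000
        print("Creation date:", metadata.get('/CreationDate'))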

2. Extract text content:

import PyPDF2

# Open the PDF file
with open('example.pdf', 'rb') as file:
    # Create a PdfReader object (called PdfFileReader before PyPDF2 3.0)
    pdf = PyPDF2.PdfReader(file)

    # Extract the text content of the first page
    page = pdf.pages[0]
    text = page.extract_text()
    print(text)

3. Merge PDF documents:

import PyPDF2

# Create a PdfMerger object (called PdfFileMerger before PyPDF2 3.0)
merger = PyPDF2.PdfMerger()

# Add the PDF files to merge, in order; file paths can be passed directly
# and the merger reads the files itself
merger.append('document1.pdf')
merger.append('document2.pdf')

# Merge the PDF files and save the result
merger.write('merged_document.pdf')

# Release the files and memory held by the merger
merger.close()

4. Split PDF documents:

import PyPDF2

# Open the PDF file
with open('example.pdf', 'rb') as file:
    # Create a PdfReader object (called PdfFileReader before PyPDF2 3.0)
    pdf = PyPDF2.PdfReader(file)

    # Split the document, saving each page to a separate file
    for page_num, page in enumerate(pdf.pages):
        output_pdf = PyPDF2.PdfWriter()
        output_pdf.add_page(page)

        with open(f'page{page_num + 1}.pdf', 'wb') as output_file:
            output_pdf.write(output_file)

5. Add watermark:

import PyPDF2

# Open the source PDF and the watermark PDF
with open('example.pdf', 'rb') as file, open('watermark.pdf', 'rb') as watermark_file:
    # Create PdfReader objects (called PdfFileReader before PyPDF2 3.0)
    pdf = PyPDF2.PdfReader(file)
    watermark = PyPDF2.PdfReader(watermark_file)
    watermark_page = watermark.pages[0]

    # Create a PdfWriter object for the output
    output_pdf = PyPDF2.PdfWriter()

    # Add the watermark to every page; merge_page() draws it on top of the page content
    for page in pdf.pages:
        page.merge_page(watermark_page)
        output_pdf.add_page(page)

    # Save the watermarked PDF while both input files are still open,
    # because PyPDF2 may read objects from them lazily during write()
    with open('watermarked_document.pdf', 'wb') as output_file:
        output_pdf.write(output_file)

As the sample code above shows, PyPDF2 provides methods for the most common PDF tasks: reading document information, extracting text content, merging and splitting documents, and adding watermarks. Keep in mind that text extraction only works on PDFs that contain a text layer; scanned documents need OCR first. Hope these examples are helpful for your study!

Origin blog.csdn.net/naer_chongya/article/details/131457091