Python3: with 5 lines of code, Chatxxx can rotate, extract, merge, and otherwise manipulate PDF files. After reading this article, even an 80-year-old grandmother can walk without holding onto the wall.

1. Introduction

Little Diaosi : Brother Yu, what have you been up to lately?
Xiaoyu : Lately? How recent do you mean?
Little Diaosi : Just the last few days.
Xiaoyu : The last few days I have been moving bricks.
Little Diaosi : How about a few days ago?
Xiaoyu : A few days ago, during my May Day holiday, I was also moving bricks.
Little Diaosi : Brother Yu, you...

Xiaoyu : I really do.
Little Diaosi : Let's change the subject then. How is it going with ChatGPT?
Xiaoyu : Isn't every big company working on ChatGPT? Whatever you want to know, ChatGPT can answer it.
Little Diaosi : Brother Yu, you...!
Xiaoyu : That is how powerful ChatGPT is, and it is true.
Little Diaosi : Okay, then I want to extract the content of a PDF document.
Xiaoyu : Read this article: "Python3, 9 lines of code to batch-extract specified content from PDF files, an operation everyone is guaranteed to love..."
Little Diaosi : I am an RMB player (a paying user), and I want something more advanced.
Xiaoyu : Well... let me take a look.
Little Diaosi : What are you looking at?
Xiaoyu : I'm checking how much is left in your account. Does it need a top-up?
Little Diaosi : Brother Yu, you...!
Xiaoyu : It's full. Here is the luxury you wanted...

Little Diaosi : Hehe... full of...

2. Code combat

2.1 Principle

When it comes to ChatPDF, the first reaction of most readers is: sorry, never heard of it.
When it comes to ChatGPT, however, you may say: of course I know it. It is OpenAI's product and it is so popular right now, how could I not know it?

Now that you know ChatGPT, ChatPDF is not difficult to understand.
In fact, ChatPDF is a derivative of ChatGPT.

Little Diaosi : Since it is a derivative of ChatGPT, how does it work?
Xiaoyu : The working principle is not difficult. Like putting an elephant in the refrigerator, it only takes a few steps, 6 to be exact.

  • 1. ChatPDF reads the contents of the PDF file and converts them into plain text (for example, .txt format);
  • 2. ChatPDF cleans and standardizes the extracted text, for example by splitting it into segments and sentences;
  • 3. Use OpenAI's Embeddings API to convert each segment into a vector, which will encode the semantics in the text for easy comparison with the question's vector;
  • 4. Use OpenAI's Embeddings API to convert the question into a vector and compare with each segment's vector to find the most similar segment. This similarity calculation can be performed using common methods such as cosine similarity;
  • 5. Combine the most similar segment and the question into a prompt, call OpenAI's Completion API so that ChatGPT reads the segment content, and then let it answer the question;
  • 6. The answer generated by ChatGPT will be returned to the user to complete a query.
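
The six steps above can be sketched in a few lines of plain Python. This is only a minimal sketch, not ChatPDF's actual implementation: the function names (`split_into_segments`, `toy_embed`, `most_similar_segment`, `build_prompt`) are hypothetical, and a simple bag-of-words vector stands in for OpenAI's Embeddings API so that the example runs without a network connection or an API key. The final call to the Completion API is left out; the sketch stops at building the prompt.

```python
import math
import re
from collections import Counter
from typing import List


def split_into_segments(text: str) -> List[str]:
    """Step 2: normalize whitespace and split the text into sentence-level segments."""
    cleaned = re.sub(r"\s+", " ", text).strip()
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", cleaned) if s.strip()]


def toy_embed(text: str) -> Counter:
    """Stand-in for steps 3 and 4: a bag-of-words count vector.

    Assumption: a real system would call an embeddings API here instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_segment(question: str, segments: List[str]) -> str:
    """Step 4: return the segment whose vector is closest to the question's vector."""
    question_vector = toy_embed(question)
    return max(segments, key=lambda seg: cosine_similarity(question_vector, toy_embed(seg)))


def build_prompt(question: str, segment: str) -> str:
    """Step 5: combine the retrieved segment and the question into one prompt."""
    return f"Context:\n{segment}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    # Step 1 is assumed to have happened already: this is the text extracted from a PDF.
    pdf_text = """ChatPDF reads a PDF file   and converts it to text.
    Each segment is converted into a vector. The invoice total is 42 dollars."""
    segments = split_into_segments(pdf_text)
    question = "What is the invoice total?"
    best = most_similar_segment(question, segments)
    print(build_prompt(question, best))
```

Swapping `toy_embed` for a real embedding model changes only the vectors; the comparison with cosine similarity and the prompt construction stay the same.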

Little Diaosi : I didn't expect the implementation to be this simple.
Xiaoyu : The big river bends to the east, Niu Niu rushes forward~~

2.2 Installation

In the previous section, we learned what ChatPDF is and how it works.
Next, let's actually use it.

Of course, since this is a third-party library, the usual rule applies: install it first.

pip install chatpdf

Then just wait for the installation to finish.

For other installation methods, see these two articles directly:

Note that chatPDF takes a while to install (I wouldn't call the process "short").

2.2 Examples

After the installation is complete, let's see what chatPDF can do.

2.2.1 Creating PDF files

Code example

# -*- coding:utf-8 -*-
# @Time   : 2023-05-06
# @Author : Carl_DJ

'''
Purpose:
    Use the basic chatPDF methods to create a PDF file
'''

# ---------> Create a PDF file <---------
from chatpdf import ChatPDF

# Name of the output file
file_name = './data/TestDemo.pdf'
pdf = ChatPDF()
# Add a page
pdf.add_page()
# Set the font
pdf.set_font("Arial", size=12)
# Write the content
pdf.cell(200, 10, txt='Hello, Python')
# Save the file
pdf.output(file_name)


2.2.2 Rotating PDF files

Code example

# -*- coding:utf-8 -*-
# @Time   : 2023-05-06
# @Author : Carl_DJ

'''
Purpose:
    Use the basic chatPDF methods to rotate pages of a PDF file
'''

# ---------> Rotate PDF pages <---------
from chatpdf import rotate_pages

# Source PDF file
pdf_file = './data/input.pdf'

# Output file
output_file = './data/output.pdf'

# Page numbers to rotate
pages = [1, 3]

# Rotation angle in degrees
rotation_angle = 270

rotate_pages(pdf_file, output_file, pages, rotation_angle)

2.2.3 Split PDF file

Code example

# -*- coding:utf-8 -*-
# @Time   : 2023-05-06
# @Author : Carl_DJ

'''
Purpose:
    Use the basic chatPDF methods to split a PDF file
'''

# ---------> Split a PDF file <---------
from chatpdf import split

# Source PDF file to split
pdf_file = 'input_demo.pdf'

# Folder where the split PDF files are saved
output_folder = './data/output'

split(pdf_file, output_folder)

2.2.4 Merge PDF files

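Code example

# -*- coding:utf-8 -*-
# @Time   : 2023-05-06
# @Author : Carl_DJ

'''
Purpose:
    Use the basic chatPDF methods to merge PDF files
'''

# ---------> Merge PDF files <---------
# The original listing omitted this import; it is assumed to follow
# the same pattern as split and rotate_pages in the other examples
from chatpdf import merge

# The files to merge
file1 = './data/demo1.pdf'
file2 = './data/demo2.pdf'
file3 = './data/demo3.pdf'

# List of all PDF files to merge, in order
pdf_file_list = [file1, file2, file3]
# Name of the merged output file
output_file = 'output_demo.pdf'

merge(pdf_file_list, output_file)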

2.2.5 Extract PDF file content

Code example

# -*- coding:utf-8 -*-
# @Time   : 2023-05-06
# @Author : Carl_DJ

'''
Purpose:
    Use the basic chatPDF methods to extract pages from a PDF file
'''

# ---------> Extract PDF pages <---------
from chatpdf import extract_pages

# Source PDF file to extract from
pdf_file = 'input.pdf'

# Output PDF file that receives the extracted pages
output_file = './data/output.pdf'
# Page numbers to extract from the source file
pages = [1, 3, 5, 7, 10]

extract_pages(pdf_file, output_file, pages)

Little Diaosi : Brother Yu, I remember you also wrote a blog post about extracting the contents of PDF documents.
Xiaoyu : Your memory is quite good. I did write one, and it is the following article.

Of course, for more on working with PDF documents, you can also read Xiaoyu's other blog posts:

3. Summary

That wraps up the introduction to the ChatPDF library.
In fact, the ChatPDF library offers more than the functions covered here. It also supports:

  • PDF file encryption;
  • PDF file decryption.

Now that ChatGPT is exploding in popularity, we technical people need to know at least a little about AI.
Even just freeloading (bai piao) on ChatGPT's free usage quota is fine.
Of course, as Xiaoyu, I would never freeload, but I have never stopped learning.

I am Xiaoyu ("small fish"):

  • CSDN blog expert;
  • Aliyun expert blogger;
  • 51CTO blog expert;
  • 51 Certified Instructor;
  • Certified gold interviewer;
  • Workplace training planner.

Follow me to learn more, and more interesting, Python knowledge.


Origin blog.csdn.net/wuyoudeyuer/article/details/129474277