Python automation tutorial (3): Automatically generate PPT files Part 1 (hands-on)

 Series of tutorials:

Python automation tutorial (1): Overview and a first look at Excel automation

Python Automation Tutorial (2): Excel Automation: Using the pandas library

Python automation tutorial (3): Automatically generate PPT files Part 1

Python automation tutorial (4): Automatically generate PPT files Part 2

Python automation tutorial (5): Automatically generate Word files

Python automation tutorial (6): PDF file processing

4. Automatically generate PPT files 

Office automation often involves processing Office files in batches.

For example, if you spend a lot of time writing PPT files every month, it would be a big help to generate them automatically.

Please click here for the source code and file download of this article

1. Introduction to office library

I wrote an office library in Python for office automation. It covers a wide range of functions, including automatic PPT generation, PPT to long picture, PPT playback with voice, automatic Word generation, Excel data processing, image processing, video processing, converting Office documents to PDF, PDF encryption and decryption, and watermarking. All of them are practical, everyday tools.

The library is simple to use: most functions need only one or two lines of code.

1.1. The practical effect of automatically generating PPT with one line of code

import office 

# Use template.pptx as the template, create output.pptx, fill in the data from datafile.xlsx, and save
office.open_file("output.pptx", "template.pptx").fill('datafile.xlsx').save()

The code is short, but the generated PPT is far from trivial, as shown in the picture:

 

This tutorial will share the source code, routines, and usage of the office library with you.

1.2. Use PIP to install the office library:

Please install via pip from the command line:

pip install jojo-office

The library is installed under the name jojo-office.

In your code, import it as office:

import office

Office library dependencies include: python-docx, openpyxl, python-pptx, PyPDF4, reportlab, playsound, etc., which will be installed automatically during installation.

Importing and exporting DataFrame objects additionally requires the pandas library. Install it if you need it:

pip install pandas -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com

The office library only supports the newer Office file formats (extensions .docx, .xlsx, .pptx). It does not support the legacy formats used by Office 2003 and earlier (extensions .doc, .xls, .ppt).
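A script that processes many files can check the extension first and skip files in the legacy formats. The following is a minimal sketch, not part of the office library: the helper name `classify_office_file` and the sample file names are made up for illustration.

```python
from pathlib import Path

SUPPORTED = {".docx", ".xlsx", ".pptx"}  # formats the article lists as supported
LEGACY = {".doc", ".xls", ".ppt"}        # older formats, listed as not supported


def classify_office_file(file_name):
    """Return 'supported', 'legacy', or 'other' based on the file extension."""
    suffix = Path(file_name).suffix.lower()
    if suffix in SUPPORTED:
        return "supported"
    if suffix in LEGACY:
        return "legacy"
    return "other"


# Hypothetical file names, used only to show the three outcomes.
for name in ["template.pptx", "old_report.ppt", "notes.txt"]:
    print(name, classify_office_file(name))
```

A legacy file can be opened in the Office application and saved again in the newer format before it is used as a template or data file.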

2. The principle of automatically generating PPT files

To generate a PowerPoint file automatically, first write a template PowerPoint file, then copy the template to create a new file, and finally fill in the data. Filling in different data produces different PPT files, which makes batch generation possible. For example, each month's Excel report data can be used to generate that month's PPT.
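The batch idea can be sketched as a loop over monthly data files. This is only a sketch under stated assumptions: the data file names and the helpers `plan_outputs` and `generate_all` are hypothetical, and the only library calls used are the `open_file(...).fill(...).save()` chain shown in this article.

```python
from pathlib import Path


def plan_outputs(data_files, template="template.pptx"):
    """Pair each Excel data file with the PPT file it should produce."""
    jobs = []
    for data_file in data_files:
        # sales_2022-01.xlsx -> sales_2022-01.pptx (hypothetical naming scheme)
        output = str(Path(data_file).with_suffix(".pptx"))
        jobs.append((output, template, data_file))
    return jobs


def generate_all(jobs):
    """Run the one-line generation call from this article once per job."""
    import office  # imported here so the planning step works without the library

    for output, template, data_file in jobs:
        office.open_file(output, template).fill(data_file).save()


# Planning only; call generate_all(jobs) once the files actually exist.
jobs = plan_outputs(["sales_2022-01.xlsx", "sales_2022-02.xlsx"])
for job in jobs:
    print(job)
```

Keeping the planning step separate from the generation step makes it easy to review which files will be written before anything is overwritten.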

The template PPT file is an ordinary PPT file whose content and format can be written as required. Simply write variables where the data should be filled in.

For example, the text content of a slide in the template PowerPoint file template.pptx is as follows:

Text enclosed in curly braces { xxx } is called a variable. Here {name} and {age} are variables.
Generating the PPT means filling in the data: each variable is replaced with its corresponding value, so {name} is replaced with the value of name and {age} with the value of age.

Note: Use ASCII (half-width) curly braces, not full-width ones; otherwise the variable will not be recognized.
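The replacement principle can be illustrated on plain text. The sketch below is not the office library's implementation: `fill_text` is a hypothetical helper, and leaving unknown variables untouched is an assumption made for this example.

```python
import re

# Matches an ASCII-brace variable such as {name}.
# Full-width braces are different characters and do not match.
VARIABLE = re.compile(r"\{([^{}]+)\}")


def fill_text(template_text, values):
    """Replace each {variable} with its value; unknown variables stay as-is."""
    def substitute(match):
        key = match.group(1).strip()
        return str(values[key]) if key in values else match.group(0)

    return VARIABLE.sub(substitute, template_text)


data = {"name": "Peter", "age": 18}
print(fill_text("Name: {name}, Age: {age}", data))
print(fill_text("Name: ｛name｝", data))  # full-width braces: left unchanged
```

The second call shows why full-width braces are a problem: the text looks like a variable to a human reader, but the pattern never matches it, so nothing is filled in.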

3. Use the Excel file as the data source and fill in the PowerPoint template file

3.1 Text

The Excel file is datafile.xlsx, the content of cell B2 of Sheet1 is 'Peter', and the content of cell C2 is 18.

The template PowerPoint file template1.pptx is written as follows:

The variable {Sheet1!B2} indicates that the data comes from the B2 cell of the Sheet1 worksheet of the Excel file.
The variable {Sheet1!C2} indicates that the data comes from cell C2 of the Sheet1 worksheet of the Excel file.
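As a rough illustration of how such a variable name can be taken apart, the sketch below splits {Sheet1!B2} into a worksheet name and a cell address and looks the value up. It makes several assumptions: `parse_variable` is a hypothetical helper, a plain dictionary stands in for datafile.xlsx, and the office library's real parsing code is not shown in this article.

```python
import re

# {Sheet1!B2} -> worksheet "Sheet1", cell "B2"
REFERENCE = re.compile(r"\{([^!{}]+)!([A-Za-z]+[0-9]+)\}")


def parse_variable(variable):
    """Split a template variable into (worksheet name, cell address)."""
    match = REFERENCE.fullmatch(variable.strip())
    if match is None:
        raise ValueError(f"not a worksheet!cell variable: {variable!r}")
    sheet, cell = match.groups()
    return sheet.strip(), cell.upper()


# A plain dict stands in for datafile.xlsx in this sketch.
workbook = {"Sheet1": {"B2": "Peter", "C2": 18}}

for variable in ["{Sheet1!B2}", "{Sheet1!C2}"]:
    sheet, cell = parse_variable(variable)
    print(variable, "->", workbook[sheet][cell])
```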

The python program to generate PPT is as follows:

import office

# Use template1.pptx as the template to create the new file output.pptx
# If output.pptx already exists, it will be overwritten
ppt = office.open_file("output.pptx", template="template1.pptx")

# Read the data from datafile.xlsx, fill it in, and save
ppt.fill('datafile.xlsx').save()

The program above can also be written in one line:

office.open_file("output.pptx", "template1.pptx").fill('datafile.xlsx').save()

After the program runs, an output.pptx file is generated, the content of which is as follows:

Summary:

Writing a template file means writing variables where the data belongs. A variable starts with { and ends with }.

A variable that points to Excel data is named by the reference address of the worksheet and the cell, such as {Sheet1!B2}.

Generating a PPT means filling in the data. The same template produces different PPTs when the data changes.

Filling in the data does not change the formatting (font, size, position, color).

The template PPT file can contain multiple slides, and variables can be written on every slide, with no limit on their number.

3.2 Tables

Create a table in the template PPT and define a variable in each column; multiple rows of data can then be filled into the PPT table.

The Sheet1 worksheet of datafile.xlsx contains a table:

In the template file template2.pptx, a table with 4 rows and 2 columns is drawn, and variables are written in the first row of each column, which is written as follows:

 The python program is as follows:

import office


ppt = office.open_file("output.pptx", template="template2.pptx")

# Read the data from datafile.xlsx, fill it in, and save
ppt.fill('datafile.xlsx').save() 

After the program runs, an output.pptx file is generated, the content of which is as follows:

It can be seen that the table data in Excel has been filled into the PPT table.

Note: The number of data rows filled in depends on the number of rows in the table in the PPT template. If the Excel table has more rows than the PPT table, the extra rows are not filled in.
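The row limit can be pictured with plain lists. This is a sketch of the rule described in the note, not the library's code: `fill_table` is a hypothetical helper, the sample rows are invented, and it assumes the data starts at the first row of the PPT table.

```python
def fill_table(template_row_count, data_rows):
    """Return the rows that fit, assuming data starts at the first table row."""
    return [list(row) for row in data_rows[:template_row_count]]


# Invented sample data standing in for the Excel table.
excel_rows = [
    ("Peter", 18),
    ("Mary", 20),
    ("John", 19),
    ("Anna", 21),
    ("Tom", 22),  # fifth row: does not fit in a 4-row PPT table
]

filled = fill_table(4, excel_rows)
dropped = len(excel_rows) - len(filled)
print(f"filled {len(filled)} rows, dropped {dropped}")
```

Comparing the two row counts before generating the PPT is a cheap way to notice that the template table needs more rows.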

3.3 Charts

Create charts in the PPT template. Each chart has a data table, and each column of that data table is defined as a variable. When the PPT is generated, the data is filled into the chart's data table and the chart is updated.

Open the template file template3.pptx in PowerPoint and create a column chart.

In PowerPoint, right-click the chart and select "Edit Data" to see the chart's data table.

Change the first row of each column in the data table to a variable. For example, {Commodity Sales 1!B1} means that the data in this column comes from column B of the worksheet named in the variable, in the Excel data file. The modified data table is as follows:


The python program is as follows:

import office

# Use template3.pptx as the template, create output.pptx, fill in the data from datafile.xlsx, and save
office.open_file("output.pptx", "template3.pptx").fill('datafile.xlsx').save()

After the program runs, an output.pptx file is generated, the content of which is as follows:

It can be seen that the PPT chart has been updated.

Open the newly generated output.pptx in PowerPoint, right-click the chart, and select "Edit Data". The chart's data table now holds the corresponding data from the Excel file, as follows:

In fact, the office library updates the data table of the PPT chart and then rebuilds the chart.
Note: The chart function of the office library currently supports 2D charts only. Charts in templates therefore must not be 3D chart types.
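A column variable such as {Sheet!B1} names the top cell of a column. One way to picture "the data in this column" is to read downward from that cell. The sketch below is an assumption-laden illustration: `column_values` is a hypothetical helper, a dictionary stands in for the worksheet, and stopping at the first missing cell is a choice made for this example, not documented library behavior.

```python
import re


def column_values(sheet_cells, start_cell):
    """Collect values downward from start_cell until the first missing cell."""
    match = re.fullmatch(r"([A-Z]+)([0-9]+)", start_cell)
    if match is None:
        raise ValueError(f"bad cell address: {start_cell!r}")
    column, row = match.group(1), int(match.group(2))
    values = []
    while f"{column}{row}" in sheet_cells:
        values.append(sheet_cells[f"{column}{row}"])
        row += 1
    return values


# Invented worksheet contents: a header in row 1, then two data rows.
sales = {"A1": "Month", "A2": "Jan", "A3": "Feb",
         "B1": "Sales", "B2": 120, "B3": 150}
print(column_values(sales, "B1"))
```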

3.4 Insert pictures, video, and audio

If the data is a picture file name, the picture file can be inserted into the PPT.

In the template file template4.pptx, draw a text box and write the variable {@Sheet1!D2} as its text. The '@' character before the variable name marks it as a special variable: when its value is a picture file name, that picture file is inserted. The template is shown in the figure:

 The python program is as follows:

import office

# Use template4.pptx as the template, create output.pptx, fill in the data from datafile.xlsx, and save
office.open_file("output.pptx", "template4.pptx").fill('datafile.xlsx').save()

Note: The value of cell Sheet1!D2 in datafile.xlsx is "peter.jpg", which is a picture file name. The file name does not include a path, so the image file should be placed in the current directory. Absolute paths also work as file names.


After the program runs, the output.pptx file is generated, and its content is as follows. It can be seen that a picture peter.jpg is inserted into the PPT document, and the size and position of the picture are consistent with the text box where the variable is located.

Similarly, audio and video files such as .mp3 and .mp4 can also be inserted into PPT documents.
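Because media files are referenced by name, it can help to check that every file exists before generating the PPT. The following is a minimal sketch using only the standard library: `resolve_media` and `missing_media` are hypothetical helpers, and the file names in the demo are placeholders.

```python
import tempfile
from pathlib import Path


def resolve_media(file_name, base_dir="."):
    """Return the absolute path of a media file, or None if it does not exist."""
    path = Path(file_name)
    if not path.is_absolute():
        path = Path(base_dir) / path  # relative names are looked up in base_dir
    return path.resolve() if path.is_file() else None


def missing_media(file_names, base_dir="."):
    """List the file names that cannot be found in base_dir."""
    return [name for name in file_names if resolve_media(name, base_dir) is None]


# Demo in a temporary folder with one empty stand-in file.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "peter.jpg").write_bytes(b"")
    print(missing_media(["peter.jpg", "intro.mp4"], folder))
```

In a real script, `base_dir` would be the current directory, matching the note above about where files without a path are expected.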

3.5 Generate PPT comprehensive report

Combining the above functions of text, tables, charts, and pictures, a comprehensive PPT report can be generated.

The data is in datafile.xlsx, which contains multiple worksheets: commodity sales, customers, inventory, and summary.

The template file template5.pptx is a typical PPT report, including tables, charts, text, pictures, etc. Its content is as follows:

 

The python program is as follows:

import office

# Use template5.pptx as the template, create report.pptx, fill in the data from datafile.xlsx, and save
office.open_file("report.pptx", "template5.pptx").fill('datafile.xlsx').save()

After the program runs, the report.pptx file is generated, the content of which is as follows:

  

3.6 Save PPT as a long picture 

A long picture is made by turning each page of the PPT into a picture and stitching those pictures together into one long image.
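The stitching step is mostly arithmetic: the pages are stacked vertically, so the canvas is as wide as the widest page and as tall as all pages together. The sketch below only computes that layout; `long_image_layout` is a hypothetical helper, the page sizes are invented, and the office library's own rendering code is not shown in this article.

```python
def long_image_layout(page_sizes):
    """Compute the canvas size and the top offset of each vertically stacked page."""
    width = max(page_width for page_width, _ in page_sizes)
    offsets = []
    top = 0
    for _, page_height in page_sizes:
        offsets.append(top)  # this page starts where the previous one ended
        top += page_height
    return (width, top), offsets


# Three invented slide images of 1280x720 pixels each.
canvas, offsets = long_image_layout([(1280, 720)] * 3)
print(canvas, offsets)
```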

Note: Saving the file as a long picture requires Microsoft PowerPoint or WPS Office to be installed on the machine.

Open the pptx file and save it under a .jpg file name; the result is one long picture.

The python program is as follows:

# Open report.pptx and save it as one long picture (a .jpg file name means "save as a long picture")
office.open_file("report.pptx").save("long.jpg")

If the long picture needs a watermark, pass the watermark parameter when calling save():

# Open report.pptx and save it as one long picture with a watermark
# (the watermark text means "Trade secret, keep safe")
office.open_file("report.pptx").save("long.jpg", watermark="商业秘密,注意保管")

A beautiful long picture is produced.

3.7 Save PPT as PDF

# Open report.pptx and save it as a PDF with a watermark
# (the watermark text means "Trade secret, keep safe")
office.open_file("report.pptx").save("report.pdf", watermark="商业秘密,注意保管")

To save as PDF, just pass save() a file name ending in .pdf. The watermark parameter is the watermark text.

Note: Converting PPTX to PDF requires Microsoft PowerPoint or WPS Office to be installed on the machine.

4. Play PPT in a loop and play voice synchronously

Play the PPT, turning the page automatically every few seconds. After the last page, playback jumps back to the first page and continues in a loop.

The python program is as follows:

# Open report.pptx and play it in a loop, moving to the next page every 3 seconds
office.open_file("report.pptx").play(3)

After the program runs, it starts Microsoft PowerPoint (or WPS Office), enters full-screen mode automatically, shows the PPT page by page, and plays in a loop. The code above changes the page every 3 seconds. During playback, press the ESC key to interrupt playback and exit the program.

Note: Playing a PPT requires Microsoft PowerPoint or WPS Office to be installed.

If each page should stay on screen for a different time, write the intervals as a list.

office.open_file("report.pptx").play([1, 3, 2, 1])

The interval list [1, 3, 2, 1] means: the first page is shown for 1 second, the second page for 3 seconds, the third page for 2 seconds, and the fourth page for 1 second.

If there are more pages, just make the list longer.

When playing PPT, the voice file can be played synchronously when playing to a certain page.

# Play the PPT in a loop; on page 2, play the voice file 1.wav
office.open_file("report.pptx").play([
    1,
    [3, "1.wav"],
    3,
    [1, "2.wav"]
])

In the interval list, replace the entry for each page that needs audio with a list of the form [interval seconds, audio file name].

In the example above, page 2 plays 1.wav and is shown for 3 seconds, and page 4 plays 2.wav and is shown for 1 second.
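To double-check a schedule before playing it, the mixed list can be expanded into one (seconds, audio) pair per page. This is a small sketch for inspection only: `normalize_schedule` is a hypothetical helper and is not part of the office library.

```python
def normalize_schedule(schedule):
    """Turn entries like 1 or [3, "1.wav"] into (seconds, audio_or_None) pairs."""
    pages = []
    for entry in schedule:
        if isinstance(entry, (list, tuple)):
            seconds, audio = entry
        else:
            seconds, audio = entry, None
        pages.append((seconds, audio))
    return pages


schedule = [1, [3, "1.wav"], 3, [1, "2.wav"]]
pages = normalize_schedule(schedule)
for page_number, (seconds, audio) in enumerate(pages, start=1):
    print(f"page {page_number}: {seconds}s, audio: {audio}")
print("one loop takes", sum(seconds for seconds, _ in pages), "seconds")
```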

You can also write the playback interval as a text file. For example: play.txt, the content is as follows:

1
3 1.wav
3
1 2.wav

The format is: each line holds the playback interval in seconds for one PPT page, optionally followed by a space and a voice file name.

Then, in the Python program, just refer to this file.

office.open_file("report.pptx").play("play.txt")

The effect is the same as with the list shown earlier.
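The correspondence between the text file and the list form can be made explicit with a few lines of parsing. This sketch is for illustration only: `parse_play_file` is a hypothetical helper, it assumes whole-number seconds as in the example, and the library reads play.txt itself, so this step is not required.

```python
def parse_play_file(text):
    """Convert play.txt content into the equivalent interval list."""
    schedule = []
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        seconds = int(parts[0])
        if len(parts) == 2:
            schedule.append([seconds, parts[1].strip()])  # page with audio
        else:
            schedule.append(seconds)  # page without audio
    return schedule


play_txt = "1\n3 1.wav\n3\n1 2.wav\n"
print(parse_play_file(play_txt))
```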

A typical use for playing a PPT with voice: at an exhibition, the PPT and its voice-over can play automatically, with no narrator needed.

Summary:

1. The office library provides a powerful PPT generation function.

2. Write a template PPT file containing variables, then fill in the data to generate the PPT.

3. The data can be placed in an Excel file. A variable takes the Excel form {worksheet!cell}.

4. A PPT can be saved as a long picture and can be played with voice.


Sequel:

The office library has many other functions, which will be covered in the next tutorial.

The office library is still under development. Please forgive the occasional bug; suggestions for improvement are welcome.

If you are interested in digging deeper, see the source code of office.py.


Origin blog.csdn.net/c80486/article/details/126434547