Merging Different Types of File Content (Word, Excel, TXT) with Python

Extract file contents


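from pathlib import Path

from docx import Document


def merge_without_format(docx_files: list, word_files_path: str):
    '''
    Merge only the text content of the .docx files (formatting is dropped)
    '''
    # Create the new Word document that receives the merged content
    doc = Document()
    # Go through each file in sorted (file-name) order
    for docx_file in sorted(docx_files):
        another_doc = Document(docx_file)
        # Get all "paragraphs" of the current file
        paras = another_doc.paragraphs
        # Alternative: collect only the text of every paragraph
        # paras_content = [para.text for para in paras]
        for para in paras:
            # Create a new paragraph in the new Word document
            newpar = doc.add_paragraph('')
            # Write the extracted text into the new paragraph
            newpar.add_run(para.text)

    # After all files are merged, save the result in the specified folder
    doc.save(Path(word_files_path, 'new.docx'))


# Call the function; 'word_files' is a placeholder for the folder holding the .docx files
word_files_path = 'word_files'
# Skip a new.docx left over from an earlier run
files = [f for f in Path(word_files_path).glob('*.docx') if f.name != 'new.docx']
merge_without_format(files, word_files_path)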

After Word and TXT content are merged and saved to a new Word document, the text that came from the TXT file does not match the formatting (font, size, and so on) of the original Word content, because plain text carries no formatting. The python-docx library can be used to format that text. For example, if the Word file before merging uses the FangSong (仿宋) font with red, underlined text, how do we give the merged TXT content the same font, style, and color? We can use the following code.


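from docx.oxml.ns import qn
from docx.shared import RGBColor


def add_content_mode1(doc, content):
    '''
    Add content (e.g. text read from a TXT file) with unified formatting
    '''
    run = doc.add_paragraph().add_run(content)
    # Set the font (仿宋 = FangSong); font.name only covers Latin text,
    # so the East Asian font is also set for Chinese characters
    run.font.name = '仿宋'
    run._element.rPr.rFonts.set(qn('w:eastAsia'), '仿宋')
    # Set underline
    run.font.underline = True
    # Set the color (this RGB value is a light red)
    run.font.color.rgb = RGBColor(255, 128, 128)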

Common image formats include .jpg, .png, and .gif. Because these formats are widely used and are not proprietary, encrypted formats tied to commercial software, the add_picture method of the python-docx library can insert them into a Word document directly. The code is as follows:


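from docx import Document
from docx import shared

doc = Document()
# Add the picture with its width set in inches (the height is scaled to keep the aspect ratio)
doc.add_picture('test.jpg', width=shared.Inches(1))
# Save the result (the output file name is illustrative)
doc.save('test.docx')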

To give you a better understanding of how Word and Excel files can be merged, I will explain it with an example: generating invitations in batches from an Excel sheet and a Word template.


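from pathlib import PurePath

from docx import Document


def generate_invitation(invitation_path, replace_content):
    '''
    Generate one invitation file from the Word template
    '''
    doc = Document(invitation_path)
    # Go through every paragraph
    for para in doc.paragraphs:
        for key, value in replace_content.items():
            if key in para.text:
                # Replace the placeholders one keyword at a time
                # (assigning para.text drops the paragraph's run-level formatting)
                para.text = para.text.replace(key, value)

    # Save next to the template, named after the invitee ('<姓名>' means "<Name>")
    file_name = PurePath(invitation_path).with_name(replace_content['<姓名>']).with_suffix('.docx')
    doc.save(file_name)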

The process is as follows: for each row of the Excel sheet, fill the name and gender into the Word template; then fill in the current date; finally, save the result as a file named after the invitee.


Source: blog.csdn.net/david2000999/article/details/121503176