How to batch merge 70 doc and docx files with Python

Table of contents

1. Presentation of the question

2. Algorithm analysis

3. Code display

4. Matters needing attention

1. Presentation of the question

A friend sent me more than 70 doc and docx files with numeric names and asked whether there was any VBA code that could merge them in numerical order. I tried ChatGPT, but none of the VBA code it produced worked, so I turned to Python, which did the job well.

2. Algorithm analysis

Merging the files takes three stages: traversing the files, converting the formats, and merging everything into a new file. The steps are as follows:

  1. Traverse the files in the current directory. Use os.listdir() for this.
  2. Convert the formats. Use the changeOffice module to batch-convert the doc files to docx.
  3. Merge the docx files. Use Document from python-docx to read each file and append its content to the first one.
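
Step 1 can be sketched on its own as follows. This is a minimal illustration, not the code used in the article: it uses pathlib from the standard library instead of os.listdir(), the function name numbered_docx_files is hypothetical, and it assumes that only files whose names consist entirely of digits should be merged, so that an earlier merged_file.docx or a Word lock file such as ~$1.docx is skipped.

```python
from pathlib import Path


def numbered_docx_files(folder="."):
    """Return the .docx files in `folder` whose names are pure numbers, in numeric order.

    Hypothetical helper for illustration; anything that is not named like
    1.docx, 2.docx, ... is ignored.
    """
    candidates = [
        path for path in Path(folder).iterdir()
        if path.is_file()
        and path.suffix.lower() == ".docx"
        and path.stem.isdecimal()
    ]
    # Sort by integer value so that 2.docx comes before 10.docx
    return sorted(candidates, key=lambda path: int(path.stem))
```

Calling numbered_docx_files(".") returns Path objects, which can be passed to Document() after converting them with str(). Sorting by int(path.stem) rather than by the raw name matters because a plain text sort would place 10.docx before 2.docx.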

3. Code display

After testing, I compiled the following code:

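from docx import Document
from changeOffice import Change
import os
import time

Change(".").doc2docx()  # Batch-convert every doc file in the current directory to docx
time.sleep(3)  # Pause briefly to avoid errors while the conversion finishes
# Collect the docx files with purely numeric names (this skips an earlier merged_file.docx)
# and sort them by number
files = sorted(
    [file for file in os.listdir(".") if file.endswith(".docx") and file[:-5].isdecimal()],
    key=lambda x: int(x[:-5]),
)
doc1 = Document(files[0])  # Open the first document; the others are appended to it
for file in files[1:]:
    doc = Document(file)  # Open the next document
    for element in doc.element.body:  # Move each body element to the end of the first document
        doc1.element.body.append(element)
doc1.save('merged_file.docx')  # Save the merged file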

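The advantage of this code is that the body elements are copied as they are, so formatting applied directly to the text, such as paragraph and font settings, carries over into the merged file, and in my tests it ran quickly. Content stored outside the document body, such as images and style definitions, is not copied by this approach. Try it out and let me know if you have any questions.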

4. Matters needing attention

  1. VBA and Python each have their strengths in office automation. Python's biggest advantage is that ready-made modules let you implement a feature quickly without starting from scratch, which simplifies the programming.
  2. Before running the code, make sure Python is installed, and install the python-docx and changeOffice packages with pip. While the program runs, keep all the Word files closed, and place the script in the same directory as the Word files.
  3. This program only suits Word files whose names are numbers, since those can be merged in numeric order. For non-numeric file names, adjust the sorting code as needed.
  4. Be sure to back up the original files before running the program, because by default changeOffice deletes the original file after converting it.
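
Notes 3 and 4 can be sketched with the standard library alone. The block below is only an illustration under stated assumptions: natural_key and backup_word_files are hypothetical helper names, the backup location is whatever folder you pass in, and the sort key is one possible way to order names that mix text and numbers (for example chapter2.docx before chapter10.docx).

```python
import re
import shutil
from pathlib import Path


def natural_key(filename):
    """Sort key that compares the digit runs in a file name as numbers.

    re.split with a capturing group alternates text and digit chunks, so the
    chunks at odd positions are always the digit runs.
    """
    parts = re.split(r"(\d+)", Path(filename).stem)
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]


def backup_word_files(folder, backup_dir):
    """Copy every .doc and .docx file in `folder` into `backup_dir`.

    Returns the names of the copied files. The originals are left untouched.
    """
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in {".doc", ".docx"}:
            shutil.copy2(path, target / path.name)  # copy2 also keeps timestamps
            copied.append(path.name)
    return copied
```

To use it, call backup_word_files(".", "../word_backup") before the conversion step, and pass key=natural_key to sorted() in place of the int-based key. Keeping the backup outside the working directory is a cautious choice, in case the converter also processes subfolders.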

Origin blog.csdn.net/henanlion/article/details/131060262