Write code in Python, easily handle the workload of the day, my colleagues directly call: good fellow

Hello everyone, I am Erhei.

A few days ago, a reader said that he had to organize thousands of documents recently and his head was bald. I don’t know if I can solve it with Python. Let’s take a look, and you can also think about it.

The specific content has been desensitized because it involves file privacy.

This is probably the case, there are multiple meeting notices under one folder (this article uses 7 documents as an example)
Insert picture description here

The opening format of each notification is basically similar, as shown below
Insert picture description here

Now it is necessary to extract the four key information of study time, study content, study form, and moderator in each meeting document and organize them into an Excel table:
Insert picture description here

In his real needs, nearly 1,000 meeting notices have been accumulated in four years (so many meetings in four years are also very powerful...), and the workload of opening files one by one and recording them in Excel is too large.

Good guy, isn't this kind of repetitive and boring job an automated job that is very suitable for Python? I don’t allow my fans yet!

Let's take a look at how to solve this problem with Python, mainly involving:

openpyxl Write Excel file python-docx Read Word file glob Obtain file path in batch

In order to simplify the above requirements, there are a total of 7 meeting notice files that need to be obtained in this article, which are named meeting notice 1.docx meeting notice 2.docx... meeting notice 7.docx, and they are stored in the Notice folder. The output target Excel file is named Meeting_temp.xlsx

1. Basic logic

Before writing the code, it is clear that the complete problem needs to be implemented in several small steps. From the requirements, we can roughly divide the code into the following steps:

"Get all the files in the Notice folder of the meeting; parse each Word file, get the four required information, and output to Excel; save the Excel file"

With logic, there is an idea of ​​writing code. The first step can be completed by the glob library, and the next two steps are the interactive collaboration between the python-docx library for Word and the openpyxl library for Excel.

We have talked about these two libraries. If you are not familiar with it, you must read the following article first!

Python-docx operation Word detailed explanation openpyxl operation Excel detailed explanation

Two, code implementation

First import the required libraries:
Insert picture description here

Read the template Excel into the program:
Insert picture description here

Before writing any batch code, it is recommended to write the code for a single operation, so we first complete the parsing of the meeting notice 1.docx file to ensure that it is correct. Now the structure of the document and the location of key information are not clear. You can first output Word in units of paragraph Paragraph to observe:
Insert picture description here
Insert picture description here

The text layout of the document is relatively clear, basically a sentence corresponds to a paragraph, and the required information can be clarified simply by judging the first few words of each sentence (each paragraph):
Insert picture description here

The acquisition of learning content is quite special, unlike the other three information, all in one sentence, and the keywords are the first few words:
Insert picture description here

It can be seen that the four words "learning content" and the actual content are scattered in different sentences. Here is a simple strategy:

"Create an empty list to store, and then traverse each paragraph to judge, if one character is a number and the second character is a Chinese comma ","
, get and store it in the list. Finally, recombine the elements in the list into a long string To:

Insert picture description here

After finishing parsing the Word file, you need to output the content in an Excel file.

Simply put, it is to combine several elements obtained by the above code into a list, and write it into the Excel file by the method of sheet.append(list):
Insert picture description here

After parsing a single file, use glob to change it to get all the files in the folder, and create a loop to analyze one by one to complete this requirement. Of course, remember to save the Excel file at the end.

The complete code is as follows. The
Insert picture description here
Insert picture description here
Insert picture description here
core is only 30 lines of code, and it takes only three seconds to complete!

The following is my collection and sorting in recent years. The whole is organized around [software testing]. The main content includes: python automation test exclusive video, Python automation details, a full set of interview questions and other knowledge content.
Insert picture description here
For software testing friends, it should be the most comprehensive and complete interview preparation warehouse. Many friends review these contents and get offers from major manufacturers such as BATJ. This warehouse has also helped many software testing learners. , I hope it can help you too.

Pay attention to the WeChat public account: Programmer Erhei, you can get Python automated test resources

Guess you like

Origin blog.csdn.net/m0_52650621/article/details/112997233