Python office automation: 27 lines of code to batch-merge the contents of multiple Excel tables into one table

Python batch processing of Excel tables

Preamble

The boss has been getting more and more unreasonable lately. Just as I was about to leave work, he sent me hundreds of spreadsheets and asked me to combine their contents into a single one.

Fortunately, I know Python, so I can get it done in minutes. Someone who doesn't know Python would be working overtime until dawn the next day~


Such a useful skill must be shared with everyone. Without further ado, let's get started!

Preparation

We need to prepare the table data first. If you know how to write a crawler, you can scrape some data yourself. If not, you can get the data from me directly.

This is only a demonstration, so only five tables are used. Today, we merge the city-level tables into a single province-level table.

The idea of this article

  1. Merge every Excel file under the current folder into one workbook, 广东省.xlsx (Guangdong Province.xlsx);
  2. Add a new field, city, holding the city where the store is located, and place it in the first column;
  3. Drop every row whose star rating is star_0;
  4. Drop any row in which three or more fields are empty;
  5. Remove the ¥ symbol from the price.
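
The rules in steps 2 to 5 can be sketched as one function that cleans a single row. This is a minimal illustration rather than the article's code: the name `clean_row` is hypothetical, and the column positions (star rating in the second column, price in the fifth) are assumptions based on the header order used in the script. A field counts as empty here only when it is `None` or an empty string.

```python
def clean_row(row, city):
    """Return the cleaned row with the city in front, or None if the row is dropped.

    Assumed layout (hypothetical): row[1] is the star rating,
    row[4] is the average price.
    """
    values = list(row)

    # Step 3: drop stores rated star_0.
    if values[1] == 'star_0':
        return None

    # Step 4: drop rows with three or more empty fields.
    empty = sum(1 for v in values if v is None or v == '')
    if empty >= 3:
        return None

    # Step 5: remove the currency symbol, but only when the price is text.
    if isinstance(values[4], str):
        values[4] = values[4].strip('¥')

    # Step 2: the city goes in the first column.
    return [city] + values


# Made-up sample row, for illustration only.
sample = ['Some Store', 'star_45', 8.9, 120, '¥85', 8.8, 8.7, 8.6]
print(clean_row(sample, 'Guangzhou'))
```

Returning `None` for a dropped row keeps the calling loop simple: it appends the result only when one comes back.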

Code

Here is the complete code for everyone, nothing held back.

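import glob
import os

import openpyxl

workbook = openpyxl.Workbook()
sheet_total = workbook.active
# Header row: city, store name, star rating, rating score, review count, average spend,
# taste, environment, service, URL, category, business district, address, recommended dishes
sheet_total.append(['城市', '门店名称', '星级', '星级得分', '点评总数', '人均消费', '口味', '环境', '服务', '链接网址', '分类', '商圈', '详细地址', '推荐菜'])


def count_none(line):
    """Return the number of empty fields in a row."""
    count = 0
    for d in line:
        # Any falsy value (None, '', 0) counts as empty
        if not d:
            count += 1
    return count


# Matches .xlsx files one folder level below the current directory
filenames = glob.glob('*/*.xlsx')
for filename in filenames:
    # print(filename)
    # The city is the file name without its folder or extension
    city = os.path.splitext(os.path.basename(filename))[0]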
    workbook_temp = openpyxl.load_workbook(filename)
    sheet = workbook_temp.active
    for row in sheet.iter_rows(min_row=2, min_col=1, max_col=sheet.max_column, max_row=sheet.max_row):
        row_data = [col.value for col in row]
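        # Skip stores with a star rating of star_0
        if row_data[1] == 'star_0':
            continue

        # Skip rows with three or more empty fields
        if count_none(row_data) >= 3:
            continue

        # Remove the ¥ from the average price (only when the cell holds text)
        if isinstance(row_data[4], str):
            row_data[4] = row_data[4].strip('¥')
        row_data.insert(0, city)
        # print(row_data)
        sheet_total.append(row_data)
    # break  # for debugging: process only one file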

workbook.save('广东省.xlsx')
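
The city name is simply the workbook's file name without its folder or extension. As a standalone sketch of that derivation (the helper name `city_from_path` and the sample paths are hypothetical, not part of the script), `pathlib` can do it whichever path separator `glob` returns:

```python
from pathlib import PureWindowsPath


def city_from_path(filename):
    """Return the file name without its directory or extension."""
    # PureWindowsPath accepts both '\\' and '/' as separators, so it handles
    # paths produced on Windows as well as on Unix-like systems.
    return PureWindowsPath(filename).stem


# Hypothetical paths, for illustration only.
print(city_from_path('data\\Guangzhou.xlsx'))
print(city_from_path('data/Shenzhen.xlsx'))
```

Note that `stem` removes only the final extension, so a file name that itself contains a dot is kept whole.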

Result

Fresh out of the oven, very fresh.
I filtered the view for the screenshot; otherwise everything would be displayed in one place.
As you can see, the data has been successfully merged into one table.
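
One quick way to check the merge is to count how many rows each city contributed. The following is a small sketch under stated assumptions: the helper `rows_per_city` is hypothetical, and the rows are made-up stand-ins for the merged sheet with the city in the first column.

```python
from collections import Counter


def rows_per_city(rows):
    """Count data rows per city; each row is a sequence with the city first."""
    return Counter(row[0] for row in rows)


# Made-up rows standing in for the merged sheet (header row excluded).
merged = [
    ['Guangzhou', 'Store A', 'star_45'],
    ['Guangzhou', 'Store B', 'star_40'],
    ['Shenzhen', 'Store C', 'star_50'],
]
print(rows_per_city(merged))
```

With the real file, the rows could come from `openpyxl.load_workbook('广东省.xlsx').active.iter_rows(min_row=2, values_only=True)`.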

Origin blog.csdn.net/fei347795790/article/details/124149165