Python batch processing of Excel tables
Preamble
My boss has been getting more and more unreasonable lately. Just as I was about to leave work, he sent me hundreds of spreadsheets and asked me to merge their contents into a single one.
Fortunately, I know Python and can get it done in minutes. Someone who doesn't know Python would be stuck working overtime until dawn the next day~
A skill this useful has to be shared with everyone. Without further ado, let's get started!
Preparation
We need to prepare the table data first. If you know how to write a crawler, you can scrape some yourself. If not, you can ask me for the data directly.
I am only showing a sample of the data here, so only five tables are used. Today we merge the city-level tables into a single province-level table.
Approach
- Merge all the Excel files under the current folder into one workbook, 广东省.xlsx (Guangdong Province);
- Add a new field, city, holding the city where the store is located, and place it in the first column;
- Drop every row whose star rating is star_0;
- Drop any row that has three or more empty fields;
- Remove the ¥ symbol from the price.
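To make the rules above concrete, here is a minimal sketch of them as one pure function that can be tried without any Excel files. `clean_row`, `is_empty`, `STAR_COL` and `PRICE_COL` are hypothetical names introduced for illustration. The column positions follow the indexes used in this article's code, and the currency sign is taken to be ¥, as in that code. Unlike a plain truthiness check, this sketch assumes a numeric 0 is real data rather than an empty field.

```python
from typing import Optional, Sequence

STAR_COL = 1   # assumed position of the star-rating column in a source row
PRICE_COL = 4  # assumed position of the average-price column in a source row


def is_empty(value) -> bool:
    """Treat None and blank strings as empty; a numeric 0 still counts as data."""
    return value is None or (isinstance(value, str) and not value.strip())


def clean_row(row: Sequence, city: str) -> Optional[list]:
    """Apply the rules to one source row; return None when the row is dropped."""
    data = list(row)
    # Rule: rows rated star_0 are not wanted
    if data[STAR_COL] == "star_0":
        return None
    # Rule: rows with three or more empty fields are not wanted
    if sum(is_empty(v) for v in data) >= 3:
        return None
    # Rule: strip the currency sign, but only when the price is text
    if isinstance(data[PRICE_COL], str):
        data[PRICE_COL] = data[PRICE_COL].strip("¥")
    # Rule: the city goes in the first column
    return [city] + data


if __name__ == "__main__":
    # Made-up sample row, not real data
    sample = ["Shop A", "star_45", 4.5, 120, "¥88", 8.9, 9.0, 9.1]
    print(clean_row(sample, "Guangzhou"))
```

Keeping the rules in a function that knows nothing about workbooks makes them easy to check on a few hand-written rows before running them over hundreds of files.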
Code
Here is the complete code, shared with everyone and with nothing held back.
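import glob
import os

import openpyxl

# Output workbook that collects the rows from every city file
workbook = openpyxl.Workbook()
sheet_total = workbook.active
# Header row: city, store name, star rating, star score, total reviews, average spend,
# taste, environment, service, link URL, category, business district, address, recommended dishes
sheet_total.append(['城市', '门店名称', '星级', '星级得分', '点评总数', '人均消费', '口味', '环境', '服务', '链接网址', '分类', '商圈', '详细地址', '推荐菜'])


def count_none(line):
    """Return the number of empty fields in a row."""
    count = 0
    for d in line:
        if not d:
            count += 1
    return count


# This pattern matches .xlsx files one directory below the current folder
filenames = glob.glob('*/*.xlsx')
for filename in filenames:
    # print(filename)
    # The city is the file name without its directory or extension
    city = os.path.splitext(os.path.basename(filename))[0]
    workbook_temp = openpyxl.load_workbook(filename)
    sheet = workbook_temp.active
    # Start at row 2 to skip the header row of each source file
    for row in sheet.iter_rows(min_row=2):
        row_data = [col.value for col in row]
        # Skip rows whose star rating is star_0
        if row_data[1] == 'star_0':
            continue
        # Skip rows with three or more empty fields
        if count_none(row_data) >= 3:
            continue
        # Remove the ¥ from the average price (only when the cell holds text)
        if isinstance(row_data[4], str):
            row_data[4] = row_data[4].strip('¥')
        row_data.insert(0, city)
        # print(row_data)
        sheet_total.append(row_data)
    # break  # for debugging: process only one file

workbook.save('广东省.xlsx')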
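The city name in the script comes from each workbook's file name. As a sketch of one portable way to extract it, `pathlib` can drop the directory and the extension in one step, whichever path separator the operating system uses. `city_from_path` is a hypothetical helper name, and the file names below are made-up examples, not the article's data.

```python
from pathlib import Path


def city_from_path(filename: str) -> str:
    """Return the file name without its directory or final extension."""
    # Path uses the separator rules of the OS the script runs on,
    # which matches what glob.glob returns on that same OS.
    return Path(filename).stem


if __name__ == "__main__":
    # Build an example path with the native separator
    example = str(Path("data") / "Guangzhou.xlsx")
    print(city_from_path(example))
```

Note that `stem` removes only the last extension, so a file named `Foshan.v2.xlsx` would give `Foshan.v2` rather than `Foshan`.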
Result
Fresh out of the oven.
I have applied a filter here, otherwise everything would be displayed together in one place.
As you can see, the data has been successfully merged into one table.
If you found this useful, remember to like and bookmark it~
Follow me for more practical technical content. Taking the code without leaving a like is freeloading; liking and bookmarking is the right way to do it...
Your support is what keeps me updating!