Four ways to batch-merge multiple Excel files and multiple sheets with Python

1. Introduction

Hello, everyone, this is Yuanyouyou. Two days ago I shared a Python automation article, "Teach you how to use Python to easily split Excel into multiple CSV files". Afterwards, a reader in the Python advanced exchange group ran into a related problem: he has a lot of Excel workbooks and needs to combine them.



Of course, you could open the files and copy and paste them one by one, but that is time-consuming, laborious, and error-prone. A handful of files can be handled manually, but with dozens or even hundreds it becomes overwhelming. For Python, however, this problem is easy. Let's take a look!

 

2. Project goals

Use Python to merge multiple Excel workbooks, each of which may contain multiple sheets.

 

3. Project preparation

Software: PyCharm

Required libraries: pandas, xlrd, os (os ships with Python). For .xlsx files, pandas also needs openpyxl installed.

 

4. Project analysis

1) How to choose the Excel files to be merged?

Use os.listdir() to list the folder and keep every Excel file to be merged.

2) How to select the sheets to be merged?

Use the xlrd library to open the first Excel file and read the names of the sheets to be merged.

3) How to merge?

Using pandas, loop over the sheet names, read that sheet from each file, and use concat() to append the data.

4) How to save the file?

Use to_excel() to write the data and produce the final merged file.
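
As a minimal sketch of the merge in point 3, the snippet below uses two small in-memory DataFrames (made-up sample data standing in for the same sheet read from two workbooks) to show what concat() does with ignore_index=True. The names part_a, part_b and merged are only for illustration.

```python
import pandas as pd

# Two tiny frames standing in for the same sheet read from two workbooks
# (the values are made-up sample data).
part_a = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})
part_b = pd.DataFrame({"name": ["Cid"], "score": [70]})

# ignore_index=True renumbers the rows 0..n-1 instead of keeping each
# frame's own index (which would repeat 0 here).
merged = pd.concat([part_a, part_b], ignore_index=True)
print(merged)
```

Without ignore_index=True, the merged frame would keep the row labels 0, 1, 0, which is easy to trip over when indexing later.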

 

5. Project implementation

1. Import the required libraries

import pandas as pd
import xlrd
import os

2. Select the Excel files to be merged

    # Folder containing the files to merge
    path = "D:/b/"
    # Get the names of all Excel files in the folder
    xlsx_names = [x for x in os.listdir(path) if x.endswith(".xlsx")]
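
One thing the listing above can trip over: while a workbook is open, Excel keeps a lock file named "~$<name>.xlsx" in the same folder, and it also ends with ".xlsx". The following is a rough sketch of a slightly more defensive listing; the helper name list_excel_files and the demo file names are made up for illustration.

```python
import os
import tempfile


def list_excel_files(folder):
    """Return the sorted .xlsx file names in folder, skipping Excel lock files."""
    names = []
    for name in os.listdir(folder):
        # Excel creates "~$<name>.xlsx" lock files while a workbook is open;
        # they are not real workbooks and should not be merged.
        if name.lower().endswith(".xlsx") and not name.startswith("~$"):
            names.append(name)
    return sorted(names, key=str.lower)


# Demo in a temporary folder with made-up file names.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["b.xlsx", "A.XLSX", "~$b.xlsx", "notes.txt"]:
        open(os.path.join(demo_dir, name), "w").close()
    found = list_excel_files(demo_dir)
    print(found)
```

Sorting the names also makes the merge order predictable, since os.listdir() returns entries in arbitrary order.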

3. Select the sheets to be merged

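    # Name of the first Excel file
    xlsx_names1 = xlsx_names[0]

    aa = path + xlsx_names1
    # Open the first Excel file
    # (xlrd 2.0 and later reads only .xls; for .xlsx, pd.ExcelFile(aa).sheet_names gives the same list)
    first_file_fh = xlrd.open_workbook(aa)
    # Get the sheets, then their names
    first_file_sheet = first_file_fh.sheets()
    sheet_names = [sheet.name for sheet in first_file_sheet]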

4. Merge the contents of each sheet in a loop

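    # Create the Excel writer for the merged workbook.
    # The output path is an assumption; keep it outside the input folder.
    writer = pd.ExcelWriter("D:/merged.xlsx")
    num = 1  # progress counter used when saving
    # Loop over the sheet names
    for sheet_name in sheet_names:
        df = None
        # Loop over the Excel files
        for xlsx_name in xlsx_names:
            sheet_na = pd.ExcelFile(path + xlsx_name).sheet_names
            if sheet_name in sheet_na:
                _df = pd.read_excel(path + xlsx_name, sheet_name=sheet_name, header=None)
                if df is None:
                    df = _df
                else:
                    df = pd.concat([df, _df], ignore_index=True)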

5. Save the merged file

 
 
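        # Still inside the sheet loop. writer must be a pd.ExcelWriter and num a
        # counter starting at 1, both created before the loop.
        # header=False because the sheets were read with header=None.
        df.to_excel(excel_writer=writer, sheet_name=sheet_name, index=False, header=False)
        print(sheet_name + "  saved successfully! %d sheets in total, this is number %d." % (len(sheet_names), num))
        num += 1
    # close() saves the workbook; writer.save() was removed in pandas 2.0
    writer.close()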
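
The five steps can also be put together in a more compact form. The following is a sketch under some assumptions, not the article's original script: it relies on pd.read_excel(path, sheet_name=None) returning a dict of {sheet name: DataFrame}, and the function names merge_books and merge_excel_folder are hypothetical. The merge logic is kept separate from the file I/O so it can be tried on in-memory data first.

```python
import pandas as pd


def merge_books(books):
    """Merge a list of {sheet_name: DataFrame} dicts sheet by sheet.

    Each dict has the shape returned by pd.read_excel(path, sheet_name=None).
    Sheet names are taken from the first workbook, as in the steps above.
    """
    merged = {}
    for sheet_name in books[0]:
        frames = [book[sheet_name] for book in books if sheet_name in book]
        merged[sheet_name] = pd.concat(frames, ignore_index=True)
    return merged


def merge_excel_folder(paths, out_path):
    """Read every workbook in paths and write the merged sheets to out_path."""
    books = [pd.read_excel(p, sheet_name=None, header=None) for p in paths]
    with pd.ExcelWriter(out_path) as writer:  # leaving the block saves the file
        for sheet_name, df in merge_books(books).items():
            df.to_excel(writer, sheet_name=sheet_name, index=False, header=False)


# In-memory demo with made-up data (no Excel files needed).
book1 = {"Sheet1": pd.DataFrame([[1, "a"]]), "Sheet2": pd.DataFrame([[2, "b"]])}
book2 = {"Sheet1": pd.DataFrame([[3, "c"]])}
result = merge_books([book1, book2])
print({name: len(df) for name, df in result.items()})
```

With real files, merge_excel_folder would be called with the list of full workbook paths and an output path outside the input folder, so the merged file is not picked up as an input on the next run.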

 

6. Results

1. Excel data before processing: [image]

2. Progress messages while the script runs: [image]

3. The result after merging: [image]

 

7. Summary

This article showed how to use Python to merge multiple Excel files and multiple sheets. It removes much of the copy-and-paste work, saves time and effort, and is less error-prone. There is not much code; the nested loop that appends the data can be a bit confusing at first, but it becomes clear once you think it through. If anything is unclear, leave a message and ask at any time. Let's learn and make progress together.

Readers who are interested can also package the code from this article into a small executable (.exe) to send to others, and perhaps even earn a tip. Packaging is not covered again here; see the article "Three Pyinstaller packaging commands that you must remember-use Python to achieve packaging exe".

 

8. Easter eggs

The following code samples were provided by friends in the group. I have tested them myself and they work, so everyone is welcome to try them!

Code from group friend Jayson:

# -*- coding: utf-8 -*-
# @Author: hebe
# @Date:   2020-04-18 18:31:03
# @Last Modified by:   hebe
# @Last Modified time: 2020-04-18 19:40:48
import os 
import glob
import openpyxl

def merge_xlsx_files(xlsx_files):
    wb = openpyxl.load_workbook(xlsx_files[0])
    ws = wb.active
    ws.title = "merged result"

    for  filename in xlsx_files[1:]:
        workbook = openpyxl.load_workbook(filename)
        sheet = workbook.active
        for row in sheet.iter_rows(min_row=1):
            values = [cell.value for cell in row]
            ws.append(values)
    return wb

# The path must point to the folder that actually contains the .xlsx files.
def get_all_xlsx_files(path):
    xlsx_files = glob.glob(os.path.join(path, '*.xlsx'))
    # sorted() returns a new list, so return its result
    return sorted(xlsx_files, key=str.lower)

def main():
    xlsx_files = get_all_xlsx_files(r'C:\Users\pdcfi\Desktop')
    wb = merge_xlsx_files(xlsx_files)
    wb.save('merged_form.xlsx')
    print("all excel append OK!")

if __name__ == '__main__':
    main()
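
Note that merge_xlsx_files above copies every row of every file, including each file's first row, so a shared header row is repeated once per file. As a rough sketch of one way around that, assuming every workbook has its header in row 1, the hypothetical helper append_rows below skips that row when appending. The demo uses in-memory workbooks with made-up data.

```python
from openpyxl import Workbook


def append_rows(target_ws, source_ws, skip_header=True):
    """Append the rows of source_ws to target_ws, optionally skipping row 1."""
    first_row = 2 if skip_header else 1
    for row in source_ws.iter_rows(min_row=first_row, values_only=True):
        target_ws.append(row)


# In-memory demo with made-up data.
wb_a, wb_b = Workbook(), Workbook()
ws_a, ws_b = wb_a.active, wb_b.active
ws_a.append(["name", "score"])
ws_a.append(["Ann", 90])
ws_b.append(["name", "score"])
ws_b.append(["Bob", 85])

append_rows(ws_a, ws_b)
rows = list(ws_a.iter_rows(values_only=True))
print(rows)
```

Passing values_only=True makes iter_rows() yield plain tuples of cell values, so there is no need to build the list of values by hand.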

Code from my friend Liu Zaoqi:

# -*- coding: utf-8 -*-
from openpyxl import load_workbook, Workbook
import glob

path = "C:\\Users\\pdcfi\\Desktop\\excel\\"
new_workbook = Workbook()
new_sheet = new_workbook.active

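# flag records whether the header row has already been added to the new sheet;
# once it has been added, it is not added again
flag = 0

for file in glob.glob(path + '/*.xlsx'):
    workbook = load_workbook(file)
    sheet = workbook.active

    column_a = sheet['A']
    row_lst = []
    for cell in column_a:
        # Test the cell's value rather than the cell object itself, and skip
        # row 1 because the header is added separately below.
        # Any extra filter condition on cell.value can go here.
        if cell.row > 1 and cell.value is not None:
            row_lst.append(cell.row)

    if not flag:
        header = sheet[1]
        header_lst = []
        for cell in header:
            header_lst.append(cell.value)
        new_sheet.append(header_lst)
        flag = 1

    for row in row_lst:
        data_lst = []
        for cell in sheet[row]:
            data_lst.append(cell.value)
        new_sheet.append(data_lst)

# The file name means "new table meeting the filter conditions"
new_workbook.save(path + '/' + '符合筛选条件的新表.xlsx')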

Code from group friend Engineer:

import tkinter as tk
from tkinter import filedialog
import os
import pandas as pd
import glob

root = tk.Tk()
root.withdraw()

# Choose the folder
filelocation = os.path.normpath(filedialog.askdirectory(initialdir=os.getcwd()))
lst = []

# Read every Excel file in the folder (both .xls and .xlsx)
for i in glob.glob(os.path.join(filelocation, "*.*")):
    if os.path.splitext(i)[1] in [".xls", ".xlsx"]:
        lst.append(pd.read_excel(i))

# Save the merged Excel file
writer = pd.ExcelWriter(filedialog.asksaveasfilename(title="Save", initialdir=filelocation, defaultextension="xlsx",
                                                     filetypes=[("Excel Workbook", "*.xlsx"),
                                                                ("Excel 97-2003 Workbook", "*.xls")]))
pd.concat(lst).to_excel(writer, sheet_name='all', index=False)
# close() saves the file; writer.save() was removed in pandas 2.0
writer.close()

print('\n%d files merged successfully!' % len(lst))


 

Of course, merging Excel files is not limited to the methods shown above. It can also be done using pandas. If you have other methods, you are welcome to share them so we can learn from each other.



Origin blog.csdn.net/aaahtml/article/details/114160986