How to batch read Excel with Python?

When processing Excel or CSV files with Python, you often need to read many files at once. A common approach is to use the glob module to find every CSV file in the current folder, read each one, and merge them into one large DataFrame.

import glob
import pandas as pd

df_list = []
for file in glob.glob("*.csv"):
    # The files are CSV, so read_csv is used rather than read_excel
    df_list.append(pd.read_csv(file))

df = pd.concat(df_list)

However, this only works well when every CSV file has the same format and column names. Otherwise pd.concat fills the columns that a file is missing with NaN.
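
The same-columns requirement can be checked before merging. The following is a minimal sketch, assuming pandas is installed. The helper name concat_if_same_columns and the two sample frames are hypothetical and only stand in for the CSV files read above.

```python
import pandas as pd


def concat_if_same_columns(frames):
    """Concatenate DataFrames only if they all share the same column names."""
    if not frames:
        raise ValueError("no DataFrames to concatenate")
    expected = list(frames[0].columns)
    for position, frame in enumerate(frames):
        if list(frame.columns) != expected:
            raise ValueError(
                f"frame {position} has columns {list(frame.columns)}, "
                f"expected {expected}"
            )
    # ignore_index=True renumbers the rows 0..n-1 in the merged result
    return pd.concat(frames, ignore_index=True)


# Hypothetical in-memory data standing in for two CSV files
a = pd.DataFrame({"id": [1, 2], "value": [10, 20]})
b = pd.DataFrame({"id": [3], "value": [30]})
merged = concat_if_same_columns([a, b])
print(merged.shape)  # (3, 2)
```

Raising an error is only one possible policy. Depending on the data, it may be just as reasonable to log the mismatch and skip the file.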

If you want to read each CSV file independently, you can use the os module to loop through the files in the current folder and read each one with the pandas read_csv function:

import os
import pandas as pd

df_list = []

for file in os.listdir():
    if file.endswith(".csv"):
        df_list.append(pd.read_csv(file))

Now each element of df_list is a DataFrame. This is still not ideal, because each DataFrame has to be pulled out of the list by index before it can be used.

So how can you automatically read all the CSV files in the current folder and assign each one to its own variable?

You can use Python's built-in globals() function, which returns a dictionary containing all the global variables of the current module. For example, the following syntax assigns a value to a key in that dictionary, which creates (or overwrites) the global variable with that name:

globals()[key] = value
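
As a small illustration of that syntax, the sketch below creates a variable from a string and then reads it back by name. The variable name answer and the value 42 are made up for the example.

```python
# Assigning through globals() creates an ordinary module-level variable.
key = "answer"  # hypothetical variable name, chosen for illustration
globals()[key] = 42

# The name now resolves like any other global variable
print(answer)  # 42

# The same dictionary can also be used to look the value up again
print(globals()["answer"] == answer)  # True
```

Because the names are built at run time, editors and linters usually cannot see them, so they may flag answer as undefined even though the code runs.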

Therefore, the following code automatically reads all the CSV files in the current folder and assigns each one to a different variable (df1, df2, and so on):

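df_list = []

# os.listdir() returns names in arbitrary order; sorted() keeps the numbering stable
for file in sorted(os.listdir()):
    if file.endswith(".csv"):
        df_list.append(pd.read_csv(file))

# Create the variables df1, df2, ... one per CSV file
for i, df in enumerate(df_list):
    globals()[f'df{i+1}'] = df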
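
To try the idea without touching real data, here is a self-contained sketch that first writes two small CSV files into a temporary folder and then loads them into numbered variables. The file names a.csv and b.csv and their contents are hypothetical, and the folder is passed explicitly instead of relying on the current working directory.

```python
import os
import tempfile

import pandas as pd

# Create a throwaway folder with two small CSV files (hypothetical data)
folder = tempfile.mkdtemp()
pd.DataFrame({"x": [1, 2]}).to_csv(os.path.join(folder, "a.csv"), index=False)
pd.DataFrame({"x": [3]}).to_csv(os.path.join(folder, "b.csv"), index=False)

# sorted() makes the numbering deterministic: a.csv -> df1, b.csv -> df2
csv_files = sorted(f for f in os.listdir(folder) if f.endswith(".csv"))

for i, name in enumerate(csv_files, start=1):
    globals()[f"df{i}"] = pd.read_csv(os.path.join(folder, name))

print(df1["x"].tolist())  # [1, 2]
print(df2["x"].tolist())  # [3]
```

Sorting the file names first means the mapping from file to variable does not depend on the order in which the operating system happens to list the folder.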

Of course, a similar method can be applied to read the different sheets of an Excel workbook. For example, suppose data.xlsx has 10 sheets:

df_list = [pd.read_excel("data.xlsx", sheet_name=i) for i in range(10)]

for i, df in enumerate(df_list):
    globals()[f"df{i+1}"] = df

If you don't know how many sheets the workbook has, you can pass sheet_name=None, which returns a dictionary mapping each sheet name to its DataFrame, and then assign the variables from that dictionary:

# sheet_name=None returns a dict of {sheet name: DataFrame}
sheets = pd.read_excel("data.xlsx", sheet_name=None)

for name, df in sheets.items():
    globals()[f"df_{name}"] = df
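
One caveat with names built from sheet names: a sheet called "Sales 2023" would produce a global named "df_Sales 2023", which cannot be typed as a normal variable. The sketch below shows one possible way to clean such names before using them as keys in globals(). The helper to_variable_name is hypothetical, not part of pandas, and it handles typical names rather than every edge case.

```python
import keyword
import re


def to_variable_name(sheet_name, prefix="df_"):
    """Turn an arbitrary sheet name into a usable Python variable name."""
    # Replace every character that is not a letter, digit, or underscore
    cleaned = re.sub(r"\W", "_", str(sheet_name))
    candidate = f"{prefix}{cleaned}"
    # Reserved words such as "class" cannot be used as variable names
    if keyword.iskeyword(candidate):
        candidate += "_"
    # Names may not start with a digit (possible when prefix is empty)
    if not candidate.isidentifier():
        candidate = "_" + candidate
    return candidate


print(to_variable_name("Sales 2023"))  # df_Sales_2023
print(to_variable_name("Q1-report"))   # df_Q1_report
```

Note that two different sheet names can map to the same cleaned name (for example "Q1-report" and "Q1 report"), in which case the later sheet would overwrite the earlier one.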

Origin blog.csdn.net/veratata/article/details/128794653