When processing Excel or CSV data with Python, you often need to read many files at once. A common approach is to use the glob module, which can find every CSV file in the current folder so that they can all be read and merged into one large DataFrame:
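import glob
import pandas as pd
df_list = []
for file in glob.glob("*.csv"):
    # The files are CSVs, so read them with read_csv rather than read_excel
    df_list.append(pd.read_csv(file))
df = pd.concat(df_list)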
However, this only works well if every CSV file being read has the same format and the same column names.
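To see why matching column names matter, here is a minimal sketch that compares the columns of each frame before merging. The two in-memory CSVs and the names frames, expected, and mismatched are made up for illustration; they stand in for real files on disk.

```python
import io
import pandas as pd

# Two in-memory "files" stand in for CSVs on disk (hypothetical sample data).
csv_a = io.StringIO("id,name\n1,Ann\n2,Bob\n")
csv_b = io.StringIO("id,city\n3,Oslo\n")

frames = [pd.read_csv(csv_a), pd.read_csv(csv_b)]

# Compare every frame's column list against the first frame's columns.
expected = list(frames[0].columns)
mismatched = [i for i, f in enumerate(frames) if list(f.columns) != expected]

# concat still succeeds, but it aligns on the union of all columns
# and fills the cells that a file did not provide with NaN.
merged = pd.concat(frames, ignore_index=True)
```

In this sketch the second frame is reported as mismatched, and the merged result contains missing values wherever a file lacked a column, which is usually a sign that the files should not have been merged blindly.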
If you want to read each CSV file independently, you can use the os module to loop through the files in the current folder and read each CSV with the Pandas read_csv function:
import os
import pandas as pd
df_list = []
for file in os.listdir():
    if file.endswith(".csv"):
        df_list.append(pd.read_csv(file))
Now each element of df_list is a DataFrame. This is still not ideal, though, because each DataFrame has to be pulled out of the list by index whenever it is needed.
So how can you automatically read all the CSV data in the current folder and assign each CSV to its own variable?
You can use Python's built-in globals() function, which returns a dictionary containing all the global variables of the current module. For example, the following syntax assigns a value to a key in that dictionary, which creates (or overwrites) the global variable with that name:
globals()[key] = value
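As a small illustration of this syntax, the sketch below creates three global variables from a list. The names item1, item2, item3 and the sample values are invented for the example.

```python
# Build each variable name as a string and assign through globals().
for i, value in enumerate(["a", "b", "c"]):
    globals()[f"item{i + 1}"] = value

# The names can now be used like ordinary module-level variables.
first_and_last = item1 + item3
```

Because the assignment goes straight into the module's namespace, it silently overwrites any existing variable with the same name, so it is worth choosing a prefix that is unlikely to clash.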
Therefore, the following code automatically reads all the CSV data in the current folder and assigns each CSV to a different variable (df1, df2, and so on):
df_list = []
for file in os.listdir():
    if file.endswith(".csv"):
        df_list.append(pd.read_csv(file))
for i, df in enumerate(df_list):
    # Creates the global variables df1, df2, ...
    globals()[f'df{i+1}'] = df
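One caveat: os.listdir() returns file names in arbitrary order, so which file ends up as df1 can vary between systems. The sketch below is one possible variation, not the only way to do it: it sorts the names first and records which file each variable came from. The temporary folder, the sample files, and the name_to_file dictionary are assumptions made so that the example is self-contained.

```python
import os
import tempfile
import pandas as pd

# Build a throwaway folder with two small CSV files (hypothetical sample data).
folder = tempfile.mkdtemp()
pd.DataFrame({"x": [1, 2]}).to_csv(os.path.join(folder, "b.csv"), index=False)
pd.DataFrame({"x": [3]}).to_csv(os.path.join(folder, "a.csv"), index=False)

# Sort the names so that the numbering is reproducible across runs.
csv_files = sorted(f for f in os.listdir(folder) if f.endswith(".csv"))

name_to_file = {}
for i, file in enumerate(csv_files, start=1):
    var_name = f"df{i}"
    globals()[var_name] = pd.read_csv(os.path.join(folder, file))
    name_to_file[var_name] = file  # remember which file each variable holds
```

Keeping name_to_file around makes it possible to look up later which file a given variable such as df2 was read from.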
Of course, a similar method can also be used to read the different sheets of an Excel workbook. For example, suppose data.xlsx contains 10 sheets:
df_list = [pd.read_excel("data.xlsx", sheet_name=i) for i in range(10)]
for i, df in enumerate(df_list):
    globals()[f"df{i+1}"] = df
If you don't know how many sheets the workbook has, you can pass sheet_name=None, which makes read_excel return a dictionary of DataFrames keyed by sheet name, and then create the variables from that dictionary:
# With sheet_name=None the result is a dict mapping sheet name to DataFrame
sheets = pd.read_excel("data.xlsx", sheet_name=None)
for name, df in sheets.items():
    globals()[f"df_{name}"] = df
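Sheet names often contain spaces or punctuation, and a key such as "df_2023 Sales" can be stored in globals() but cannot be typed as a normal variable name. A minimal sketch of one way around this is to clean the name first. The helper to_variable_name and the sample sheets dictionary (plain lists standing in for DataFrames) are hypothetical and not part of pandas.

```python
import re

def to_variable_name(sheet_name):
    """Hypothetical helper: map a sheet name to a usable variable name."""
    # Replace anything other than ASCII letters, digits, and underscores.
    cleaned = re.sub(r"\W", "_", sheet_name, flags=re.ASCII)
    return f"df_{cleaned}"

# Stand-in for the dictionary returned with sheet_name=None.
sheets = {"2023 Sales": [1, 2], "Q1-summary": [3]}

for sheet_name, data in sheets.items():
    globals()[to_variable_name(sheet_name)] = data
```

Note that two different sheet names can clean to the same variable name (for example "a b" and "a-b"), in which case the later sheet overwrites the earlier one.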