Dropping columns in multiple Excel spreadsheets

Tamarie :

Is there a way in Python to drop columns in multiple Excel files? I have a folder with several xlsx files. Each file has about 5 columns (date, value, latitude, longitude, region). I want to drop all columns except date and value in each Excel file.

sammywemmy :

Let's say you have a folder with multiple Excel files:

from pathlib import Path

import pandas as pd

folder = Path('excel_files')

xlsx_only_files = list(folder.rglob('*.xlsx'))


def process_files(xls_file):

    # stem is a pathlib attribute that returns
    # just the filename, without the parent directory or the suffix
    filename = xls_file.stem

    # sheet_name=None reads every sheet into a dictionary
    # keyed by sheet name
    # usecols reads in only the relevant columns
    df = pd.read_excel(xls_file, usecols = ['date','value'] ,sheet_name = None)

    df_cleaned = [data.assign(sheetname=sheetname,
                              filename = filename)
                  for sheetname, data in df.items()
                 ]

    return df_cleaned


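# process_files returns a list of DataFrames per file,
# so flatten the results before concatenating
combo = [frame for xlsx in xlsx_only_files for frame in process_files(xlsx)]

final = pd.concat(combo, ignore_index = True)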
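The code above combines everything into one DataFrame. If the goal is instead to keep one trimmed file per original workbook, a minimal sketch could look like the following. The helper names keep_columns and trim_workbook and the output folder are hypothetical, and it assumes an Excel engine such as openpyxl is installed so pandas can read and write xlsx files.

```python
from pathlib import Path

import pandas as pd

KEEP = ["date", "value"]  # columns to retain, taken from the question


def keep_columns(frame, keep=KEEP):
    """Return a copy of frame holding only the columns listed in keep."""
    # Columns absent from the frame are skipped rather than raising KeyError.
    present = [col for col in keep if col in frame.columns]
    return frame[present].copy()


def trim_workbook(src, dest_dir, keep=KEEP):
    """Write a trimmed copy of every sheet in src into dest_dir."""
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # sheet_name=None returns a dict mapping sheet name -> DataFrame
    sheets = pd.read_excel(src, sheet_name=None)

    out_path = dest_dir / src.name
    with pd.ExcelWriter(out_path) as writer:
        for sheet_name, frame in sheets.items():
            keep_columns(frame, keep).to_excel(
                writer, sheet_name=sheet_name, index=False
            )
    return out_path
```

Calling trim_workbook(xlsx, 'excel_files_trimmed') for each path in xlsx_only_files would leave the originals untouched and write the trimmed copies to a separate folder. Files with the same name in different subfolders would overwrite each other in that single output folder, so adjust the destination if that applies.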

Let me know how it goes

