From a .csv, read only one section, or split it into sections separated by "<string>"

JG89 :

I have a .csv file that is split in sections, each starting with < string > on a row of its own as in this example. This is followed by a set of columns and their respective rows of values. Columns are not consistent between sections.

< section1 >
col1 col2 col3
val1 val2 val3

< section2 >
col3 col4 col5
val4 val5 val6
val7 val8 val9

...etc. Is there a way, with the file as either .txt or .csv, to import each section either: 1) into separate dataframes, or 2) into the same dataframe, accessed with something like df[section][col]?

Many thanks!

sammywemmy :

Depending on the size of your csv, you could read the entire file into pandas and then split that dataframe into multiple dataframes with a list comprehension.

data = '''<Network>;;;;;;;;;;;;;;;;;;;;;
            Property;Value;;;;;;;;;;;;;;;;;;;;
            Title;;;;;;;;;;;;;;;;;;;;;
            Version;6.4;;;;;;;;;;;;;;;;;;;;
            ;;;;;;;;;;;;;;;;;;;;;
            <Sites>;;;;;;;;;;;;;;;;;;;;;
            Name;LocationCode;Longitude;Latitude;;;;;;;;;;...'''

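import pandas as pd
from io import StringIO

# with the default ',' separator, each ';'-joined line here ends up as one string in column 0
df = pd.read_csv(StringIO(data), header=None)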

Create a list of dataframe names (the section headers):

df_names = df[0].str.extract(r'(<[a-zA-Z]+>)')[0].str.strip('<>').dropna().tolist()

Find the indices of the header rows:

regions = df.loc[df[0].str.contains(r'<[a-zA-Z]+')].index.tolist()

last_row = df.index[-1]

regions.append(last_row)

from more_itertools import windowed

Create a (start, end) window for each 'sub' dataframe:

regions_window = list(windowed(regions,2))

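This function handles some cleanup during the dataframe extraction:

def some_cleanup(df):
    # name the single column after the section marker in the first row
    df.columns = df.iloc[0].str.extract(r'(<[a-zA-Z]+>)')[0].str.strip('<>')
    # drop the marker row itself
    df = df.iloc[1:]
    # .loc slices include the end label, so the next section's marker row comes along; drop it
    df = df[~df.iloc[:, 0].str.contains(r'<[a-zA-Z]+>', na=False)]
    return df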

Extract the dataframes:

M = [df.loc[start:end].pipe(some_cleanup) for start,end in regions_window]
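Note that each frame in M still holds a single column of raw ';'-joined text, because read_csv was called with the default comma separator. Below is a minimal sketch of turning such a frame into real columns. It assumes the first non-empty row after the marker holds the column names and that trailing empty fields are just padding; split_columns is a hypothetical helper, not part of the answer above.

```python
import pandas as pd

def split_columns(raw: pd.DataFrame, sep: str = ";") -> pd.DataFrame:
    """Turn a one-column frame of sep-joined lines into a frame with real columns."""
    # Split every line on the separator; expand=True gives one column per field.
    parts = raw.iloc[:, 0].str.strip().str.split(sep, expand=True)
    # Blank out empty strings, then drop columns and rows that are entirely empty.
    parts = parts.mask(parts == "")
    parts = parts.dropna(axis=1, how="all").dropna(axis=0, how="all")
    # Promote the first remaining row to the header.
    body = parts.iloc[1:].reset_index(drop=True)
    body.columns = parts.iloc[0].tolist()
    return body

# Stand-in for one element of M (assumed shape: one column named after the section).
raw = pd.DataFrame({"Network": ["Property;Value;;;", "Title;;;;", "Version;6.4;;;", ";;;;"]})
network = split_columns(raw)
```

Something like [split_columns(frame) for frame in M] would then apply it to every section, provided each section really does start with a header row.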

Create a dict with the dataframe names as keys:

dataframe_dict = dict(zip(df_names,M))
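For the layout shown in the question (whitespace-separated columns, with a < name > marker on a row of its own), the sections could also be collected line by line instead of loading everything into one frame first. This is only a rough sketch: it assumes the line right after each marker is the header and that values contain no spaces, and read_sections is a made-up name. Passing the resulting dict to pd.concat with axis=1 gives the df[section][col] style of access from option 2.

```python
import re
from io import StringIO

import pandas as pd

# Assumed marker shape: "<name>" or "< name >" alone on a line.
MARKER = re.compile(r"^<\s*(\w+)\s*>$")

def read_sections(handle):
    """Return {section_name: DataFrame} from an open file or file-like object."""
    chunks = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        match = MARKER.match(line)
        if match:
            current = match.group(1)
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line.split())
    # The first collected row of each section is its header.
    return {
        name: pd.DataFrame(rows[1:], columns=rows[0])
        for name, rows in chunks.items()
        if rows
    }

text = """< section1 >
col1 col2 col3
val1 val2 val3

< section2 >
col3 col4 col5
val4 val5 val6
val7 val8 val9
"""

frames = read_sections(StringIO(text))   # option 1: separate dataframes
combined = pd.concat(frames, axis=1)     # option 2: section names become the outer column level
col4 = combined["section2"]["col4"]
```

With a real file, open(path) can be passed in place of StringIO(text). Since the sections have different row counts, the combined frame pads the shorter ones with NaN, so the dict of separate frames is probably the more natural structure.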
