Update dataframe rows through a loop

Simon de Fauconval :

I have a dataframe and I want to create some new columns that contain the growth of the original columns.

First, I append the new columns to the dataframe, filling them with NaN values.

Then, for every row, I check whether the previous row has the same name and corresponds to the previous year. If it does, I want to fill the new columns with the growth of each variable; otherwise I leave the NaN values.
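For reference, the row check (the previous row has the same name and is exactly one year earlier) can be evaluated for all rows at once by comparing the frame with a copy of itself shifted down by one row. This is a minimal sketch, assuming a hypothetical toy frame with columns name, year and a made-up sales column, already sorted by name and year:

```python
import pandas as pd

# Hypothetical toy data: "A" is missing 2012, and "B" starts right after "A"
df = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B"],
    "year": [2010, 2011, 2013, 2014, 2015],
    "sales": [100.0, 110.0, 121.0, 50.0, 60.0],
})

prev = df.shift(1)  # each row's previous row (the first row becomes all NaN)
# True only where the previous row has the same name and the year just before
valid = (df["name"] == prev["name"]) & (df["year"] == prev["year"] + 1)
print(valid.tolist())  # [False, True, False, False, True]
```

Rows where `valid` is False are the ones that should keep NaN in the growth columns. Row 3 shows why the name check matters: 2014 follows 2013, but the names differ.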

Here is my code:

for index, row in df.iterrows():
    if df.loc[index,'year'] == df.loc[index - 1, 'year'] + 1 and df.loc[index,'name'] == df.loc[index - 1, 'name']:
        df.loc[index,k:] = (df.loc[index,1:k-1]/df.loc[index-1,1:k-1]) - 1

Where k is the column index of the first new "growth" column that I created.

The problem is that this code leaves the new columns full of NaN values; nothing changes. Did I do something wrong?

Thanks

Bishwarup Bhattacharjee :
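import pandas as pd

# Sort so that each row directly follows the same name's previous year
df.sort_values(['name', 'year'], inplace=True)
growth_cols = [<your-growth-cols>]  # placeholder: the original columns to compute growth for
new_cols = [x + "_growth" for x in growth_cols]
# Growth relative to the previous row: value / previous value - 1
growth_df = df[growth_cols] / df[growth_cols].shift(1) - 1
growth_df.rename(columns=dict(zip(growth_cols, new_cols)), inplace=True)
df = pd.concat([df, growth_df], axis=1)
# Year gap to the previous row; growth is only meaningful when the gap is exactly 1
df['gap'] = df.year.diff()
for col in new_cols:
    # Keep the value where gap == 1, NaN otherwise (including the first row)
    df[col] = df[col].where(df['gap'] == 1)
df.drop('gap', axis=1, inplace=True)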
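As for why the loop in the question leaves the new columns untouched: the snippet as posted may differ from the code actually run (with string column labels, integer slices such as 1:k-1 would need .iloc rather than .loc, and index - 1 does not exist for the first row). Assuming the selections do pick out the intended cells, one likely cause is that .loc aligns an assigned Series on its labels. The quotient is indexed by the original column names, none of which match the new growth columns, so every target cell is set to NaN. A minimal reproduction with a hypothetical two-row frame (the columns a, b and their _growth counterparts are made up):

```python
import pandas as pd

# Hypothetical frame: two original columns plus two NaN-filled growth columns
df = pd.DataFrame({
    "a": [1.0, 2.0], "b": [4.0, 6.0],
    "a_growth": [float("nan")] * 2, "b_growth": [float("nan")] * 2,
})
old_cols = ["a", "b"]
new_cols = ["a_growth", "b_growth"]

growth = df.loc[1, old_cols] / df.loc[0, old_cols] - 1  # Series labelled "a", "b"

# .loc aligns the Series on its labels; "a_growth"/"b_growth" are not in its
# index, so the target cells are set to NaN
df.loc[1, new_cols] = growth
print(df.loc[1, new_cols].tolist())  # [nan, nan]

# Dropping the labels makes the assignment positional, as the loop intended
df.loc[1, new_cols] = growth.to_numpy()
print(df.loc[1, new_cols].tolist())  # [1.0, 0.5]
```

Appending .to_numpy() (or .values) to the right-hand side would make the loop work, though the vectorized approach avoids iterating row by row.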

EDIT (based on updated question):

You would need to change the line

df['gap'] = df.year.diff()

to:

df['gap'] = df.groupby('name')['year'].diff()
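Putting the answer and the edit together, here is a minimal end-to-end sketch on a hypothetical toy frame (the sales column and its values are made up). It differs slightly from the code above: it uses a grouped shift as well as a grouped diff, so the previous value also always comes from the same name.

```python
import pandas as pd

# Hypothetical toy data, deliberately unsorted
df = pd.DataFrame({
    "name": ["B", "A", "A", "A", "B"],
    "year": [2014, 2010, 2011, 2013, 2015],
    "sales": [50.0, 100.0, 110.0, 121.0, 60.0],
})
growth_cols = ["sales"]  # hypothetical list of original columns
new_cols = [c + "_growth" for c in growth_cols]

df = df.sort_values(["name", "year"]).reset_index(drop=True)
g = df.groupby("name")

# Previous value within the same name only (first row of each name -> NaN)
growth = df[growth_cols] / g[growth_cols].shift(1) - 1
growth.columns = new_cols

# Year gap within each name; drop growth for non-consecutive years
gap = g["year"].diff()
growth.loc[gap != 1] = float("nan")

df = pd.concat([df, growth], axis=1)
print(df)
```

For name "A", 2011 gets a growth of about 0.1, while 2013 stays NaN because 2012 is missing; the first row of each name stays NaN because it has no predecessor.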
