Getting the means and sum of columns of a dataframe on the basis of randomly selected bins

Temp_coder :

I have a dataframe like the one below.

data

Index   ID  AA  BB  CC  BIN
0       Z1  10  11  12  1
1       Z1  0   12  13  1
2       Z1  20  13  14  2
3       Z1  34  14  15  3
4       Z1  54  52  16  3
5       Z1  67  53  17  3
6       Z7  45  54  18  1
7       Z7  34  55  19  2
8       Z7  45  56  57  2
9       Z7  45  56  58  3
10      Z7  67  67  59  3

I want to get a dataframe that looks like the one below.

data2

ID   AA_SUM_12  AA_MEAN_12  BB_SUM_12  BB_MEAN_12  CC_SUM_12  CC_MEAN_12
Z1   30         10          36         12          39         13
Z7   124        41.33       165        55          94         31.33

Here SUM_12 is the sum over the rows where 'BIN' is 1 or 2, and MEAN_12 is the mean over the same rows.

In the real dataset, there are more than 3000 different IDs, and 'BIN' ranges from 1 to 5.

I want to pick the 'BIN' values randomly, for example taking the mean where 'BIN' is 1, 3 or 5, or the sum where 'BIN' is 4 or 5, and get the result as a dataframe.

How can I do that?

jezrael :

As I understand the question, you need a random set of unique BIN values of length 2 or 3. I use sample data where BIN ranges from 1 to 5:

import numpy as np
import pandas as pd

print (df)
    ID  AA  BB  CC  BIN
0   Z1  10  11  12    1
1   Z1   0  12  13    1
2   Z1  20  13  14    2
3   Z1  34  14  15    4
4   Z1  54  52  16    5
5   Z1  67  53  17    3
6   Z7  45  54  18    4
7   Z7  34  55  19    2
8   Z7  45  56  57    4
9   Z7  45  56  58    3
10  Z7  67  67  59    3

So first get all unique values:

v = df['BIN'].unique()
print (v)
[1 2 4 5 3]

Then pass them to numpy.random.choice with a randomly generated size of 2 or 3, using replace=False so that no BIN is drawn twice:

r = np.random.choice(v, size=np.random.choice([2,3]), replace=False)
print (r)
[3 5 1]
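
Sampling without replacement is what keeps the drawn BINs unique. As a minimal sketch of the same draw, assuming reproducibility matters, NumPy's Generator API can be seeded; the seed value, the hard-coded `bins` array and the names `rng`, `k`, `picked` and `label` are illustrative and not part of the original answer:

```python
import numpy as np

# Assumption: the unique BIN values seen in the sample data above.
bins = np.array([1, 2, 4, 5, 3])

# A seeded Generator makes the random draw repeatable (seed 0 is arbitrary).
rng = np.random.default_rng(0)

# Draw the subset size first (2 or 3), then that many distinct bins.
k = int(rng.choice([2, 3]))
picked = rng.choice(bins, size=k, replace=False)

# Sort before joining so the same subset always gives the same label,
# e.g. "135" rather than "351".
label = ''.join(str(b) for b in sorted(picked))
print(picked, label)
```

Sorting the label is optional; it only makes column names independent of the order in which the BINs were drawn.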

new = ''.join((str(x) for x in r))

Then filter the rows with Series.isin and boolean indexing, aggregate sum and mean per ID, and finally append the selected BINs, joined into one string, to the column names:

df1 = df[df['BIN'].isin(r)].groupby('ID')[['AA', 'BB', 'CC']].agg(['mean','sum'])
df1.columns = df1.columns.map(lambda x: f'{x[0]}_{x[1]}_{new}')
print (df1)
    AA_mean_351  AA_sum_351  BB_mean_351  BB_sum_351  CC_mean_351  CC_sum_351
ID                                                                           
Z1        32.75         131         32.0         128         14.5          58
Z7        56.00         112         61.5         123         58.5         117
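
The steps above can be wrapped in one function so that any BIN subset, random or fixed, goes through the same path. This is only a sketch: `summarize_bins` and its parameters are hypothetical names, and the data is the question's original sample, rebuilt so the block runs on its own. Called with BINs 1 and 2, it should reproduce the data2 table from the question:

```python
import pandas as pd

# The question's sample data, rebuilt so the sketch is self-contained.
df = pd.DataFrame({
    'ID':  ['Z1'] * 6 + ['Z7'] * 5,
    'AA':  [10, 0, 20, 34, 54, 67, 45, 34, 45, 45, 67],
    'BB':  [11, 12, 13, 14, 52, 53, 54, 55, 56, 56, 67],
    'CC':  [12, 13, 14, 15, 16, 17, 18, 19, 57, 58, 59],
    'BIN': [1, 1, 2, 3, 3, 3, 1, 2, 2, 3, 3],
})

def summarize_bins(frame, bins, cols=('AA', 'BB', 'CC'), funcs=('sum', 'mean')):
    """Aggregate `cols` per ID over the rows whose BIN is in `bins`.

    Hypothetical helper, not part of pandas or of the original answer.
    """
    label = ''.join(str(b) for b in sorted(bins))
    out = (frame[frame['BIN'].isin(bins)]
           .groupby('ID')[list(cols)]
           .agg(list(funcs)))
    # Flatten the (column, function) MultiIndex into names like AA_SUM_12.
    out.columns = [f'{c}_{f.upper()}_{label}' for c, f in out.columns]
    return out

data2 = summarize_bins(df, [1, 2])
print(data2.round(2))
```

Results for several subsets can be placed side by side with pd.concat([...], axis=1); an ID that has no rows in one of the subsets gets NaN in that subset's columns.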
