Pandas group by unique value above a threshold

bk201 :

Tough one.

Start with this DataFrame:

import pandas as pd

df = pd.DataFrame({
    'number': [4.4, 11, 2.4, 5, 12, 22],
    'id': [1, 1, 2, 2, 3, 3]
})

| number | id |
|--------|----|
| 4.4    | 1  |
| 11     | 1  |
| 2.4    | 2  |
| 5      | 2  |
| 12     | 3  | 
| 22     | 3  |

I want to group by the id column and add a third column called unique_above_10. Its value should be 1 on a row only if that row's number is > 10 and it is the one and only value > 10 in its id group; otherwise it should be 0.

So the new DataFrame should look like this:

| number | id | unique_above_10 |
|--------|----|-----------------|
| 4.4    | 1  | 0               |
| 11     | 1  | 1               |
| 2.4    | 2  | 0               |
| 5      | 2  | 0               |
| 12     | 3  | 0               |
| 22     | 3  | 0               |


jezrael :

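Build a boolean mask m of the values greater than 10, count the matches per group with GroupBy.transform('sum'), compare the counts with 1, and combine the result with the mask m using & (bitwise AND), so that only the row that is itself above 10 gets flagged: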

m = df['number'].gt(10)
df['unique_above_10'] = (m.groupby(df['id']).transform('sum').eq(1) & m).astype(int)
print (df)
   number  id  unique_above_10
0     4.4   1                0
1    11.0   1                1
2     2.4   2                0
3     5.0   2                0
4    12.0   3                0
5    22.0   3                0
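
For reuse with other columns or thresholds, the same idea could be wrapped in a small helper. This is only a sketch: the function name flag_unique_above and its parameters are hypothetical and not part of pandas or of the answer above.

```python
import pandas as pd

def flag_unique_above(df, value_col, group_col, threshold):
    """Hypothetical helper: 1 where a row exceeds `threshold` and is the
    only row in its group that does so, otherwise 0."""
    # Boolean mask of rows strictly above the threshold
    above = df[value_col].gt(threshold)
    # Number of above-threshold rows in each row's group, aligned to the rows
    count_per_row = above.groupby(df[group_col]).transform('sum')
    # Keep only rows that are above the threshold AND alone in their group
    return (count_per_row.eq(1) & above).astype(int)

df = pd.DataFrame({
    'number': [4.4, 11, 2.4, 5, 12, 22],
    'id': [1, 1, 2, 2, 3, 3],
})
df['unique_above_10'] = flag_unique_above(df, 'number', 'id', 10)
print(df['unique_above_10'].tolist())  # [0, 1, 0, 0, 0, 0]
```

With a threshold of 20 instead of 10, only the last row (22 in id 3) would be flagged, since 12 no longer counts.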

Details:

print (m)
0    False
1     True
2    False
3    False
4     True
5     True
Name: number, dtype: bool

print (m.groupby(df['id']).transform('sum'))
0    1.0
1    1.0
2    0.0
3    0.0
4    2.0
5    2.0
Name: number, dtype: float64

print (m.groupby(df['id']).transform('sum').eq(1))
0     True
1     True
2    False
3    False
4    False
5    False
Name: number, dtype: bool
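
The steps above stop before the final & m. A short sketch of why that last step matters, using the same df and m as in the answer (the variable name one_per_group is introduced here only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'number': [4.4, 11, 2.4, 5, 12, 22],
    'id': [1, 1, 2, 2, 3, 3],
})
m = df['number'].gt(10)
# True for every row whose group contains exactly one value above 10
one_per_group = m.groupby(df['id']).transform('sum').eq(1)

# Without `& m`, both rows of id 1 are flagged, including 4.4
print(one_per_group.astype(int).tolist())        # [1, 1, 0, 0, 0, 0]
# With `& m`, only the row that is itself above 10 stays flagged
print((one_per_group & m).astype(int).tolist())  # [0, 1, 0, 0, 0, 0]
```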
