Pandas: Can I filter a dataframe to get only rows with a 50% difference between each other?

Rishab Nagaraj :

The following is a sample dataframe. My actual dataset has 30k rows.

import pandas as pd

df = pd.DataFrame({'Account': [30, 30, 30, 30, 30, 30, 30, 40, 40, 40],
                   'Start': [2, 2, 2, 2, 2, 3, 3, 1, 1, 1],
                   'Amount': [500, 600, 800, 200, 700, 10, 800, 10, 50, 70]})

   Account Start  Amount
0       30     2     500
1       30     2     600
2       30     2     800
3       30     2     200
4       30     2     700
5       30     3      10
6       30     3     800
7       40     1      10
8       40     1      50
9       40     1      70

I want to find all rows (grouped by Account and Start) where the Amount in one row is within ±50% of the Amount in the adjacent row. I expect the result to look like this:

   Account Start  Amount
0       30     2     500
1                    600
2                    800
8       40     1      50
9                     70

Row 3 is excluded because 200 is less than 50% of the amount in row 2 as well as the amount in row 4.
Row 4 is excluded because it is the last row for Start = 2 and the previous row is also excluded.
Similarly, rows 5 and 6 are excluded.
Row 7 is excluded because 10 is less than 50% of the amount in row 8.

PS: In the final dataset, each group of Account and Start should have at least 4 rows.
Is there a way to do this efficiently?

ALollz :

Use pct_change and check whether the result is between -50% and 50%. Because you want pairs of rows, a row should be kept if either this mask or the shifted mask is True (the mask is shifted in the opposite direction to the one used for pct_change, so the second row of each pair is kept too). Apply this function to each group separately.

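def keep_within_pct(gp, shift=1, pcts=(-0.5, 0.5)):
    # Change of each row relative to the row `shift` positions after it.
    m = gp['Amount'].pct_change(-shift).between(*pcts)
    # Keep both rows of each pair; fill_value keeps the shifted mask boolean.
    return gp[m | m.shift(shift, fill_value=False)]

df.groupby(['Account', 'Start'], group_keys=False).apply(keep_within_pct)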

   Account  Start  Amount
0       30      2     500
1       30      2     600
2       30      2     800
8       40      1      50
9       40      1      70
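
To see why both the mask and its shifted copy are needed, here is a minimal sketch of the intermediate steps on a single group (Account 30, Start 2). The variable names are only for illustration and are not part of the function above.

```python
import pandas as pd

# One group from the sample data: Account 30, Start 2.
amount = pd.Series([500, 600, 800, 200, 700], name="Amount")

# Relative change of each row versus the NEXT row: (x[i] - x[i+1]) / x[i+1].
# The last row has no next row, so its value is NaN.
change_vs_next = amount.pct_change(-1)

# True where a row is within ±50% of the row after it (NaN compares as False).
m = change_vs_next.between(-0.5, 0.5)

# Shift the mask down one row so the second row of each pair is kept as well.
keep = m | m.shift(1, fill_value=False)

print(pd.DataFrame({"Amount": amount, "m": m, "keep": keep}))
```

Here m alone is True only for 500 and 600, because 800 is compared with the 200 that follows it. The shifted mask adds 800 back as the second row of the (600, 800) pair, while 200 and 700 stay excluded.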

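Since you asked about efficiency: the same masks can probably be built without groupby.apply by using the group-wise pct_change and shift methods directly. This is only a sketch under the same assumptions as above (rows already ordered within each group, the same ±50% bounds); keys, m and m_prev are illustrative names. It also sidesteps the deprecation in recent pandas releases around passing the grouping columns to the function given to groupby.apply.

```python
import pandas as pd

df = pd.DataFrame({
    "Account": [30, 30, 30, 30, 30, 30, 30, 40, 40, 40],
    "Start": [2, 2, 2, 2, 2, 3, 3, 1, 1, 1],
    "Amount": [500, 600, 800, 200, 700, 10, 800, 10, 50, 70],
})

keys = ["Account", "Start"]

# Change versus the next row, computed within each (Account, Start) group.
m = df.groupby(keys)["Amount"].pct_change(-1).between(-0.5, 0.5)

# Shift the mask within each group so a pair never spans two groups.
# astype(bool) is a defensive cast in case the shift returns object dtype.
m_prev = (
    m.groupby([df[k] for k in keys])
    .shift(1, fill_value=False)
    .astype(bool)
)

result = df[m | m_prev]
print(result)
```

On the sample data this should select the same rows as the apply version (index 0, 1, 2, 8 and 9). Whether it is noticeably faster on 30k rows depends on the number of groups, so it is worth timing both on your actual data.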