Adding and averaging a set of columns depending on the value of a secondary column in Python

Kshitij Yadav :

I have a dataset which has the following values:

LabelA    PositiveA     NegativeA    LabelB    PositiveB     NegativeB    LabelC    PositiveC  NegativeC  Final_Label
  1          .60           .40         0          .30           .70         1          .9          .1         1
  0          .1            .9          0          .49           .51         0          .3          .7         0
  0          .34           .66         1          .87           .13         1          .90         .1         1

Final_Label is 1 if the majority of the labels (LabelA, LabelB and LabelC) are 1, and 0 otherwise.
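For reference, the majority rule could be reproduced along the following lines. This is only a sketch, not part of the original question: it assumes every label column is named "Label" plus a suffix, that labels are always 0 or 1, and that the number of label columns is odd so no ties occur. The small frame just re-creates the label columns of the sample.

```python
import pandas as pd

# Label columns copied from the sample rows above.
df = pd.DataFrame({
    "LabelA": [1, 0, 0],
    "LabelB": [0, 0, 1],
    "LabelC": [1, 0, 1],
})

# "^Label\w" matches LabelA, LabelB, ... but not Final_Label.
label_cols = df.filter(regex=r"^Label\w")

# With 0/1 labels, a row mean above 0.5 means most labels are 1.
final_label = label_cols.mean(axis=1).gt(0.5).astype(int)
print(final_label.tolist())
```

For the three sample rows this agrees with the Final_Label column shown in the table.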

I want to calculate a column called "Polarity", which is defined as follows:

  1. If Final_Label = 1, Polarity is the mean of the "PositiveA/B/C" values whose label is also 1
  2. If Final_Label = 0, Polarity is the mean of the "NegativeA/B/C" values whose label is also 0

For example in the above dataset, Polarity would have the following value:

Polarity
.75           (mean of PositiveA and PositiveC)
.7033         (mean of NegativeA, NegativeB and NegativeC)
.885          (mean of PositiveB and PositiveC)
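The arithmetic above can be checked with a plain-Python reading of the definition. The function name polarity and the row tuples are made up for illustration; the numbers are the three sample rows. It assumes at least one label agrees with Final_Label, which holds whenever Final_Label is the majority label.

```python
from statistics import mean

def polarity(labels, positives, negatives, final_label):
    """Mean of the scores whose label agrees with final_label."""
    # Use the Positive scores when the final label is 1, else the Negative ones.
    scores = positives if final_label == 1 else negatives
    agreeing = [s for lab, s in zip(labels, scores) if lab == final_label]
    return mean(agreeing)

# (labels A/B/C, Positive A/B/C, Negative A/B/C, Final_Label) per sample row.
rows = [
    ([1, 0, 1], [0.60, 0.30, 0.90], [0.40, 0.70, 0.10], 1),
    ([0, 0, 0], [0.10, 0.49, 0.30], [0.90, 0.51, 0.70], 0),
    ([0, 1, 1], [0.34, 0.87, 0.90], [0.66, 0.13, 0.10], 1),
]

result = [round(polarity(*row), 4) for row in rows]
print(result)
```

A row-by-row loop like this is mainly useful as a reference to test a vectorised version against; on a large dataset it would be slow.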

How do I implement this in Python? I have shown 3 label columns here, but my dataset has 7.

Quang Hoang :

Here's my approach with where and mask:

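import numpy as np

# df is the DataFrame from the question.
# Filter the labels, positives and negatives. The pattern needs a character
# after "Label", so Final_Label is not picked up. The three selections must
# list their columns in the same order (A, B, C, ...).
labels = df.filter(regex=r'Label\w').eq(1).values
positives = df.filter(regex=r'Positive\w')
negatives = df.filter(regex=r'Negative\w')

# where keeps the scores whose label is 1, mask keeps those whose label is 0;
# the rest become NaN, which mean(axis=1) skips.
df['Polarity'] = np.where(df['Final_Label'].eq(1),
                          positives.where(labels).mean(axis=1),
                          negatives.mask(labels).mean(axis=1)
                         )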

print(df['Polarity'])

Output:

0    0.750000
1    0.703333
2    0.885000
Name: Polarity, dtype: float64
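To see what where and mask are doing, here is a self-contained sketch that rebuilds the sample frame and keeps the intermediate results. The names kept_pos, kept_neg and polarity are introduced here for illustration only. As in the answer, it assumes the Label, Positive and Negative columns appear in the same order, because the boolean array is applied by position rather than by column name.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "LabelA": [1, 0, 0], "PositiveA": [0.60, 0.10, 0.34], "NegativeA": [0.40, 0.90, 0.66],
    "LabelB": [0, 0, 1], "PositiveB": [0.30, 0.49, 0.87], "NegativeB": [0.70, 0.51, 0.13],
    "LabelC": [1, 0, 1], "PositiveC": [0.90, 0.30, 0.90], "NegativeC": [0.10, 0.70, 0.10],
    "Final_Label": [1, 0, 1],
})

# Boolean array of shape (rows, number of labels): True where the label is 1.
labels = df.filter(regex=r"^Label\w").eq(1).to_numpy()
positives = df.filter(regex=r"^Positive\w")
negatives = df.filter(regex=r"^Negative\w")

kept_pos = positives.where(labels)  # NaN wherever the label is 0
kept_neg = negatives.mask(labels)   # NaN wherever the label is 1

# mean(axis=1) ignores NaN, so each row averages only the kept scores.
polarity = np.where(df["Final_Label"].eq(1),
                    kept_pos.mean(axis=1),
                    kept_neg.mean(axis=1))

print(kept_pos)
print(polarity.round(4))
```

In the second row every label is 0, so kept_pos is all NaN there and its row mean is NaN too; that value is never used because np.where takes the Negative mean for that row. Nothing in the sketch depends on there being three labels, so it should carry over to 7 label columns as long as the naming and ordering assumptions hold.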
