Kshitij Yadav :
I have a dataset which has the following values:
LabelA PositiveA NegativeA LabelB PositiveB NegativeB LabelC PositiveC NegativeC Final_Label
1 .60 .40 0 .30 .70 1 .9 .1 1
0 .1 .9 0 .49 .51 0 .3 .7 0
0 .34 .66 1 .87 .13 1 .90 .1 1
Final_Label is 1 if the majority of the labels (LabelA, LabelB and LabelC) are 1, and 0 otherwise.
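The majority rule above is easy to compute directly from the label columns. This is a minimal sketch, assuming an odd number of LabelX columns (3 here, 7 in the real data) so a tie cannot happen; with an even count, this version would map a tie to 0.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "LabelA": [1, 0, 0], "PositiveA": [.60, .1, .34], "NegativeA": [.40, .9, .66],
    "LabelB": [0, 0, 1], "PositiveB": [.30, .49, .87], "NegativeB": [.70, .51, .13],
    "LabelC": [1, 0, 1], "PositiveC": [.9, .3, .90], "NegativeC": [.1, .7, .1],
})

# Select every LabelX column; the ^ anchor keeps Final_Label out if it already exists
labels = df.filter(regex=r"^Label\w")
# Majority vote: 1 if more than half of the labels in the row are 1, else 0
df["Final_Label"] = (labels.sum(axis=1) > labels.shape[1] / 2).astype(int)
print(df["Final_Label"].tolist())  # [1, 0, 1]
```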
I want to calculate a column called "Polarity" with the following definition:
- If Final_Label = 1, Polarity is the mean of all the PositiveA/B/C values whose corresponding Label is also 1
- If Final_Label = 0, Polarity is the mean of all the NegativeA/B/C values whose corresponding Label is also 0
For example, for the dataset above, Polarity would have the following values:
Polarity
.75 (average of PositiveA and PositiveC)
.7033 (average of NegativeA, NegativeB and NegativeC)
.885 (average of PositiveB and PositiveC)
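To pin down the definition before vectorizing it, here is a straightforward row-by-row sketch that reproduces the three values above. The function name `polarity` and its `suffixes` parameter are illustrative, not from the question; `DataFrame.apply` is slow on large frames, so treat this as a reference to check a faster version against.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "LabelA": [1, 0, 0], "PositiveA": [.60, .1, .34], "NegativeA": [.40, .9, .66],
    "LabelB": [0, 0, 1], "PositiveB": [.30, .49, .87], "NegativeB": [.70, .51, .13],
    "LabelC": [1, 0, 1], "PositiveC": [.9, .3, .90], "NegativeC": [.1, .7, .1],
    "Final_Label": [1, 0, 1],
})

def polarity(row, suffixes=("A", "B", "C")):
    """Mean of PositiveX over labels equal to 1 when Final_Label is 1,
    otherwise mean of NegativeX over labels equal to 0 (hypothetical helper)."""
    target = row["Final_Label"]
    prefix = "Positive" if target == 1 else "Negative"
    # Keep only the groups whose label agrees with the final label;
    # the majority rule guarantees at least one such group.
    values = [row[prefix + s] for s in suffixes if row["Label" + s] == target]
    return sum(values) / len(values)

df["Polarity"] = df.apply(polarity, axis=1)
print(df["Polarity"].round(4).tolist())  # [0.75, 0.7033, 0.885]
```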
How do I implement this in Python? Here I have shown 3 label columns; my actual dataset has 7.
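With 7 label groups, it helps not to hard-code A/B/C and to make sure the LabelX, PositiveX and NegativeX columns line up position by position even if the columns are stored in a different order. A minimal sketch, assuming every group follows the LabelX / PositiveX / NegativeX naming scheme; `group_columns` is a hypothetical helper, and the 7-group frame below is made-up test scaffolding, not real data.

```python
import pandas as pd

def group_columns(df):
    """Return aligned lists of Label/Positive/Negative column names,
    one entry per suffix found on the LabelX columns (hypothetical helper)."""
    # Final_Label does not start with "Label", so it is excluded here
    suffixes = sorted(c[len("Label"):] for c in df.columns if c.startswith("Label"))
    return ([f"Label{s}" for s in suffixes],
            [f"Positive{s}" for s in suffixes],
            [f"Negative{s}" for s in suffixes])

# Hypothetical empty frame with 7 groups (A..G), columns deliberately out of order
cols = [f"{p}{s}" for s in "GFEDCBA" for p in ("Negative", "Label", "Positive")]
df = pd.DataFrame(columns=cols + ["Final_Label"])

label_cols, pos_cols, neg_cols = group_columns(df)
print(label_cols)  # ['LabelA', 'LabelB', 'LabelC', 'LabelD', 'LabelE', 'LabelF', 'LabelG']
print(pos_cols)
```

Selecting with `df[label_cols]`, `df[pos_cols]` and `df[neg_cols]` then yields frames whose i-th columns all belong to the same group.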
Quang Hoang :
Here's my approach with where and mask:
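import numpy as np
import pandas as pd

# filter the labels, positives and negatives
# (raw strings avoid invalid-escape warnings; ^ anchors the match at the start of the name)
labels = df.filter(regex=r'^Label\w').eq(1).values
positives = df.filter(regex=r'^Positive\w')
negatives = df.filter(regex=r'^Negative\w')
# output: average positives where the label is 1, negatives where the label is 0
df['Polarity'] = np.where(df['Final_Label'] == 1,
                          positives.where(labels).mean(axis=1),
                          negatives.mask(labels).mean(axis=1))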
print(df['Polarity'])
Output:
0 0.750000
1 0.703333
2 0.885000
Name: Polarity, dtype: float64
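Below is a self-contained reproduction of the answer on the question's sample data. One caveat: `filter` returns columns in the order they appear in the frame, so the answer relies on LabelX, PositiveX and NegativeX being stored in the same A/B/C order. This sketch swaps in explicit, suffix-built column lists to remove that assumption; the `suffixes` list is illustrative and would need all 7 groups for the real data.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "LabelA": [1, 0, 0], "PositiveA": [.60, .1, .34], "NegativeA": [.40, .9, .66],
    "LabelB": [0, 0, 1], "PositiveB": [.30, .49, .87], "NegativeB": [.70, .51, .13],
    "LabelC": [1, 0, 1], "PositiveC": [.9, .3, .90], "NegativeC": [.1, .7, .1],
    "Final_Label": [1, 0, 1],
})

suffixes = ["A", "B", "C"]  # extend to every group in the real data
labels = df[[f"Label{s}" for s in suffixes]].eq(1).to_numpy()
positives = df[[f"Positive{s}" for s in suffixes]]
negatives = df[[f"Negative{s}" for s in suffixes]]

# where() keeps cells where labels is True and turns the rest into NaN;
# mask() does the opposite. mean(axis=1) skips NaN by default.
pos_mean = positives.where(labels).mean(axis=1)
neg_mean = negatives.mask(labels).mean(axis=1)
df["Polarity"] = np.where(df["Final_Label"] == 1, pos_mean, neg_mean)
print(df["Polarity"].round(4).tolist())  # [0.75, 0.7033, 0.885]
```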