References:
https://www.zhihu.com/question/428547855
https://www.jianshu.com/p/7919ef304b19
Confusion Matrix
| Actual \ Predicted | Positive | Negative |
|---|---|---|
| Positive | TP (true positive) | FN (false negative) |
| Negative | FP (false positive) | TN (true negative) |
- $F1 = \frac{2 * P * R}{P + R}$
- P (precision): of all samples the model predicts as positive, the fraction that are truly positive
  - $P = \frac{TP}{TP + FP}$
- R (recall): of all actual positives, the fraction the model predicts correctly
  - $R = \frac{TP}{TP + FN}$
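A quick numeric check of the three definitions (the counts below are made up for illustration):

```python
# Toy confusion-matrix counts, chosen only to exercise the formulas
TP, FP, FN, TN = 8, 2, 4, 6

P = TP / (TP + FP)        # precision: 8 / 10 = 0.8
R = TP / (TP + FN)        # recall:    8 / 12
F1 = 2 * P * R / (P + R)  # harmonic mean; equals 2*TP / (2*TP + FP + FN)

print(P, R, F1)
```

Note the shortcut in the last comment: substituting P and R gives $F1 = \frac{2TP}{2TP + FP + FN}$, which is often handier to compute directly from the confusion matrix.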
GAUC
https://blog.csdn.net/qq_42363032/article/details/120070512
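A minimal sketch of the idea (a full implementation against real models appears below): GAUC is a click-weighted average of each user's AUC, skipping users whose labels are constant. The column names `user`, `y`, and `score` here are illustrative, not from any particular dataset:

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def gauc(df, user_col='user', label_col='y', score_col='score'):
    # Weighted average of per-user AUC; weight = the user's click count.
    # Users whose labels are all 0 or all 1 are skipped (AUC is undefined).
    num, den = 0.0, 0.0
    for _, g in df.groupby(user_col):
        if g[label_col].nunique() == 1:
            continue
        w = g[label_col].sum()
        num += w * roc_auc_score(g[label_col], g[score_col])
        den += w
    return num / den
```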
Modified F1
The plain F1 score does not work well for imbalanced learning problems. An alternative is to define an F1 score for Positive and an F1 score for Negative separately. The former is the usual F1 score; the latter follows from a slight modification of the formulas above. The two are then weighted by the proportions of Positive and Negative samples to give a weighted F1 score. This new F1 score can still roughly reflect the model's true performance, but when the samples are highly imbalanced, the weighted F1 score breaks down too.
- F1 score for Positive
  - $F1_{positive} = \frac{2 * P_{positive} * R_{positive}}{P_{positive} + R_{positive}}$
  - $P_{positive} = \frac{TP}{TP + FP}$
  - $R_{positive} = \frac{TP}{TP + FN}$
- F1 score for Negative
  - $F1_{negative} = \frac{2 * P_{negative} * R_{negative}}{P_{negative} + R_{negative}}$
  - $P_{negative} = \frac{TN}{TN + FN}$ (of all samples predicted negative, the fraction that are truly negative)
  - $R_{negative} = \frac{TN}{FP + TN}$
- Let the proportion of Positive samples be $\alpha$ and the proportion of Negative samples be $\beta$:
  - $F1_{weighted} = \alpha * F1_{positive} + \beta * F1_{negative}$
Specificity
A simple metric, common both in practice and in papers, is specificity: the model's recall on the negative class. It is easy to compute: specificity = TN / (TN + FP).
Specificity is popular for two reasons. In practice, especially in problems involving imbalanced learning, the minority class is usually the one we care about most, so the model's recall on it is often very important. In papers, when your main leaderboard score cannot beat everyone else's, you can take a side road and argue that specificity is what really matters; it becomes a valuable wingman, giving you one more well-reasoned way to outperform others.
- TNR (true negative rate): the fraction of all actual negatives that the model identifies as negative.
  $Specificity = TNR = \frac{TN}{TN + FP}$
G-Mean
G-Mean is another metric for evaluating models on imbalanced data: the geometric mean of the recall on positives and the recall on negatives (i.e. specificity).
$G\text{-}Mean = \sqrt{Recall * Specificity} = \sqrt{\frac{TP}{TP+FN} * \frac{TN}{TN+FP}}$
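A small numeric check tying specificity and G-Mean together (the counts are made up for illustration):

```python
import math

# Toy confusion-matrix counts, for illustration only
TP, FP, FN, TN = 8, 2, 4, 6

recall = TP / (TP + FN)        # recall on positives (TPR)
specificity = TN / (TN + FP)   # recall on negatives (TNR)
g_mean = math.sqrt(recall * specificity)

print(recall, specificity, g_mean)
```

Because G-Mean multiplies the two recalls, a model that sacrifices either class is punished: if specificity collapses to 0, G-Mean is 0 no matter how high the positive recall is.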
MCC
MCC (Matthews correlation coefficient) is a metric used in machine learning to measure binary classification performance. It takes true positives, true negatives, false positives, and false negatives into account, and is generally considered a well-balanced metric that remains usable even when the two classes differ greatly in size.
MCC is essentially a correlation coefficient between the actual and the predicted classification, ranging over [-1, 1]: 1 means a perfect prediction, 0 means the prediction is no better than random guessing, and -1 means the prediction and the actual classification disagree completely.
$MCC = \frac{TP * TN - FP * FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$
----
GAUC code implementation
```python
# Compute GAUC: the AUC is computed per user, then averaged with weights.
# Weights are each user's click count (a sample-count-weighted version is
# also returned); users whose samples are all positive or all negative
# are filtered out, since AUC is undefined for them.
# flag=False: no correction; flag=True: when is_outflow == 1 and the
# predicted probability is above 0.8, shrink it by 0.1.
import numpy as np
from sklearn.metrics import roc_auc_score

def calculationGAUC(df, models, flag=False):
    gbdt, ohecodel, lr = models
    # Accumulators for the test-set GAUC
    sumWAUCclick, sumWclick = 0, 0
    sumWAUCall, sumWall = 0, 0
    for suuid, data in df.groupby('suuid'):
        # Filter out users whose samples are all positive or all negative
        if data['y'].nunique() == 1:
            continue
        # Weights: the user's click count and the user's sample count
        wclick = data['y'].sum()
        wall = len(data)
        # Predict for this user's samples: GBDT leaves -> one-hot -> LR
        x, y = np.array(data.iloc[:, 1:-1]), np.array(data.iloc[:, -1])
        x_leaves = gbdt.apply(x)[:, :, 0]
        x_trans = ohecodel.transform(x_leaves)
        yproba = lr.predict_proba(x_trans)[:, 1]  # predicted probabilities
        if flag:
            # Correction for is_outflow == 1: shrink confident scores
            is_outflowdata = np.array(data['is_outflow'])
            yproba = np.where((is_outflowdata == 1) & (yproba > 0.8),
                              yproba - 0.1, yproba)
        # AUC must be computed from scores, not thresholded 0/1 labels
        aucUser = roc_auc_score(y, yproba)
        # Accumulate numerators and denominators
        sumWAUCclick += wclick * aucUser
        sumWAUCall += wall * aucUser
        sumWclick += wclick
        sumWall += wall
    gaucclick = sumWAUCclick / sumWclick
    gaucall = sumWAUCall / sumWall
    return gaucclick, gaucall
```
Modified F1 code implementation
```python
def weightF1ForPN(y, y_pre, F1_positive, alpha, beta):
    # Weighted F1: compute the negative-class F1 here, then combine it
    # with the (precomputed) positive-class F1 using weights alpha / beta.
    pred_neg, tn_in_pred = 0, 0   # predicted negative / truly negative among them
    true_neg, tn_in_true = 0, 0   # actual negatives / predicted negative among them
    for yi, pi in zip(y, y_pre):
        # Negative precision: of all predicted negatives, the truly negative share
        if pi == 0:
            pred_neg += 1
            if yi == 0:
                tn_in_pred += 1
        # Negative recall: of all actual negatives, the share predicted negative
        if yi == 0:
            true_neg += 1
            if pi == 0:
                tn_in_true += 1
    p_negative = tn_in_pred / pred_neg
    r_negative = tn_in_true / true_neg
    print(' predicted negative: {}, truly negative among them: {}, negative precision: {}'
          .format(pred_neg, tn_in_pred, p_negative))
    print(' actual negatives: {}, predicted negative among them: {}, negative recall: {}'
          .format(true_neg, tn_in_true, r_negative))
    F1_negative = (2 * p_negative * r_negative) / (p_negative + r_negative)
    print(' negative-class F1: {}'.format(F1_negative))
    f1_weight = alpha * F1_positive + beta * F1_negative
    return f1_weight, p_negative, r_negative
```
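As a sanity check on the weighting scheme, scikit-learn's `f1_score` with `average='weighted'` computes exactly this class-frequency-weighted F1 when $\alpha$ and $\beta$ are taken as the true class proportions (the toy labels below are made up):

```python
import numpy as np
from sklearn.metrics import f1_score

y     = np.array([0, 0, 0, 0, 1, 1])
y_pre = np.array([0, 0, 1, 0, 1, 0])

# Per-class F1, weighted by the true class proportions
f1_pos = f1_score(y, y_pre, pos_label=1)
f1_neg = f1_score(y, y_pre, pos_label=0)
alpha, beta = np.mean(y == 1), np.mean(y == 0)
manual = alpha * f1_pos + beta * f1_neg

assert np.isclose(manual, f1_score(y, y_pre, average='weighted'))
```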
MCC code implementation
```python
def evalMCC(y, y_pre):
    # Matthews correlation coefficient from raw 0/1 labels and predictions
    TP, FP, FN, TN = 0, 0, 0, 0
    for yi, pi in zip(y, y_pre):
        if pi == 1:
            if yi == 1:
                TP += 1
            else:
                FP += 1
        else:
            if yi == 1:
                FN += 1
            else:
                TN += 1
    numerator = TP * TN - FP * FN
    denominator = ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)) ** 0.5
    # Convention: MCC is 0 when any margin of the confusion matrix is empty
    if denominator == 0:
        return 0.0
    return numerator / denominator
```
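The hand-rolled MCC can be cross-checked against scikit-learn's `matthews_corrcoef` (the toy labels below are made up):

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

y     = [1, 1, 1, 0, 0, 0, 0, 1]
y_pre = [1, 0, 1, 0, 0, 1, 0, 1]

# Counted by hand from the two lists: TP=3, FP=1, FN=1, TN=3
TP, FP, FN, TN = 3, 1, 1, 3
mcc_manual = (TP * TN - FP * FN) / \
    ((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)) ** 0.5

assert np.isclose(mcc_manual, matthews_corrcoef(y, y_pre))
```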