Evaluation metrics: precision, recall, F1, macro F1, micro F1, weighted F1

In the confusion matrix, rows represent the true labels — the same convention as sklearn, where the first argument is the ground truth.
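As a minimal pure-Python sketch of this convention (the helper below is my own illustration; `sklearn.metrics.confusion_matrix(y_true, y_pred)` uses the same row/column layout):

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true labels, columns are predicted labels,
    matching sklearn's argument order (y_true first)."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m
```

For example, `confusion_matrix([0, 1, 1], [0, 0, 1], 2)` gives `[[1, 0], [1, 1]]`: the sample with true label 1 but prediction 0 lands in row 1, column 0.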

In pattern recognition, information retrieval and classification (machine learning), precision and recall are performance metrics that apply to data retrieved from a collection, corpus or sample space.

Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, while recall (also known as sensitivity) is the fraction of relevant instances that were retrieved. Both precision and recall are therefore based on relevance.

Consider a computer program for recognizing dogs (the relevant element) in a digital photograph. Upon processing a picture which contains ten cats and twelve dogs, the program identifies eight dogs. Of the eight elements identified as dogs, only five actually are dogs (true positives), while the other three are cats (false positives). Seven dogs were missed (false negatives), and seven cats were correctly excluded (true negatives). The program’s precision is then 5/8 (true positives / selected elements) while its recall is 5/12 (true positives / relevant elements).
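The arithmetic of the dog example can be checked directly from its four counts:

```python
# Counts from the dog-recognition example: 5 true positives, 3 false
# positives (cats called dogs), 7 false negatives (missed dogs),
# 7 true negatives (cats correctly excluded).
tp, fp, fn, tn = 5, 3, 7, 7

precision = tp / (tp + fp)  # true positives / selected elements = 5/8
recall = tp / (tp + fn)     # true positives / relevant elements = 5/12
```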

When a search engine returns 30 pages, only 20 of which are relevant, while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3, which tells us how valid the results are, while its recall is 20/60 = 1/3, which tells us how complete the results are.

Adopting a hypothesis-testing approach from statistics, in which, in this case, the null hypothesis is that a given item is irrelevant, i.e., not a dog, absence of type I and type II errors (i.e. perfect specificity and sensitivity of 100% each) corresponds respectively to perfect precision (no false positive) and perfect recall (no false negative).

More generally, recall is simply the complement of the type II error rate, i.e. one minus the type II error rate. Precision is related to the type I error rate, but in a slightly more complicated way, as it also depends upon the prior distribution of seeing a relevant vs an irrelevant item.

The above cat and dog example contained 8 − 5 = 3 type I errors, for a type I error rate of 3/10, and 12 − 5 = 7 type II errors, for a type II error rate of 7/12. Precision can be seen as a measure of quality, and recall as a measure of quantity. Higher precision means that an algorithm returns more relevant results than irrelevant ones, and high recall means that an algorithm returns most of the relevant results (whether or not irrelevant ones are also returned).
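The error rates quoted above follow from the same counts, treating "not a dog" as the null hypothesis:

```python
# Counts from the cat-and-dog example above.
fp, tn = 3, 7  # 3 type I errors among the 10 cats
fn, tp = 7, 5  # 7 type II errors among the 12 dogs

type1_rate = fp / (fp + tn)  # false positives over all negatives = 3/10
type2_rate = fn / (fn + tp)  # false negatives over all positives = 7/12
```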

$accuracy = \frac{TP+TN}{TP+FP+TN+FN}$

$precision = \frac{TP}{TP+FP}$

$recall = \frac{TP}{TP+FN}$

$f1 = \frac{2}{\frac{1}{precision}+\frac{1}{recall}}$
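F1 is the harmonic mean of precision and recall; a small sketch computing it from raw counts (the function name is my own):

```python
def f1_from_counts(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 / (1 / precision + 1 / recall)
```

Plugging in the dog example (tp=5, fp=3, fn=7) gives precision 5/8 and recall 5/12, whose harmonic mean works out to exactly 0.5.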

macro F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of its size.

micro F1 first pools the TP, FP, and FN counts across all classes, then computes a single F1 from the pooled counts.

weighted F1 is the average of the per-class F1 scores, weighted by each class's support (its number of true instances).
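The three averaging schemes can be sketched in pure Python (function names are my own; `sklearn.metrics.f1_score` exposes the same behavior via its `average` parameter):

```python
from collections import Counter

def f1_per_class(y_true, y_pred, cls):
    """One-vs-rest F1 for a single class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def averaged_f1(y_true, y_pred, average="macro"):
    classes = sorted(set(y_true) | set(y_pred))
    if average == "micro":
        # Pool the counts over all classes before computing F1.
        tp = sum(t == p for t, p in zip(y_true, y_pred))
        fp = sum(t != p for t, p in zip(y_true, y_pred))  # each error is an FP for the predicted class...
        fn = fp  # ...and an FN for the true class
        return 2 * tp / (2 * tp + fp + fn)
    scores = [f1_per_class(y_true, y_pred, c) for c in classes]
    if average == "macro":
        return sum(scores) / len(scores)
    if average == "weighted":
        support = Counter(y_true)
        total = len(y_true)
        return sum(support[c] / total * s for c, s in zip(classes, scores))
```

Note that for single-label multiclass problems, micro F1 equals plain accuracy, since every misclassification contributes one pooled FP and one pooled FN.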


Reposted from blog.csdn.net/weixin_47532216/article/details/121069698