Accuracy, Precision, and Recall in Information Retrieval

Accuracy, Precision, Recall

Accuracy, precision, and recall [2] are important concepts and evaluation metrics in the design of information retrieval systems, artificial intelligence, and search engines. Because their Chinese translations vary, it is generally recommended to use the English terms.

Concept introduction

Let's use a concrete scenario as an example.
Suppose a class has 80 boys and 20 girls, 100 students in total. The goal is to find all the girls.
Someone selects 50 people: 20 of them are girls, but 30 boys are also mistakenly selected as girls.
As an evaluator, you need to assess his work.
First we can compute accuracy, which is defined as the ratio of the number of samples the classifier labels correctly to the total number of samples in a given test data set. Equivalently, when the loss function is the 0-1 loss, accuracy is one minus the average loss on the test data set [3].
That may sound abstract. Simply put, in the scenario above there are two classes, male and female, and the classifier divides everyone into male or female. Accuracy is the fraction of the total that is classified correctly. This is easy to compute: he classified 70 people correctly (the 20 girls plus the 50 boys he did not select), out of 100 in total, so his accuracy is 70% (70 / 100).
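As a sanity check, here is a minimal Python sketch of this accuracy calculation; the counts come straight from the example, and the variable names are mine:

```python
# 20 girls correctly selected, plus 50 boys correctly left unselected
correct = 20 + 50
total = 100

accuracy = correct / total
print(f"accuracy = {accuracy:.0%}")  # -> accuracy = 70%
```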
Accuracy can indeed give a sense of whether a classifier works in some settings, but it is not always an effective way to evaluate a classifier's work. For example, suppose Google has crawled 100 pages of argcv while holding 10,000,000 pages in its index, and the task is: given a random page, classify whether it is an argcv page. If my work were judged by accuracy alone, I could simply label every page as "not an argcv page". My efficiency would be very high (return false, one line), my accuracy would reach 99.999% (9,999,900 / 10,000,000), and I would beat many classifiers that labor over their computations, yet my algorithm is obviously useless. How do we fix this? This is where precision, recall, and the F1-measure come into play.
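To see the failure mode concretely, here is a toy sketch; the page counts come from the example above, and classify is a hypothetical stand-in for a real classifier:

```python
TOTAL_PAGES = 10_000_000   # pages in the index, from the example
ARGCV_PAGES = 100          # actual argcv pages, from the example

def classify(page):
    """Trivial 'classifier': always answers 'not an argcv page'."""
    return False

# The classifier is wrong only on the 100 argcv pages.
correct = TOTAL_PAGES - ARGCV_PAGES
print(f"accuracy = {correct / TOTAL_PAGES:.3%}")  # -> accuracy = 99.999%
```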
Before discussing precision, recall, and the F1-measure, we need to define four outcomes: TP, FN, FP, and TN.
Following the earlier example, the task is to find all the girls. If we view this task as a classifier, then girls are what we want and boys are not, so girls are the "positive class" and boys the "negative class".
                         Relevant (positive class)      NonRelevant (negative class)
Retrieved                true positives (TP)            false positives (FP)
Not Retrieved            false negatives (FN)           true negatives (TN)

true positives (TP): a positive judged as positive; in the example, correctly deciding "this is a girl".
false positives (FP): a negative judged as positive ("keeping the false"); in the example, someone who is plainly a boy is judged to be a girl, a mistake people make often these days.
false negatives (FN): a positive judged as negative ("discarding the true"); in the example, someone who is plainly a girl is judged to be a boy, the very mistake Liang Shanbo made.
true negatives (TN): a negative judged as negative, i.e., a boy judged to be a boy; a manly man like me would land squarely here.
From this table, we can easily read off the values for the example: TP = 20, FP = 30, FN = 0, TN = 50.
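A minimal Python sketch of how these four counts fall out of a set of predictions; the 0/1 label encoding (1 = girl, the positive class) is an assumption for illustration:

```python
# y_true: the class really has 20 girls (1) and 80 boys (0).
# y_pred: the selector marks 50 people as girls: the 20 real girls plus 30 boys.
y_true = [1] * 20 + [0] * 80
y_pred = [1] * 20 + [1] * 30 + [0] * 50

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
print(tp, fp, fn, tn)  # -> 20 30 0 50
```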
The formula for precision is

    P = TP / (TP + FP)

It computes, among all the retrieved items (TP + FP), the proportion of items that should have been retrieved (TP).
In the example, this means: of all the people he selected, what fraction are correct (i.e., girls)? So his precision is 40% (20 girls / (20 girls + 30 boys misjudged as girls)).
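Continuing the sketch above, precision from those counts:

```python
precision = tp / (tp + fp)
print(f"precision = {precision:.0%}")  # -> precision = 40%
```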
The formula for recall is

    R = TP / (TP + FN)

It computes the proportion of retrieved items (TP) among all the items that should have been retrieved (TP + FN).
In the example, this means: of all the girls in the class, what fraction did he find? So his recall is 100% (20 girls / (20 girls + 0 girls misjudged as boys)).
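And recall from the same counts:

```python
recall = tp / (tp + fn)
print(f"recall = {recall:.0%}")  # -> recall = 100%
```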
As mentioned earlier, the F1-measure is computed as

    F1 = 2 * P * R / (P + R)

Its derivation is simple. Define F1 as the harmonic mean of precision P and recall R:

    2 / F1 = 1 / P + 1 / R

Solving for F1 gives the formula above.
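A quick numeric check, using the precision and recall from the example; the closed form and the harmonic-mean definition agree:

```python
p, r = 0.4, 1.0  # precision and recall from the example above

f1_closed = 2 * p * r / (p + r)      # closed form
f1_harmonic = 2 / (1 / p + 1 / r)    # harmonic mean of p and r
print(f"{f1_closed:.4f} == {f1_harmonic:.4f}")  # -> 0.5714 == 0.5714
```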

The relationship between precision and recall

Although precision and recall have no necessary mathematical relationship (as the formulas above show), on large-scale data sets the two metrics constrain each other. [4]
Because the retrieval strategy is never perfect, when we loosen it in the hope of retrieving more relevant documents, some irrelevant results usually come along as well, hurting precision.
Conversely, to remove irrelevant documents from the results, the retrieval strategy must be made stricter, which causes some relevant documents to no longer be retrieved, hurting recall.
Any retrieval or selection task over a large data set involves both recall and precision. Since the two metrics constrain each other, we usually tune the retrieval strategy to an appropriate degree, neither too strict nor too loose, seeking a balance point between recall and precision. Where that balance lies is determined by the specific requirements.
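A minimal sketch of this trade-off, assuming a hypothetical scorer whose relevance scores are noisy; sweeping the decision threshold shows precision and recall pulling against each other:

```python
import random

random.seed(0)  # reproducible toy data

# Toy corpus of (score, is_relevant): 100 relevant docs that tend to score
# high, 900 irrelevant docs that tend to score low. The imperfect scorer
# is a hypothetical stand-in for a real retrieval strategy.
docs = ([(random.gauss(0.7, 0.2), True) for _ in range(100)]
        + [(random.gauss(0.4, 0.2), False) for _ in range(900)])

for threshold in (0.3, 0.5, 0.7):
    retrieved = [is_rel for score, is_rel in docs if score >= threshold]
    tp = sum(retrieved)               # relevant docs among those retrieved
    precision = tp / len(retrieved)   # fraction of retrieved that is relevant
    recall = tp / 100                 # fraction of all relevant that is retrieved
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

Lowering the threshold (a looser strategy) raises recall but drags in irrelevant documents, lowering precision; raising it does the opposite.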
