Accuracy, Precision, Recall, and the composite metric F1-Measure (reposted)

 
Recall (recall rate); Precision (precision rate); F1-Measure (composite evaluation metric);
 
    These metrics are often used in information retrieval (e.g. search engines), natural language processing, and detection/classification tasks. Since the terms are translated, some misunderstanding is hard to avoid; here is my understanding of them.
 
First, the definitions:
Precision: the proportion of detected items that are correct or relevant (i.e., what you want);
Recall: the proportion of all correct or relevant items (the wanted ones) that are detected.
F1-Measure is defined later.
 
After consulting some references, I find a 2x2 contingency table the easiest way to explain these; here is an easy-to-understand version:

                correct / relevant (wanted)   incorrect / irrelevant
  detected      true positives (tp)           false positives (fp)
  not detected  false negatives (fn)          true negatives (tn)
 
 
The English terms in the table are the important part and help with understanding. Note that false positives are also commonly called false alarms, and false negatives are also commonly called misses.
 
Precision = \frac{tp}{tp+fp}
Recall = \frac{tp}{tp+fn}

There are also two other common definitions:

\mbox{True Negative Rate} = \frac{tn}{tn+fp}
\mbox{Accuracy} = \frac{tp+tn}{tp+tn+fp+fn}
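
To make the four definitions concrete, here is a minimal Python sketch (the function name is illustrative, not from the original post):

```python
def rates_from_counts(tp, fp, fn, tn):
    """The four rates defined above, from raw confusion-matrix counts.
    Assumes all denominators are nonzero."""
    precision = tp / (tp + fp)                    # share of detected items that are relevant
    recall    = tp / (tp + fn)                    # share of relevant items that were detected
    tnr       = tn / (tn + fp)                    # true negative rate (specificity)
    accuracy  = (tp + tn) / (tp + tn + fp + fn)   # share of all items classified correctly
    return precision, recall, tnr, accuracy
```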

However, in practice we would of course like the retrieval results to have both high P and high R, but the two often pull in opposite directions. For example, if we return only one result and it is correct, then P is 100% but R is very low; if we return every item, then R is necessarily 100% but P is low. So in different settings you have to judge for yourself whether you want P or R to be higher. When running experiments, drawing a Precision-Recall curve helps with the analysis, as in the sketch below.
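
If you use scikit-learn (an assumption here, not something the original post relies on), sklearn.metrics.precision_recall_curve computes the curve directly from classifier scores; a minimal sketch with made-up data:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

# Made-up labels and classifier scores, purely for illustration.
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1])

precision, recall, _ = precision_recall_curve(y_true, y_score)
plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall curve")
plt.show()
```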

F-Measure is the weighted harmonic mean of Precision and Recall:

F_a = \frac{(a^2+1) \cdot P \cdot R}{a^2 \cdot P + R}

When the parameter a = 1, this reduces to the most common F1:

F1 = \frac{2PR}{P+R}

It is easy to see that F1 combines the effects of P and R into a single number: the higher the F1, the more effective the method.
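
A one-function sketch of the weighted F-Measure (names are illustrative):

```python
def f_measure(p, r, a=1.0):
    """Weighted harmonic mean of precision p and recall r.
    a > 1 weights recall more heavily; a = 1 gives the usual F1."""
    return (a ** 2 + 1) * p * r / (a ** 2 * p + r)

print(f_measure(0.5, 0.5))   # 0.5   -- F1 equals P and R when they agree
print(f_measure(1.0, 0.1))   # ~0.18 -- the harmonic mean punishes imbalance
```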

Part Two

In machine learning (ML), natural language processing (NLP), information retrieval (IR) and related fields, evaluation is indispensable, and the commonly used evaluation metrics are: accuracy, precision, recall, and F1-Measure.

This article briefly introduces these concepts. Their Chinese translations vary from source to source, so it is generally advisable to stick to the English terms.

 

Now let me assume a specific scenario as an example:

Suppose a class has 80 boys and 20 girls, 100 students in total, and the goal is to find all the girls.
Someone selects 50 people: 20 of them are girls, but 30 boys are also mistakenly picked out.
As the evaluator, you need to assess this work.

 

The selection results can be represented as a matrix, which defines the four categories TP, FN, FP and TN:

                 Relevant (positive class)            NonRelevant (negative class)
  Retrieved      TP: relevant items retrieved by      FP: irrelevant items retrieved by
                 the system, e.g. the 20 girls        the system, e.g. the 30 boys
                 who were selected                    mistakenly selected as girls
  Not Retrieved  FN: relevant items the system        TN: irrelevant items the system
                 missed, e.g. the 0 girls who         correctly left out, e.g. the 50
                 were not selected                    boys who were not selected
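
A small sketch that rebuilds these four counts from label lists for the classroom example (1 marks a girl in the ground truth and a selected person in the prediction; the encoding is my assumption):

```python
# Ordering: the 20 girls picked, the 30 boys picked, the 50 boys not picked.
y_true = [1] * 20 + [0] * 30 + [0] * 50
y_pred = [1] * 20 + [1] * 30 + [0] * 50

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
print(tp, fp, fn, tn)   # 20 30 0 50
```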

 

The formula for accuracy is A = \frac{tp+tn}{tp+tn+fp+fn}; it is defined as the proportion of samples in a given test set that the classifier classifies correctly, i.e. the accuracy on the test set when the loss function is the 0-1 loss.

A = (20+50) / 100 = 70%

 

The formula for precision is P = \frac{tp}{tp+fp}; it is the proportion of the retrieved items that "should have been retrieved".

 

P = 20 / (20+30) = 40%

 

The formula for recall is R = \frac{tp}{tp+fn}; it is the proportion of all items that "should be retrieved" that actually were retrieved.

R = 20 / (20 + 0) = 100%

 

The composite evaluation metric F-Measure is the weighted harmonic mean of Precision and Recall:

F_a = \frac{(a^2+1) \cdot P \cdot R}{a^2 \cdot P + R}

When the parameter a = 1, this reduces to the most common F1:

F1 = \frac{2PR}{P+R}

The P and R metrics sometimes conflict, so F-Measure takes precision and recall into account together: F1 combines the effects of P and R, and a higher F1 indicates a more effective method.

F1 = 2 * 0.4 * 1 / (0.4 + 1) ≈ 57%
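
A short sketch that reproduces all four numbers above from the counts in the table:

```python
tp, fp, fn, tn = 20, 30, 0, 50

accuracy  = (tp + tn) / (tp + tn + fp + fn)                 # 0.70
precision = tp / (tp + fp)                                  # 0.40
recall    = tp / (tp + fn)                                  # 1.00
f1        = 2 * precision * recall / (precision + recall)   # ~0.571
print(accuracy, precision, recall, round(f1, 3))            # 0.7 0.4 1.0 0.571
```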
