Machine Learning in Practice (6) 01 - Model evaluation criteria

Foreword

Most of this article comes from the following two blogs: 
http://blog.csdn.net/dinosoft/article/details/43114935  
http://my.oschina.net/liangtee/blog/340317

Primer

Suppose we have the following two classifiers; which one is better? (There are 90 samples of class A and 10 samples of class B.)

                 Class A samples (90)               Class B samples (10)             Classification accuracy
Classifier C1    90 predicted as A (100% correct)   10 predicted as A (0% correct)   90%
Classifier C2    70 as A + 20 as B (78% correct)    5 as A + 5 as B (50% correct)    75%

Classifier C1 assigns every test sample to class A. Classifier C2 correctly classifies 70 of the 90 class A samples and 5 of the 10 class B samples.

The classification accuracy of C1 is 90% and that of C2 is 75%, yet intuitively C2 feels more useful. Judged by accuracy alone, C1 would be the better classifier, which contradicts our intuition. In other words, relying solely on accuracy is sometimes not appropriate.

We need an evaluation metric that objectively reflects a model's ability to predict both positive and negative samples, while also removing the effect of skewed class distributions (conceptually this is similar to normalization, and it matters a lot in practice; for example, page views are always far greater than clicks). This is exactly the problem the AUC metric solves.
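To make this concrete, here is a minimal Python sketch (using the counts from the table above; the helper function and its name are just for illustration) that computes the overall accuracy and the per-class recall of C1 and C2:

```python
# Counts taken from the table above: correct predictions out of 90 class A and 10 class B samples.
def evaluate(name, correct_a, correct_b, total_a=90, total_b=10):
    accuracy = (correct_a + correct_b) / (total_a + total_b)
    recall_a = correct_a / total_a          # how well class A is recognized
    recall_b = correct_b / total_b          # how well class B is recognized
    print(f"{name}: accuracy={accuracy:.0%}, recall_A={recall_a:.0%}, recall_B={recall_b:.0%}")

evaluate("C1", correct_a=90, correct_b=0)   # C1: accuracy=90%, recall_A=100%, recall_B=0%
evaluate("C2", correct_a=70, correct_b=5)   # C2: accuracy=75%, recall_A=78%, recall_B=50%
```

Accuracy rewards C1 for ignoring class B entirely, while the per-class recalls expose this immediately.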

 

The evaluation metric most commonly used for classifiers in machine learning practice is AUC. If you don't want to dig into the details and just want something simple to use, remember one sentence:

The effective range of AUC is [0.5, 1]: the larger the value, the better. If the value comes out below 0.5, invert the classifier's output.
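As a small illustration of the "invert the result" advice, here is a hedged sketch on toy data (assuming scikit-learn is available); negating the scores of a worse-than-random ranker turns an AUC of a into 1 - a:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([1, 0, 1, 0, 1, 0])                 # toy labels
y_score = np.array([0.2, 0.9, 0.1, 0.8, 0.3, 0.7])    # a "backwards" classifier

auc = roc_auc_score(y_true, y_score)      # 0.0 here: worse than random
if auc < 0.5:
    auc = roc_auc_score(y_true, -y_score) # flipping the scores gives 1 - auc
print(auc)                                # 1.0
```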

What is AUC

AUC is the area under the ROC curve, so we first have to understand the ROC curve.

First consider the binary classification problem. There are four combinations of predicted and actual values; see the contingency table below.

The evaluation criterion therefore becomes: a classifier whose point lies near the upper-left corner of the (FPR, TPR) plane is a good classifier, because this takes into account its ability to classify both positive and negative samples.
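The contingency table itself appears only as an image in the original post. For reference, writing TP, FP, TN, FN for true/false positives/negatives, the two rates plotted on the ROC axes are the standard definitions:

```latex
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}
```

TPR (true positive rate, the y-axis) measures how many actual positives are caught; FPR (false positive rate, the x-axis) measures how many actual negatives are wrongly flagged, which is why the upper-left corner (high TPR, low FPR) is the ideal spot.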

If a classifier can output a score, we can vary its decision threshold, plot the corresponding (FPR, TPR) point for each threshold, and connect these points into a curve. That curve is the ROC curve, and the area under it is the AUC (Area Under the Curve of ROC).

How to draw ROC curve

 

For a specific classifier and a specific test set, there is obviously only one classification result, i.e., a single pair of FPR and TPR values. To draw a curve we need a whole series of FPR and TPR values. Where do they come from?

They come from an important capability of many classifiers: probability output, i.e., the probability that the classifier believes a sample belongs to the positive class (or negative class). By adjusting a threshold on this probability we can dynamically change whether a sample is labeled positive or negative (in the example below, the "Score" column gives this probability of being judged positive).

Suppose we have obtained the probability output (the probability of being positive) for every sample. The question now is how to vary this threshold. We sort the test samples by their predicted probability of being positive, from largest to smallest. The figure below gives an example with 20 test samples: the "Class" column shows the true label of each sample (p for positive, n for negative), and the "Score" column shows the predicted probability that the sample is positive.

ROC drawing

Next, we use each "Score" value in turn, from high to low, as the threshold. When a test sample's probability of being positive is greater than or equal to the threshold, we label it positive; otherwise we label it negative. For example, for the fourth sample in the figure, the "Score" value is 0.6, so samples 1, 2, 3, and 4 are labeled positive, because their "Score" values are all greater than or equal to 0.6, while the remaining samples are labeled negative. Each choice of threshold yields one pair of FPR and TPR values, i.e., one point on the ROC curve. In this way we obtain 20 pairs of FPR and TPR values in total; plotting them gives the ROC curve shown below:

ROC drawing

When we set the threshold to 1 and to 0, we obtain the two points (0, 0) and (1, 1) on the ROC curve. Connecting all the (FPR, TPR) pairs gives the ROC curve; the more thresholds we use, the smoother the curve.
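A minimal Python sketch of this threshold sweep (the 20-sample data from the figure is not reproduced here, so the labels and scores below are a small made-up example, and roc_point is just an illustrative helper):

```python
import numpy as np

# Toy data standing in for the figure's "Class" and "Score" columns (made up).
y_true = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1, 0])    # 1 = p, 0 = n
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.54, 0.53, 0.52, 0.51, 0.4])

def roc_point(threshold):
    """Label everything with score >= threshold as positive, return (FPR, TPR)."""
    y_pred = (y_score >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return fp / (fp + tn), tp / (tp + fn)

# Sweep every observed score (high to low) as the threshold; each gives one ROC point.
points = [roc_point(t) for t in sorted(y_score, reverse=True)]
print(points)
```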

-- And I thought ROC was useless back when I was doing the Ali competition!!! What an eye-opener!!! I kept wondering how to tell which result is better from the ROC: just see which classifier's curve is closer to the upper-left corner. The ROC can also be used to decide where to place the probability cutoff for the positive class!!! So that's how it works!!!

Calculation of AUC value

AUC (Area Under Curve) is defined as the area under the ROC curve; clearly this area is never greater than 1. Since the ROC curve generally lies above the line y = x, the AUC value ranges between 0.5 and 1. We use the AUC value as the evaluation criterion because the ROC curve itself often does not clearly indicate which classifier is better, whereas, as a single number, a larger AUC means a better classifier.
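With scikit-learn, the curve and the area under it can be computed directly; this is a sketch on the same toy data as above (roc_curve and auc are real scikit-learn functions, the data is made up):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

y_true = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.54, 0.53, 0.52, 0.51, 0.4])

# One (FPR, TPR) point per distinct threshold, highest threshold first.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(auc(fpr, tpr))   # area under the ROC curve

# Optional plot:
# import matplotlib.pyplot as plt
# plt.plot(fpr, tpr); plt.plot([0, 1], [0, 1], "--"); plt.show()
```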

What does AUC mean

So what does the AUC value mean? According to (Fawcett, 2006), the AUC value is equivalent to the probability that a randomly chosen positive example is ranked higher than a randomly chosen negative example.

That sentence is a bit of a mouthful, so let me try to explain. First, the AUC value is a probability. If you randomly pick one positive sample and one negative sample, the AUC is the probability that the current classification algorithm ranks the positive sample ahead of the negative sample according to the score it computes. Naturally, the larger the AUC value, the more likely the algorithm is to rank positive samples ahead of negative ones, i.e., the better it classifies.
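A small sketch of this probabilistic reading (toy data, brute-force pair counting purely for illustration): count, over all positive/negative pairs, how often the positive sample gets the higher score, and compare with scikit-learn's roc_auc_score:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = np.where(y_true == 1, rng.normal(1.0, 1.0, 200), rng.normal(0.0, 1.0, 200))

pos = y_score[y_true == 1]
neg = y_score[y_true == 0]

# Probability that a random positive is ranked above a random negative
# (ties count for half, matching the usual AUC definition).
pairs = (pos[:, None] > neg[None, :]).mean() + 0.5 * (pos[:, None] == neg[None, :]).mean()

print(pairs, roc_auc_score(y_true, y_score))   # the two numbers agree
```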

Why use ROC curve

Given that there are already so many evaluation criteria, why use ROC and AUC? Because the ROC curve has a very nice property: when the distribution of positive and negative samples in the test set changes, the ROC curve stays essentially unchanged. Class imbalance is common in real data sets, i.e., there are far more negative samples than positive ones (or vice versa), and the class distribution in the test data may also drift over time. The figure below compares ROC curves with Precision-Recall curves:

In the figure above, (a) and (c) are ROC curves, while (b) and (d) are Precision-Recall curves. (a) and (b) show classification results on the original test set (balanced distribution of positive and negative samples); (c) and (d) show the classifier's results after the number of negative samples in the test set is increased to 10 times the original. It is clear that the ROC curves remain essentially unchanged, while the Precision-Recall curves change dramatically.
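A quick way to check this property yourself (a sketch on simulated scores, not the actual experiment behind the figure): replicate the negative samples 10 times and compare the ROC AUC with the area under the Precision-Recall curve (average precision) before and after.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, 500)     # scores of positive samples
neg = rng.normal(0.0, 1.0, 500)     # scores of negative samples

def report(pos, neg):
    y = np.r_[np.ones_like(pos), np.zeros_like(neg)]
    s = np.r_[pos, neg]
    print(f"ROC AUC={roc_auc_score(y, s):.3f}  PR AUC (AP)={average_precision_score(y, s):.3f}")

report(pos, neg)                    # balanced test set
report(pos, np.tile(neg, 10))       # negatives increased to 10x: ROC AUC unchanged, AP drops
```

Replicating the negatives leaves the ROC AUC untouched because TPR and FPR are rates computed within each class, whereas average precision drops because precision mixes the two classes.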

 


Origin: blog.csdn.net/yangshaojun1992/article/details/104360924