Machine learning classification metrics summary

Reprinted Source: https://blog.csdn.net/wf592523813/article/details/95202448

 

Table of Contents

1. Binary classification metrics

1.1 Accuracy

1.2 Precision

1.3 Recall

1.4 F1-Score

1.5 ROC curve and AUC

1.6 Comparing ROC with Precision-Recall

2. Multi-class classification metrics

2.1 Evaluating multi-class classification as multiple binary problems

2.2 Metrics defined directly for multi-class classification

2.2.1 Kappa coefficient

2.2.2 Hamming distance

2.2.3 Jaccard similarity coefficient

2.2.4 Hinge loss


 

1. Binary classification metrics

 

1.1 Accuracy

Accuracy = (TP + TN) / (TP + FP + TN + FN)
Classification performance is most often evaluated by accuracy first, i.e., the proportion of correctly classified samples among all samples in a given data set.
Note: accuracy performs poorly as a metric on imbalanced data sets. If the numbers of positive and negative samples differ greatly, say 100 positive samples and 9900 negative samples, then a model that simply predicts every sample as negative reaches 99% accuracy, yet its actual performance is very poor because it misclassifies every positive sample.
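
A minimal sketch of this imbalance problem, using hypothetical labels:

import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([1] * 100 + [0] * 9900)  # 100 positive, 9900 negative samples
y_pred = np.zeros(10000, dtype=int)        # predict every sample as negative
print(accuracy_score(y_true, y_pred))      # 0.99, yet every positive sample is missed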

 

1.2 Precision

P = TP / (TP + FP)

Precision is the proportion of samples predicted as the positive class that really belong to the positive class. Intuitively: of the group of suspects caught as thieves, how many are actually thieves?


1.3 Recall

R = TP / (TP + FN)
Recall is the proportion of all positive-class samples that are predicted as positive. Intuitively: of all the real positive samples, what fraction did my positive predictions capture? Of all the real thieves, how many ended up among the suspects that were caught?

Different classification tasks place different demands on precision and recall. For example:

Counterfeit-currency detection requires a very high precision: the samples flagged as counterfeit should be flagged with high confidence.
Tumor detection requires a high recall: "better to wrongly accuse three thousand than to let a single one slip by."
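
A minimal sketch (hypothetical binary labels) computing both metrics, together with the confusion matrix that supplies TP, FP, FN and TN:

from sklearn.metrics import precision_score, recall_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical ground-truth labels
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # hypothetical predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, fn, tn)                   # 3, 1, 1, 3
print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 3/4
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 3/4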

 

1.4 F1-Score

2 / F1 = 1 / P + 1 / R
F1 = 2PR / (P + R)
F1-Score is the harmonic mean of precision and recall.
Precision and recall are conflicting quantities: when P is high, R tends to be relatively low, and when R is high, P tends to be relatively low. To better evaluate the overall performance of a classifier, F1-Score is therefore commonly used as the evaluation criterion.
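
A minimal sketch (reusing the hypothetical labels from above) confirming that f1_score is the harmonic mean of precision and recall:

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical ground-truth labels
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # hypothetical predictions

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
assert abs(f1 - 2 * p * r / (p + r)) < 1e-12  # harmonic mean of P and R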

 

1.5 ROC curve and AUC


TPR: True Positive Rate. TPR is the probability that a positive example is correctly classified as positive.

TPR = TP / (TP + FN)

FPR: False Positive Rate. FPR is the probability that a negative example is wrongly classified as positive.

FPR = FP / (FP + TN)

Plotting FPR on the horizontal axis and TPR on the vertical axis gives the ROC curve.

Four notable points and one line on the ROC curve:

(0, 1): FN = 0, FP = 0, meaning every sample is correctly classified; this is the perfect classifier.

(1, 0): TN = 0, TP = 0, meaning every sample is misclassified; this is the worst possible classifier.

(0, 0): FP = 0, TP = 0, meaning every sample is classified as negative.

(1, 1): TN = 0, FN = 0, meaning every sample is classified as positive.

From the analysis above, the closer the ROC curve gets to the upper-left corner, the better the classifier performs.
The dashed diagonal line y = x corresponds to a classifier that guesses randomly.

Drawing the ROC curve: in a binary classification problem, the model gives each sample an estimated probability (score) of being positive. Sort the samples by this score in descending order, then go through them from highest to lowest, each time taking the current score as the threshold: a test sample whose score is greater than or equal to the threshold is predicted positive, otherwise negative. Every choice of threshold yields one (FPR, TPR) pair, i.e. one point on the ROC curve.
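
A minimal sketch of this thresholding procedure with scikit-learn, using hypothetical labels and scores:

from sklearn.metrics import roc_curve

y_true = [0, 0, 1, 1, 0, 1]                # hypothetical ground-truth labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.6]  # hypothetical positive-class probabilities

# roc_curve sweeps the scores as thresholds and returns one (FPR, TPR) pair per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(list(zip(thresholds, fpr, tpr)))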

AUC (Area Under the ROC Curve): the area under the ROC curve, introduced as a single quantitative evaluation metric.
The larger the AUC, the better the classification performance. AUC is at most 1, and a reasonable classifier should do at least as well as random guessing, so in practice 0.5 <= AUC <= 1.

AUC characterizes the classifier's ability to rank positive samples ahead of negative samples. "Ahead" here means that if the samples are sorted by predicted positive probability in descending order, positive samples appear before negative ones. The larger the AUC, the more positive samples are ranked ahead of negative samples. In the extreme case where the ROC curve passes through the point (0, 1), every positive sample is ranked ahead of every negative sample.
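
A minimal sketch with the same hypothetical scores, computing the AUC directly:

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.6]

# equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one
print(roc_auc_score(y_true, y_score))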

 

1.6 Comparing ROC with Precision-Recall

A key property of the ROC curve: when the class distribution of the test set changes, the ROC curve stays essentially unchanged. This matters because real data sets often exhibit class imbalance, i.e. far more negative samples than positive ones (or vice versa), and the class distribution of the test data may also change over time.


The figure in the original post compares ROC curves with Precision-Recall curves:

(a) and (c) are ROC curves; (b) and (d) are PR curves.
(a) and (b) are the results on the original (balanced) test set; (c) and (d) are the results after increasing the number of negative samples to 10 times that of the original data set. Clearly, the ROC curves remain essentially unchanged, while the PR curves change dramatically.
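
The same effect can be reproduced in a minimal sketch (synthetic scores, not the data from the figure): replicating the negative samples leaves the ROC summary untouched while the PR summary (average precision) drops.

import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.RandomState(0)
pos_scores = rng.normal(1.0, 1.0, 100)  # synthetic scores for positive samples
neg_scores = rng.normal(0.0, 1.0, 100)  # synthetic scores for negative samples

def evaluate(n_neg_copies):
    y_true = np.r_[np.ones(100), np.zeros(100 * n_neg_copies)]
    y_score = np.r_[pos_scores, np.tile(neg_scores, n_neg_copies)]
    return roc_auc_score(y_true, y_score), average_precision_score(y_true, y_score)

print(evaluate(1))   # balanced test set
print(evaluate(10))  # negatives replicated 10x: ROC AUC identical, average precision drops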

Why is AUC the better choice?

For a binary classification problem, any single value of P or R depends heavily on the threshold you pick. Once a classifier is fixed, we would like an evaluation that does not depend on that threshold, i.e. a threshold-independent measure. This is why AUC is preferred over a single precision/recall value.

# Binary classification metrics with scikit-learn (y_true, y_pred: true and predicted labels)
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
acc = accuracy_score(y_true, y_pred)
p, r, f1 = precision_score(y_true, y_pred), recall_score(y_true, y_pred), f1_score(y_true, y_pred)

 

 

2. Multi-class classification metrics


One approach to multi-class problems is to convert them into several binary sub-problems and evaluate those, which involves more steps. There are also metrics defined directly for the multi-class case.

 

2.1 Evaluating multi-class classification as multiple binary problems


Accuracy: same as in the binary case, the proportion of correctly predicted samples among all samples.
Precision, 'macro': compute the precision for each label separately, then take the unweighted average.
Recall, 'macro': compute the recall for each label separately, then take the unweighted average.
F1-Score, 'macro': compute the F1 for each label separately, then take the unweighted average.
'micro': sum the TP, FP and FN of the n underlying binary classifiers, compute P and R from these sums, and then compute F1.
A classifier that performs well generally has both a high micro-F1 and a high macro-F1 (see the sketch below).
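
A minimal sketch (hypothetical 3-class labels) of the 'macro' and 'micro' averaging options in scikit-learn:

from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [0, 1, 2, 2, 1, 0, 2]  # hypothetical ground-truth labels
y_pred = [0, 2, 2, 2, 1, 0, 1]  # hypothetical predictions

print(precision_score(y_true, y_pred, average='macro'))  # unweighted mean of per-class precision
print(recall_score(y_true, y_pred, average='macro'))     # unweighted mean of per-class recall
print(f1_score(y_true, y_pred, average='macro'))         # unweighted mean of per-class F1
print(f1_score(y_true, y_pred, average='micro'))         # F1 from pooled TP / FP / FN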

 

2.2 Metrics defined directly for multi-class classification


2.2.1 Kappa coefficient


The kappa coefficient is a statistical measure of agreement (consistency). Its value lies in [-1, 1], although in practice it usually falls in [0, 1]. The higher the coefficient, the better the agreement and hence the classification accuracy the model achieves.

kappa = (p0 - pe) / (1 - pe)

p0 is the overall classification accuracy.
pe is the expected chance agreement: the sum over all classes i of (number of actual samples of class i × number of predicted samples of class i), divided by the square of the total number of samples.

from sklearn.metrics import cohen_kappa_score
kappa = cohen_kappa_score(y_true, y_pred, labels=None)  # labels only needs to be set to compute kappa on a subset of the classes
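
A minimal sketch (hypothetical labels) checking the p0/pe formula above against cohen_kappa_score:

import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

y_true = [0, 0, 1, 1, 2, 2, 2, 0]  # hypothetical ground-truth labels
y_pred = [0, 1, 1, 1, 2, 2, 0, 0]  # hypothetical predictions

cm = confusion_matrix(y_true, y_pred)
n = cm.sum()
p0 = np.trace(cm) / n                                  # overall accuracy
pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2  # chance agreement
assert np.isclose((p0 - pe) / (1 - pe), cohen_kappa_score(y_true, y_pred))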

 

2.2.2 Hamming distance


The Hamming distance (Hamming loss) also applies to multi-class problems. It simply measures the distance between the predicted labels and the true labels, with a value between 0 and 1. A distance of 0 means the predictions are identical to the true labels; a distance of 1 means the model's predictions are completely opposite to the results we want.

from sklearn.metrics import hamming_loss
ham_distance = hamming_loss(y_true,y_pred)
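
A minimal sketch (hypothetical labels) showing that, for single-label multi-class data, the Hamming loss equals the fraction of misclassified samples:

import numpy as np
from sklearn.metrics import hamming_loss

y_true = np.array([0, 1, 2, 1])  # hypothetical ground-truth labels
y_pred = np.array([0, 2, 2, 1])  # hypothetical predictions

assert np.isclose(hamming_loss(y_true, y_pred), np.mean(y_true != y_pred))  # 0.25 here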


2.2.3 Jaccard similarity coefficient


It differs from the Hamming distance in its denominator. When the prediction exactly matches the actual result, the coefficient is 1; when the prediction and the actual result have nothing in common, the coefficient is 0; when the prediction is a proper subset or a proper superset of the actual result, the coefficient lies strictly between 0 and 1.
Averaging the coefficient over all samples in the test set gives the overall performance of the algorithm.

from sklearn.metrics import jaccard_similarity_score
jaccard = jaccard_similarity_score(y_true, y_pred, normalize=True)

# normalize defaults to True and returns the average Jaccard coefficient over the samples;
# with normalize=False the sum over the samples is returned instead.
# Note: newer scikit-learn versions replace jaccard_similarity_score with jaccard_score.




2.2.4 Hinge loss


Hinge loss is generally used for "maximum margin" classification (e.g. SVMs). A loss of 0 means every sample is classified correctly with a sufficient margin; the larger the loss, the worse the classifier performs.

from sklearn.metrics import hinge_loss
hinge = hinge_loss(y_true, pred_decision)  # pred_decision holds decision-function outputs, not hard labels

 
