Machine Learning (1): Model Evaluation Metrics

In machine learning, evaluation metrics are usually chosen to match the actual business scenario. Different kinds of problems, such as regression, classification and ranking, use different evaluation metrics.

I. Common concepts

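Table 1: Confusion matrix for binary classification

                   Predicted positive   Predicted negative
Actual positive    TP                   FN
Actual negative    FP                   TN

Here TP = true positives, FN = false negatives, FP = false positives and TN = true negatives.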

Several other evaluation metrics can be derived from this table:
- ACC (Classification Accuracy): the proportion of all samples that the classifier classifies correctly.
  Formula: ACC = (TP + TN) / (TP + FP + FN + TN)
- BER (Balanced Error Rate): the average of the error rates on the negative class (FPR) and on the positive class (FN / (FN + TP)).
  Formula: BER = 1/2 * (FPR + FN / (FN + TP))
- TPR (True Positive Rate): the proportion of all actual positives that are correctly identified as positive.
  Formula: TPR = TP / (TP + FN)
- FPR (False Positive Rate): the proportion of all actual negatives that are wrongly identified as positive.
  Formula: FPR = FP / (FP + TN)
- TNR (True Negative Rate): the proportion of all actual negatives that are correctly identified as negative.
  Formula: TNR = TN / (FP + TN)
- PPV (Positive Predictive Value, also called precision): the proportion of predicted positives that are actually positive.
  Formula: PPV = TP / (TP + FP)
- NPV (Negative Predictive Value): the proportion of predicted negatives that are actually negative.
  Formula: NPV = TN / (FN + TN)
TPR is also called the sensitivity, and TNR is also called the specificity.
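The formulas above translate directly into code. Below is a minimal Python sketch, assuming the four counts are already known. The `ConfusionMatrix` class and the `safe_div` and `metrics` helpers are hypothetical names introduced for illustration. Returning 0.0 when a denominator is zero is our own assumed convention and is not part of the definitions.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Hypothetical container for the four cells of a binary confusion matrix."""
    tp: int  # actual positive, predicted positive
    fp: int  # actual negative, predicted positive
    fn: int  # actual positive, predicted negative
    tn: int  # actual negative, predicted negative


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 if den == 0 (assumed convention for empty classes)."""
    return num / den if den else 0.0


def metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Evaluate the formulas from the list above on one confusion matrix."""
    tpr = safe_div(cm.tp, cm.tp + cm.fn)  # sensitivity
    fpr = safe_div(cm.fp, cm.fp + cm.tn)
    return {
        "ACC": safe_div(cm.tp + cm.tn, cm.tp + cm.fp + cm.fn + cm.tn),
        "BER": 0.5 * (fpr + safe_div(cm.fn, cm.fn + cm.tp)),
        "TPR": tpr,
        "FPR": fpr,
        "TNR": safe_div(cm.tn, cm.fp + cm.tn),  # specificity
        "PPV": safe_div(cm.tp, cm.tp + cm.fp),
        "NPV": safe_div(cm.tn, cm.fn + cm.tn),
    }


if __name__ == "__main__":
    for name, value in metrics(ConfusionMatrix(tp=40, fp=10, fn=20, tn=30)).items():
        print(f"{name}: {value:.3f}")
```

For example, with TP = 40, FP = 10, FN = 20 and TN = 30, ACC is 70/100 = 0.7, TPR (sensitivity) is 40/60 ≈ 0.667 and TNR (specificity) is 30/40 = 0.75. Because FPR = 1 - TNR and FN / (FN + TP) = 1 - TPR, BER can equivalently be written as 1 - (TPR + TNR) / 2.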
II. Schematic of the metrics computed from the confusion matrix
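The four cells of the confusion matrix are obtained by comparing each sample's true label with its predicted label and incrementing one of four counters. This minimal Python sketch assumes labels arrive as two equal-length sequences and that the value `1` marks the positive class. `confusion_counts` is a hypothetical helper, not a library function.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, FN, TN); `positive` is the label value treated as positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        actual_pos = actual == positive
        predicted_pos = predicted == positive
        if actual_pos and predicted_pos:
            tp += 1  # positive sample predicted as positive
        elif predicted_pos:
            fp += 1  # negative sample predicted as positive
        elif actual_pos:
            fn += 1  # positive sample predicted as negative
        else:
            tn += 1  # negative sample predicted as negative
    return tp, fp, fn, tn


if __name__ == "__main__":
    print(confusion_counts([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

For the six samples in the usage line, the result is (2, 1, 1, 2): two true positives, one false positive, one false negative and two true negatives. Libraries such as scikit-learn offer `sklearn.metrics.confusion_matrix` for the same job.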

III. Worked example

Below, diabetes screening in medicine is used as an example to explain sensitivity and specificity. In this example, the patient's blood glucose level is the only indicator used to decide whether the patient has diabetes. The figure below shows the blood glucose levels of normal subjects and of diabetic patients:

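(Figure: blood glucose distributions of normal subjects and diabetic patients, with candidate cutoffs drawn as dotted lines)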

We can see that the two populations overlap, so setting the cutoff at different values gives different results.
If we put the cutoff at the leftmost dotted line, everyone below the line is normal, while those above it include two kinds of people: normal subjects and diabetic patients. This is where sensitivity is highest. Sensitivity is the probability that someone who is actually ill is diagnosed as ill, and at this cutoff no patient is missed. If the cutoff is placed at the rightmost dotted line, specificity is highest. Specificity is the probability that someone who is actually healthy is diagnosed as normal, and at this cutoff no healthy person is wrongly diagnosed.
In short, high sensitivity means a low missed-diagnosis rate, and high specificity means a low misdiagnosis rate.
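The effect of moving the cutoff can be reproduced numerically. The sketch below uses made-up blood glucose readings (purely hypothetical numbers, not clinical data) and diagnoses diabetes when a reading is at or above the cutoff. `sensitivity_specificity` is a hypothetical helper introduced for illustration.

```python
# Hypothetical blood glucose readings (mmol/L); made-up numbers, NOT clinical data.
normal = [4.6, 5.0, 5.3, 5.6, 6.0, 6.4, 7.1]
diabetic = [6.2, 6.8, 7.4, 8.0, 8.9, 10.2, 11.5]


def sensitivity_specificity(cutoff, normal, diabetic):
    """Diagnose 'diabetic' when a reading is >= cutoff; return (sensitivity, specificity)."""
    tp = sum(g >= cutoff for g in diabetic)  # patients correctly flagged
    fn = len(diabetic) - tp                  # patients missed
    tn = sum(g < cutoff for g in normal)     # healthy people correctly cleared
    fp = len(normal) - tn                    # healthy people misdiagnosed
    return tp / (tp + fn), tn / (tn + fp)


# Sweep a low, a middle and a high cutoff to show the trade-off.
for cutoff in (5.5, 7.0, 9.0):
    sens, spec = sensitivity_specificity(cutoff, normal, diabetic)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these toy numbers, the lowest cutoff misses no patient (sensitivity 1.00) but clears only 3 of the 7 healthy people, while the highest cutoff clears every healthy person (specificity 1.00) but flags only 2 of the 7 patients.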
Ideally we want both high sensitivity and high specificity, but in practice we usually have to strike a balance between the two. This trade-off can be represented by the ROC (Receiver Operating Characteristic) curve, which plots sensitivity (TPR) against 1 - specificity (FPR) as the cutoff varies:

(Figure: ROC curve)

In the figure, point V34 has both high sensitivity and high specificity.
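An ROC curve can be traced by sweeping the cutoff over every distinct score and recording one (FPR, TPR) point per cutoff; the area under the curve (AUC) is a common single-number summary. The Python sketch below assumes that a higher score means "more likely positive" and that labels are coded as 1/0, and it computes the area with the trapezoidal rule. `roc_points` and `auc` are hypothetical helpers, not a library API.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points from predicting positive when score >= threshold."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y != 1)
        points.append((fp / neg, tp / pos))  # lowering the threshold moves right/up
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule (points sorted by FPR)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0, 1, 0, 0]
    pts = roc_points(scores, labels)
    print(pts)
    print(auc(pts))
```

For the eight example samples, the AUC comes out to 0.8125, which equals the fraction of positive/negative pairs (13 of 16) in which the positive sample has the higher score. Each point on the curve corresponds to one choice of cutoff, just like the dotted lines in the glucose example.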

IV. References

Source: https://blog.csdn.net/A_a_ron/article/details/79051077

Origin www.cnblogs.com/xiaofeiIDO/p/11940173.html