Deep learning theory: Python imports, two evaluation metrics (Top-5 and Top-1 error rates), and the ROC curve and AUC



1. Importing libraries and functions

import module: imports a whole module. Note: this is like importing an entire folder; every time you call one of the module's functions, you must qualify it with the module name.

from module import X: imports a single function (or other name) from a module. Note: this is like importing one specific file from a folder, referenced by its full path.

from module import *: imports all public names from a module. Note: this is like importing every file in a folder; all of the functions become directly accessible.
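
As a quick illustration, here is a minimal sketch of the three styles using Python's standard math module:

```python
# Style 1: import the whole module; every call must be qualified with the module name.
import math
print(math.sqrt(16.0))    # -> 4.0

# Style 2: import one name from the module; it is then used directly, no prefix.
from math import sqrt
print(sqrt(16.0))         # -> 4.0

# Style 3: import every public name from the module (generally discouraged,
# since it can silently shadow names already defined in your code).
from math import *
print(cos(0.0))           # -> 1.0
```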

2. Two evaluation metrics (Top-5 error rate and Top-1 error rate)

1. Top-5 error rate
For a given image, the prediction counts as correct if the true label is among the five classes with the highest predicted probabilities; the Top-5 error rate is the fraction of images for which it is not.

2. Top-1 error rate
For a given image, the prediction counts as correct only if the class with the highest predicted probability is the true label; the Top-1 error rate is the fraction of images for which it is not.
ImageNet hosts an annual competition, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), whose goal is the correct classification, detection, and recognition of objects and scenes by algorithm; its standard evaluation metric is the Top-5 error rate.
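
As a concrete illustration, the sketch below computes both error rates with NumPy on a tiny made-up batch; the probability values and the `top_k_error` helper are invented for this example:

```python
import numpy as np

def top_k_error(probs, labels, k):
    """Fraction of samples whose true label is NOT among the k highest-probability classes."""
    topk = np.argsort(probs, axis=1)[:, -k:]        # indices of the k largest scores per row
    hits = np.any(topk == labels[:, None], axis=1)  # is the true label among them?
    return 1.0 - hits.mean()

# Toy batch: 3 samples, 6 classes (all numbers made up for illustration).
probs = np.array([[0.10, 0.50, 0.10, 0.10, 0.10, 0.10],
                  [0.35, 0.20, 0.15, 0.10, 0.12, 0.08],
                  [0.04, 0.06, 0.10, 0.10, 0.20, 0.50]])
labels = np.array([1, 4, 0])
print("Top-1 error:", top_k_error(probs, labels, 1))  # 2/3: only sample 0 is right at top-1
print("Top-5 error:", top_k_error(probs, labels, 5))  # 1/3: sample 2's label is ranked last
```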

3. Confusion matrix, ROC curve, and AUC

1. Confusion matrix

The confusion matrix separately counts the number of observations the classification model assigns to the correct class and to the wrong class, then displays the counts in a table. For binary classification:

True Positive (TP): the true value is positive and the model also predicts positive.

False Negative (FN): the true value is positive but the model predicts negative; in statistics this is a Type II error.

False Positive (FP): the true value is negative but the model predicts positive; in statistics this is a Type I error.

True Negative (TN): the true value is negative and the model also predicts negative.

(Figure: the confusion matrix laid out as a 2×2 table of TP, FN, FP, TN.)
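
These four counts are easy to compute directly. The following is a small sketch, assuming binary labels coded as 1 (positive) and 0 (negative); the `confusion_counts` helper is invented for this example:

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) for binary labels coded as 1 (positive) / 0 (negative)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))  # positive, predicted positive
    fn = np.sum((y_true == 1) & (y_pred == 0))  # positive, predicted negative
    fp = np.sum((y_true == 0) & (y_pred == 1))  # negative, predicted positive
    tn = np.sum((y_true == 0) & (y_pred == 0))  # negative, predicted negative
    return tp, fn, fp, tn

y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0])
print(confusion_counts(y_true, y_pred))  # -> (2, 1, 1, 2)
```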

2. ROC curve

The Receiver Operating Characteristic (ROC) curve is a plot with the false positive rate (FPR) on the horizontal axis and the true positive rate (TPR) on the vertical axis.
True Positive Rate (TPR): the proportion of actual positive samples that the model correctly predicts as positive.

          TPR = TP / (TP + FN)

False Positive Rate (FPR): the proportion of actual negative samples that the model wrongly predicts as positive.

          FPR = FP / (TN + FP)

To draw the curve, sort the samples by the model's predicted score and, sweeping the threshold from high to low, treat each sample in turn as positive. Recompute TPR and FPR at each step, then plot FPR on the horizontal axis against TPR on the vertical axis to obtain the ROC curve. (Figure: schematic of an ROC curve.)
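
The sweep described above can be written in a few lines of NumPy; the `roc_points` helper and the scores below are invented for illustration:

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep the threshold from high to low, treating each sample in turn as
    positive, and return the (FPR, TPR) points of the ROC curve."""
    order = np.argsort(-scores)          # sort by predicted score, descending
    labels = labels[order]
    P = labels.sum()                     # number of actual positives
    N = len(labels) - P                  # number of actual negatives
    tpr = np.cumsum(labels) / P          # TP / (TP + FN) after each cut-off
    fpr = np.cumsum(1 - labels) / N      # FP / (TN + FP) after each cut-off
    # prepend the (0, 0) starting point
    return np.concatenate(([0.0], fpr)), np.concatenate(([0.0], tpr))

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4])  # made-up predicted scores
labels = np.array([1, 1, 0, 1, 0, 0])               # made-up true labels
fpr, tpr = roc_points(scores, labels)
print(list(zip(fpr.round(2), tpr.round(2))))
```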

3.AUC

AUC (Area Under the Curve) is the area under the ROC curve. It lies between 0 and 1 and serves as a single number for judging classifier quality: the larger, the better (a useful classifier typically scores between 0.5 and 1). AUC is used as an evaluation standard because in many cases the ROC curves alone do not clearly show which classifier is better, whereas as a scalar the comparison is direct: the classifier with the larger AUC is the better one.
Common ways to compute AUC:
(1) Trapezoidal rule: with a limited number of test samples, the empirical ROC curve is step-shaped. Dropping a vertical line from each point on the curve to the x-axis produces a set of trapezoids, and the sum of their areas is the AUC.
(2) Mann-Whitney statistic: over all positive-negative sample pairs, estimate the probability that the positive sample's predicted score exceeds the negative sample's. This estimate approaches the true AUC as the sample size grows.
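
Both estimators are straightforward to implement. Below is a minimal sketch of each, reusing the made-up scores from the ROC example with its ROC points hard-coded; on this data the two estimates agree at about 0.889:

```python
import numpy as np

def auc_trapezoid(fpr, tpr):
    """(1) Trapezoidal rule: sum the trapezoid areas under the step-shaped ROC curve."""
    return np.trapz(tpr, fpr)

def auc_mann_whitney(scores, labels):
    """(2) Mann-Whitney statistic: fraction of positive/negative pairs in which
    the positive sample is scored higher (ties counted as one half)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# ROC points from the sketch above, plus the raw scores that produced them.
fpr = np.array([0.0, 0.0, 0.0, 1/3, 1/3, 2/3, 1.0])
tpr = np.array([0.0, 1/3, 2/3, 2/3, 1.0, 1.0, 1.0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4])
labels = np.array([1, 1, 0, 1, 0, 0])
print(auc_trapezoid(fpr, tpr))           # ~0.889
print(auc_mann_whitney(scores, labels))  # ~0.889
```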

AUC = 1: a perfect classifier. With this model there is at least one threshold that gives a perfect prediction; in practice, perfect classifiers almost never exist.
0.5 < AUC < 1: better than random guessing. With a properly chosen threshold, the classifier has predictive value.
AUC = 0.5: the same as random guessing (e.g. tossing a coin); the model has no predictive value.
AUC < 0.5: worse than random guessing, though still better than random if you always invert its predictions.

Summary

This article has organized some scattered knowledge points of deep learning; they will be continuously improved and supplemented in subsequent studies.
