How roc_curve works

From https://blog.csdn.net/sun91019718/article/details/101314545, reposted here to prevent loss.

1. Concepts of TP, TN, FP, and FN

TP (true positive): a sample that is actually positive and is predicted positive. FN (false negative): actually positive but predicted negative. FP (false positive): actually negative but predicted positive. TN (true negative): actually negative and predicted negative. (The original post shows these in a confusion-matrix picture, which was lost.)

2. Concepts of TPR, TNR, FPR, and FNR
1. TPR = tp / (tp + fn)
TPR: the true positive rate, also known as sensitivity, recall, or power: the number of samples that are actually positive and predicted positive ÷ the total number of samples that are actually positive.
In addition, precision = tp / (tp + fp), and accuracy = (tp + tn) / (tp + fp + fn + tn).
2. FNR = fn / (tp + fn) = 1 - TPR
FNR: the false negative rate: the number of samples that are actually positive but predicted negative ÷ the total number of samples that are actually positive.
It equals the probability of committing a Type II error in hypothesis testing (β).
3. FPR = fp / (fp + tn)
FPR: the false positive rate: the number of samples that are actually negative but predicted positive ÷ the total number of samples that are actually negative.
It equals the probability of committing a Type I error in hypothesis testing (α).
4. TNR = tn / (fp + tn) = 1 - FPR
TNR: the true negative rate, also known as specificity: the number of samples that are actually negative and predicted negative ÷ the total number of samples that are actually negative.
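The formulas above can be checked with a small NumPy sketch. The labels here are hypothetical example data, not taken from the original post:

```python
import numpy as np

# Hypothetical example data: 1 = positive class, 0 = negative class
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1])

tp = np.sum((y_true == 1) & (y_pred == 1))  # actually positive, predicted positive
fn = np.sum((y_true == 1) & (y_pred == 0))  # actually positive, predicted negative
fp = np.sum((y_true == 0) & (y_pred == 1))  # actually negative, predicted positive
tn = np.sum((y_true == 0) & (y_pred == 0))  # actually negative, predicted negative

tpr = tp / (tp + fn)        # recall / sensitivity
fnr = fn / (tp + fn)        # = 1 - TPR, Type II error rate (beta)
fpr = fp / (fp + tn)        # Type I error rate (alpha)
tnr = tn / (fp + tn)        # specificity, = 1 - FPR
precision = tp / (tp + fp)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(tpr, fnr, fpr, tnr, precision, accuracy)
```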

3. A simple analysis of how roc_curve works

3.1. A brief introduction to roc_curve

3.1.1 Important parameters

y_true: the ground-truth labels, as an array.
y_score: the predicted results, which may be label values or probability/score values, as an array with the same shape as y_true.
pos_label: defaults to None; the default only works when the labels are {0, 1} or {-1, 1} binary data. Otherwise, the value that represents the positive class must be set explicitly.
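For labels outside {0, 1} and {-1, 1}, pos_label tells roc_curve which class counts as positive. A minimal sketch with hypothetical string labels ('cat' as the positive class):

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical labels that are neither {0, 1} nor {-1, 1}:
y_true = np.array(['dog', 'dog', 'cat', 'cat'])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# pos_label is required here; it declares 'cat' to be the positive class.
fpr, tpr, thresholds = roc_curve(y_true, y_score, pos_label='cat')
print(fpr, tpr, thresholds)  # first threshold value depends on the sklearn version
```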

3.1.2 Returned results

Three arrays are returned: fpr (false positive rates), tpr (true positive rates, i.e. recall), and threshold (the decision thresholds).

3.2. Case 1: y_score contains label data

3.2.1. Example

Code:

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 0, 1, 1, 0, 0, 0, 1, 0])
y_score = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 0])
fpr, tpr, threshold = roc_curve(y_true, y_score)
```

Returned result:

threshold:array([2, 1, 0])
tpr:array([0.        , 0.66666667, 1.        ])
fpr:array([0., 0., 1.])

3.2.2. Explanation:

1. threshold contains the distinct values of y_score sorted in descending order, with an extra 'maximum + 1' value prepended; each element is used in turn as a decision threshold, and the data type is a one-dimensional array. For example, y_score=np.array([0, 1, 2, 0, 3, 1]) gives threshold=np.array([4, 3, 2, 1, 0]). (Note: recent versions of scikit-learn prepend np.inf instead of 'maximum + 1'; the shape of the curve is unchanged.)
2. When index=0, the threshold is threshold[0]=2. Every sample whose y_score is greater than or equal to 2 is predicted positive and the rest negative; comparing these predictions with y_true forms a confusion matrix. Since no score is >= 2, TP and FP are both 0, so tpr[0]=0/3=0.0 and fpr[0]=0/7=0.0.
3. When index=1, the threshold is threshold[1]=1. Every sample whose y_score is >= 1 is predicted positive. Two scores are >= 1, and both sit at positions where y_true is 1, so TP=2 and FP=0, giving tpr[1]=2/3≈0.66666667 and fpr[1]=0/7=0.0.
4. When index=2, the threshold is threshold[2]=0. Every sample whose y_score is >= 0 is predicted positive. All 10 scores are >= 0, so TP=3 and FP=7, giving tpr[2]=3/3=1.0 and fpr[2]=7/7=1.0.
So the final result is tpr=array([0., 0.66666667, 1.]) and fpr=array([0., 0., 1.]).
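The walk-through above can be reproduced with a small loop. This is only a sketch of the mechanism, not scikit-learn's actual implementation (which also drops suboptimal thresholds and, in recent versions, uses np.inf as the first threshold):

```python
import numpy as np

y_true = np.array([0, 0, 0, 1, 1, 0, 0, 0, 1, 0])
y_score = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 0])

# Thresholds: distinct scores in descending order, preceded by max + 1.
thresholds = np.r_[y_score.max() + 1, np.unique(y_score)[::-1]]

P = np.sum(y_true == 1)  # total actual positives (3)
N = np.sum(y_true == 0)  # total actual negatives (7)

tpr, fpr = [], []
for t in thresholds:
    y_pred = (y_score >= t).astype(int)  # everything >= t is called positive
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tpr.append(tp / P)
    fpr.append(fp / N)

print(thresholds)  # [2 1 0]
print(tpr)         # [0.0, 0.666..., 1.0]
print(fpr)         # [0.0, 0.0, 1.0]
```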

3.3. Case 2: y_score contains probability values

3.3.1. Example

Code:

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, threshold = roc_curve(y_true, y_score)
```

Returned result:

threshold:array([1.8 , 0.8 , 0.4 , 0.35, 0.1])
tpr:array([0. , 0.5, 0.5, 1. , 1.])
fpr:array([0. , 0. , 0.5, 0.5, 1. ])

3.3.2. Explanation:

1. When index=0, the threshold is threshold[0]=1.8. Every sample whose y_score is greater than or equal to 1.8 is predicted positive and the rest negative; comparing these predictions with y_true forms a confusion matrix. Since no score is >= 1.8, TP and FP are both 0, so tpr[0]=0/2=0.0 and fpr[0]=0/2=0.0.
2. When index=1, the threshold is threshold[1]=0.8. Exactly one score is >= 0.8, and y_true at that position is 1, so TP=1 and FP=0, giving tpr[1]=1/2=0.5 and fpr[1]=0/2=0.0.
3. When index=2, the threshold is threshold[2]=0.4. Two scores are >= 0.4, one belonging to a positive sample and one to a negative sample, so TP=1 and FP=1, giving tpr[2]=1/2=0.5 and fpr[2]=1/2=0.5.
4. When index=3, the threshold is threshold[3]=0.35. Three scores are >= 0.35, covering both positive samples and one negative sample, so TP=2 and FP=1, giving tpr[3]=2/2=1.0 and fpr[3]=1/2=0.5.
5. When index=4, the threshold is threshold[4]=0.1. All four scores are >= 0.1, so TP=2 and FP=2, giving tpr[4]=2/2=1.0 and fpr[4]=2/2=1.0.
So the final result is tpr=array([0., 0.5, 0.5, 1., 1.]) and fpr=array([0., 0., 0.5, 0.5, 1.]).
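Once fpr and tpr are available, the area under the ROC curve (AUC) follows directly. sklearn.metrics.auc applies the trapezoidal rule to the (fpr, tpr) points:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, threshold = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)  # trapezoidal area under the (fpr, tpr) points
print(roc_auc)  # 0.75
```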


Origin: blog.csdn.net/weixin_39379635/article/details/126279790