Notes on handling class imbalance (imbalanced sample data), plus the ROC curve and AUC


Splitting the training set and the test set

The stratify parameter of the train_test_split function
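A minimal sketch of what `stratify` does, using a hypothetical 90/10 label distribution (assumes scikit-learn is installed): passing `stratify=y` makes `train_test_split` preserve the class ratio in both splits, which matters when the minority class could otherwise end up under-represented in the test set.

```python
# Stratified split on an imbalanced dataset (hypothetical data).
from collections import Counter
from sklearn.model_selection import train_test_split

X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10  # imbalanced labels: 90% class 0, 10% class 1

# stratify=y preserves the 9:1 class ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(Counter(y_train))  # class 1 keeps its 10% share: 8 of 80
print(Counter(y_test))   # and 2 of 20
```

Without `stratify`, a purely random split could put fewer (or even zero) minority-class samples in the test set.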

ROC curve

Why propose the ROC curve

In different application tasks, we can choose different classification thresholds according to the task requirements.

For example, if we care more about precision, we can set a larger threshold so that the classifier only predicts positive when it is more confident; if we care more about recall, we can set a smaller threshold so that the classifier predicts more positive examples.

Therefore, how well the threshold is chosen reflects the learner's generalization performance under different tasks. To describe this trade-off vividly, the ROC curve is introduced: it is a powerful tool for studying the generalization performance of a learner from the perspective of threshold selection.
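The precision/recall trade-off above can be sketched numerically. This is a minimal example with hypothetical labels and scores, computing both metrics from the confusion-matrix counts at two different thresholds:

```python
# How moving the decision threshold trades precision against recall
# (hypothetical labels and classifier scores).
y_true = [1, 1, 1, 0, 0, 0]              # true labels
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]  # predicted probabilities

def precision_recall(threshold):
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, t in zip(pred, y_true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, y_true) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# High threshold: only confident predictions are positive -> high precision.
# Low threshold: more samples flagged positive -> high recall.
print(precision_recall(0.8))  # (1.0, 0.333...)
print(precision_recall(0.2))  # (0.6, 1.0)
```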

What is the ROC curve

The vertical axis of the ROC curve is the "True Positive Rate" (TPR), and the horizontal axis is the "False Positive Rate" (FPR).
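The two axes can be computed directly from the confusion-matrix counts: TPR = TP / (TP + FN) and FPR = FP / (FP + TN). A minimal sketch (hypothetical data); sweeping the threshold from high to low yields the sequence of (FPR, TPR) points that traces out the ROC curve:

```python
# TPR and FPR at a single threshold, from the four confusion-matrix counts.
def tpr_fpr(y_true, scores, threshold):
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, t in zip(pred, y_true) if p == 1 and t == 1)
    fn = sum(1 for p, t in zip(pred, y_true) if p == 0 and t == 1)
    fp = sum(1 for p, t in zip(pred, y_true) if p == 1 and t == 0)
    tn = sum(1 for p, t in zip(pred, y_true) if p == 0 and t == 0)
    return tp / (tp + fn), fp / (fp + tn)  # (TPR, FPR)

# At threshold 0.5, half the positives and half the negatives are flagged,
# giving the diagonal point TPR = FPR = 0.5.
print(tpr_fpr([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2], 0.5))  # (0.5, 0.5)
```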

Details of the ROC curve


Significance


Why propose AUC

When two ROC curves cross, it is hard to judge which learner is better from the curves alone.
AUC (Area Under the ROC Curve) is used for this comparison: the larger the area, the better the learner.
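A minimal sketch of AUC computed directly from its probabilistic meaning: AUC equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one (ties count as 0.5). The labels and scores here are hypothetical.

```python
# AUC as the fraction of (positive, negative) pairs ranked correctly.
def auc(y_true, scores):
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([1, 1, 0, 0], [0.9, 0.7, 0.6, 0.2]))  # 1.0: perfect ranking
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75: one pair misranked
```

This pairwise formula gives a single number even when two ROC curves cross, which is exactly why AUC is used for comparison.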

Origin: blog.csdn.net/weixin_45942265/article/details/119297725