Summary
Splitting the training set and the test set
The `stratify` parameter of the function `train_test_split` keeps the class proportions the same in both splits.
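A minimal sketch of stratified splitting with scikit-learn, using a toy imbalanced label set (the 80/20 class ratio is an assumption for illustration):

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy imbalanced dataset: 80 negatives, 20 positives
X = [[i] for i in range(100)]
y = [0] * 80 + [1] * 20

# stratify=y preserves the 80/20 class ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print(Counter(y_train))  # 60 negatives, 15 positives
print(Counter(y_test))   # 20 negatives, 5 positives
```

Without `stratify`, a random split of an imbalanced dataset can leave the test set with too few (or even zero) minority-class examples.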
ROC curve
Why the ROC curve was proposed
In different application tasks, we can use different thresholds according to the task requirements.
For example, if we care more about precision, we can set a larger threshold so that the classifier only predicts positive when it is more confident; if we care more about recall, we can set a smaller threshold so that the classifier predicts more positive examples.
Therefore, how the threshold is set reflects the generalization performance of the learner under different tasks. To capture this trade-off, the ROC curve is introduced. The ROC curve is a powerful tool for studying the generalization performance of a learner from the perspective of threshold selection.
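The precision/recall trade-off above can be sketched with a toy example (the probabilities and labels below are assumptions for illustration):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical predicted probabilities and true labels
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

# A low threshold predicts more positives (higher recall);
# a high threshold predicts fewer, more confident positives (higher precision)
for threshold in (0.3, 0.7):
    y_pred = (y_prob >= threshold).astype(int)
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    print(f"threshold={threshold}  precision={p:.3f}  recall={r:.3f}")
```

With these numbers, raising the threshold from 0.3 to 0.7 trades recall for precision, which is exactly the choice described above.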
What is the ROC curve
The vertical axis of the ROC curve is the True Positive Rate, TPR = TP / (TP + FN), and the horizontal axis is the False Positive Rate, FPR = FP / (FP + TN). Each threshold yields one (FPR, TPR) point; sweeping the threshold traces out the curve.
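A minimal sketch of computing the (FPR, TPR) points with scikit-learn's `roc_curve` (the scores and labels are assumptions for illustration):

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical true labels and classifier scores
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9])

# Each threshold produces one (FPR, TPR) point on the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```

The curve starts at (0, 0) with the strictest threshold and ends at (1, 1) with the loosest one; both FPR and TPR are non-decreasing as the threshold falls.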
ROC specific details
Significance
Why AUC was proposed
When two ROC curves cross, it is difficult to judge from the curves alone which learner is better.
In that case, the Area Under the ROC Curve (AUC) summarizes each curve as a single scalar, so the learners can be compared directly.
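A minimal sketch of comparing two learners by AUC with `roc_auc_score` (the two score vectors are hypothetical classifier outputs, assumed for illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical true labels
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])

# Scores from two hypothetical classifiers on the same test set
scores_a = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9])
scores_b = np.array([0.3, 0.2, 0.6, 0.5, 0.4, 0.4, 0.1, 0.7])

# AUC reduces each ROC curve to one number in [0, 1];
# it equals the probability that a random positive is ranked
# above a random negative
print("AUC A:", roc_auc_score(y_true, scores_a))
print("AUC B:", roc_auc_score(y_true, scores_b))
```

Because AUC is a single scalar, it gives a direct comparison even when the two ROC curves cross and neither dominates the other.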