[Machine Learning Notes] Example: Plotting a ROC Curve in Python

An example of drawing a ROC curve in Python

ROC stands for "Receiver Operating Characteristic", so a ROC curve is a receiver operating characteristic curve.

The samples are sorted by the learner's prediction scores. Going through them in that order, each sample in turn is taken as the cut-off for predicting "positive", and two quantities, the true positive rate (TPR) and the false positive rate (FPR), are computed at every step. These are plotted as the vertical and horizontal coordinates respectively. Unlike the PR curve, whose vertical and horizontal axes are precision and recall, the vertical axis of the ROC curve is the "true positive rate" (True Positive Rate, TPR) and the horizontal axis is the "false positive rate" (False Positive Rate, FPR).
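
The sorting procedure described above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the code used for the figure: the function name `roc_points` and the toy labels and scores are hypothetical, and the sketch assumes all scores are distinct (tied scores would have to be grouped into a single step).

```python
import numpy as np

def roc_points(y_true, y_score):
    """Return (fpr, tpr) obtained by lowering the cut-off one sample at a time.

    Assumes binary labels (1 = positive, 0 = negative) and distinct scores.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)

    # Sort samples from the highest score to the lowest.
    order = np.argsort(-y_score, kind="stable")
    y_sorted = y_true[order]

    # After the k-th sample, the top k samples are predicted positive.
    tp = np.cumsum(y_sorted == 1)  # true positives so far
    fp = np.cumsum(y_sorted == 0)  # false positives so far

    # Prepend the point (0, 0): nothing is predicted positive yet.
    tpr = np.concatenate(([0.0], tp / tp[-1]))
    fpr = np.concatenate(([0.0], fp / fp[-1]))
    return fpr, tpr

# Hypothetical toy data: three positives and three negatives.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

fpr, tpr = roc_points(y_true, y_score)
for x, y in zip(fpr, tpr):
    print(f"FPR={x:.3f}  TPR={y:.3f}")
```

Each printed pair is one point of the curve; joining them from (0, 0) to (1, 1) gives the ROC curve for the toy data.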

The example ROC curve figure can be produced with the following Python code:

# -*- coding: utf-8 -*-
"""
Created on Tue Mar 24 19:04:21 2020

@author: Bean029
"""

import numpy as np
import matplotlib.pyplot as plt

from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier

# Import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Binarize the output
y = label_binarize(y, classes=[0, 1, 2])
n_classes = y.shape[1]

# Add noisy features to make the problem harder
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]

# shuffle and split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,
                                                    random_state=0)

# Learn to predict each class against the other
classifier = OneVsRestClassifier(svm.SVC(kernel='linear', probability=True,
                                 random_state=random_state))
y_score = classifier.fit(X_train, y_train).decision_function(X_test)

# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# Compute micro-average ROC curve and ROC area
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])

# Plot the ROC curve of one class (index 2) against the chance diagonal
plt.figure()
lw = 2
plt.plot(fpr[2], tpr[2], color='darkorange',
         lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[2])
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.show()

Trade-offs between the ROC curve and the PR curve

The ROC curve is comparatively stable: when there are enough positive and negative samples, it adequately reflects the model's discriminative ability. For the same model, both the PR curve (PRC) and the ROC curve can reveal certain problems, and the two are related, so if you want to review a model's performance you can plot both curves and evaluate them together. For a supervised binary classification problem with enough positive and negative samples, the ROC curve, AUC, and KS can be used directly to evaluate the model. When choosing a decision threshold, classification performance can be judged by the model's precision, recall, or F1. For a multi-class problem, precision, recall, and F1 can be computed for each class and used together as a comprehensive evaluation metric.
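
As a rough illustration of the threshold-based metrics mentioned above, the sketch below computes precision, recall, and F1 at one threshold, together with the KS statistic. It assumes the common definition of KS as the largest gap between TPR and FPR over all thresholds. The helper names, the toy data, and the threshold of 0.65 are hypothetical, and the scores are assumed to be distinct.

```python
import numpy as np

# Hypothetical toy labels and scores, for illustration only.
y_true = np.array([1, 1, 0, 1, 0, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.2])

def precision_recall_f1(y_true, y_score, threshold):
    """Precision, recall and F1 when scores >= threshold are predicted positive."""
    y_pred = (y_score >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1

def ks_statistic(y_true, y_score):
    """Largest gap between TPR and FPR over all cut-offs (distinct scores assumed)."""
    order = np.argsort(-y_score)
    y_sorted = y_true[order]
    tpr = np.cumsum(y_sorted == 1) / np.sum(y_true == 1)
    fpr = np.cumsum(y_sorted == 0) / np.sum(y_true == 0)
    return np.max(tpr - fpr)

precision, recall, f1 = precision_recall_f1(y_true, y_score, threshold=0.65)
ks = ks_statistic(y_true, y_score)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f} KS={ks:.3f}")
```

For a multi-class problem, the same `precision_recall_f1` helper could be applied to each column of a binarized label matrix to get per-class values.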

The area under the ROC curve (AUC) lies between 0 and 1, with 0.5 corresponding to random guessing. As a single number it gives an intuitive measure of classifier quality: the larger the value, the better.
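
To make the "area" concrete, the sketch below computes the AUC in two ways on hypothetical toy data: with the trapezoid rule over the ROC points, and as the fraction of (positive, negative) pairs in which the positive sample receives the higher score. The data and variable names are assumptions made for illustration, and the scores are assumed to be distinct.

```python
import numpy as np

# Hypothetical toy labels and scores, for illustration only.
y_true = np.array([1, 1, 0, 1, 0, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.2])

# ROC points, built by sorting the samples from highest to lowest score.
order = np.argsort(-y_score)
y_sorted = y_true[order]
tpr = np.concatenate(([0.0], np.cumsum(y_sorted == 1) / np.sum(y_true == 1)))
fpr = np.concatenate(([0.0], np.cumsum(y_sorted == 0) / np.sum(y_true == 0)))

# 1) Trapezoid rule: sum of width * average height over consecutive points.
auc_trapezoid = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)

# 2) Ranking view: share of positive/negative pairs ordered correctly.
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
auc_pairs = np.mean(pos[:, None] > neg[None, :])

print(f"AUC (trapezoid) = {auc_trapezoid:.4f}")
print(f"AUC (pairs)     = {auc_pairs:.4f}")
```

The two numbers agree, which is why AUC is often read as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one.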

A ROC curve has the true positive rate on the Y axis and the false positive rate on the X axis.

This means that the top-left corner of the plot is the "ideal" point: a false positive rate of 0 and a true positive rate of 1. This is not very realistic, but it does mean that a larger area under the curve (AUC) is usually better.

The "steepness" of a ROC curve is also important, since the goal is to maximize the true positive rate while minimizing the false positive rate.

ROC curves are typically used to study the output of a binary classifier. To extend the ROC curve and the ROC area to multi-label classification, the output has to be binarized. One ROC curve can then be drawn per label, but a ROC curve can also be drawn by treating each element of the label indicator matrix as a binary prediction (micro-averaging).
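
The difference between per-label curves and micro-averaging can be seen on a small label indicator matrix. In the sketch below, the helper `pairwise_auc` and the toy matrices `Y` and `S` are hypothetical. The AUC is computed from pairwise score comparisons (ties counted as one half), which gives the same value as the area under the ROC curve.

```python
import numpy as np

def pairwise_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (len(pos) * len(neg))

# Hypothetical label indicator matrix (4 samples x 3 classes) and scores.
Y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [0, 1, 0]])
S = np.array([[0.80, 0.10, 0.10],
              [0.30, 0.40, 0.30],
              [0.20, 0.45, 0.35],
              [0.10, 0.70, 0.20]])

# One AUC per label: each column is treated as its own binary problem.
per_class = [pairwise_auc(Y[:, k], S[:, k]) for k in range(Y.shape[1])]

# Micro-average: flatten both matrices so every element is one binary prediction.
micro = pairwise_auc(Y.ravel(), S.ravel())

print("per-class AUC:", per_class)
print("micro-average AUC:", micro)
```

The flattening step corresponds to the `y_test.ravel()` and `y_score.ravel()` calls in the example code above.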

 

Source: blog.csdn.net/seagal890/article/details/105081188