Drawing ROC curves (Python)

First, take the support vector machine (SVM) model as an example

Start by importing the required packages; we will use the roc_curve function to draw the ROC curve:

from sklearn.svm import SVC
from sklearn.metrics import roc_curve
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
%matplotlib inline

Then use the make_blobs function to generate an imbalanced two-class dataset;

use the train_test_split function to split the data into training and test sets;

and train the SVC model.

X, y = make_blobs(n_samples=(4000, 500), cluster_std=[7, 2], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(gamma=0.05).fit(X_train, y_train)

fpr, tpr, thresholds = roc_curve(y_test, clf.decision_function(X_test))

plt.plot(fpr, tpr, label='ROC')
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.legend()

As the code shows, the roc_curve function returns three arrays: fpr, tpr, and thresholds, that is, the false positive rate (FPR), the true positive rate (TPR), and the decision thresholds.

Here, fpr and tpr are the horizontal and vertical coordinates of the ROC curve, so we plot fpr on the x-axis against tpr on the y-axis to get the ROC image below:
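To see exactly how roc_curve sweeps the thresholds to produce each (fpr, tpr) pair, here is a minimal, hand-made example (the labels and scores below are made up for illustration, not the SVC output above):

```python
from sklearn.metrics import roc_curve

# Tiny toy problem: 2 negatives, 2 positives, with continuous scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, thresholds = roc_curve(y_true, scores)

# Each threshold yields one (fpr, tpr) point on the curve
for t, f, r in zip(thresholds, fpr, tpr):
    print(f"threshold={t}: FPR={f}, TPR={r}")
```

The curve always starts at (0, 0), where the threshold is so high that nothing is predicted positive, and ends at (1, 1), where everything is.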

It is worth noting that the decision_function method used above is specific to models like the support vector machine; other models do not necessarily provide it.

For example, if we try to draw an ROC curve for another model (such as a decision tree) by reusing the code above directly, Python will raise an error saying the model has no such attribute.
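One way to sidestep this error is a small helper (hypothetical, not from the original post) that uses decision_function when the model has one and otherwise falls back to the positive-class column of predict_proba:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def positive_scores(model, X):
    """Return continuous scores usable by roc_curve for any binary classifier."""
    if hasattr(model, "decision_function"):
        return model.decision_function(X)
    return model.predict_proba(X)[:, 1]

# A smaller version of the post's imbalanced dataset, just to demonstrate
X, y = make_blobs(n_samples=(400, 50), cluster_std=[7, 2], random_state=0)

svm = SVC(gamma=0.05).fit(X, y)           # has decision_function
tree = DecisionTreeClassifier(random_state=0).fit(X, y)  # has predict_proba only

print(positive_scores(svm, X).shape, positive_scores(tree, X).shape)
```

Either way, roc_curve just needs one continuous score per sample.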

Second, take the decision tree model as an example to solve the problem above (this applies to models other than support vector machines)

The code for importing the decision tree package and training the model is omitted here; you only need to swap in your own model. Let's look directly at the plotting code:
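For completeness, the omitted training step might look like this (a sketch; the original post does not specify the tree's parameters, so max_depth=5 is an assumption):

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same imbalanced dataset and split as in the SVM example above
X, y = make_blobs(n_samples=(4000, 500), cluster_std=[7, 2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth=5 is an assumed hyperparameter, not from the original post
clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)
print(clf.score(X_test, y_test))
```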

fpr, tpr, thresholds = roc_curve(y_test, clf.predict_proba(X_test)[:, 1])

plt.plot(fpr, tpr, label='ROC')
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.legend()

As you can see, we simply replaced decision_function, which only applies to models such as the support vector machine, with predict_proba(X_test)[:, 1]. Let's look at the result:

Clearly, the decision tree model does not generalize as well on this dataset as the support vector machine does!
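To back up this visual comparison with a number, roc_auc_score summarizes each curve as a single area-under-curve value (a sketch reusing the same data split; the tree's max_depth=5 is an assumed hyperparameter):

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

X, y = make_blobs(n_samples=(4000, 500), cluster_std=[7, 2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = SVC(gamma=0.05).fit(X_train, y_train)
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)

# AUC from decision_function for the SVM, from predict_proba for the tree
svm_auc = roc_auc_score(y_test, svm.decision_function(X_test))
tree_auc = roc_auc_score(y_test, tree.predict_proba(X_test)[:, 1])
print(f"SVM AUC:  {svm_auc:.3f}")
print(f"Tree AUC: {tree_auc:.3f}")
```

An AUC of 1.0 is a perfect ranking and 0.5 is no better than chance, so the two numbers make the eyeball comparison precise.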

Finally, a better-looking plot

from sklearn.metrics import roc_auc_score

auc = roc_auc_score(y_test, clf.decision_function(X_test))
fpr, tpr, thresholds = roc_curve(y_test, clf.decision_function(X_test))
# For models without decision_function (e.g. a decision tree), use instead:
# auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
# fpr, tpr, thresholds = roc_curve(y_test, clf.predict_proba(X_test)[:, 1])
plt.plot(fpr, tpr, color='darkorange', label='ROC curve (area = %0.2f)' % auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.savefig('suhan.jpg', dpi=800)
plt.show()

Source: blog.csdn.net/weixin_46803857/article/details/121793432