Credibility Evaluation of Classification Models

1. Predicted probabilities of a classification model (predict_proba)

# ############ Predicted probabilities of a classification model ############
# Import the dataset generation tool
from sklearn.datasets import make_blobs
# Import numpy
import numpy as np
# Import the plotting tool
import matplotlib.pyplot as plt
# Generate 200 samples in two classes with a standard deviation of 5
X, y = make_blobs(n_samples=200, random_state=1, centers=2, cluster_std=5)
# Draw a scatter plot
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.cool, edgecolor='k')
# Display the image
plt.show()

# Import the dataset splitting tool
from sklearn.model_selection import train_test_split
# Import the Gaussian naive Bayes model
from sklearn.naive_bayes import GaussianNB
# Split the dataset into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=68)
# Train the Gaussian naive Bayes model
gnb = GaussianNB()
gnb.fit(X_train, y_train)
# Get the predicted class probabilities for the test set
predict_proba = gnb.predict_proba(X_test)
# Print the shape of the result
print('Shape of the predicted probabilities: {}'.format(predict_proba.shape))
Shape of the predicted probabilities: (50, 2)
# Print the first 5 rows
print(predict_proba[:5])
[[0.98849996 0.01150004]
 [0.0495985  0.9504015 ]
 [0.01648034 0.98351966]
 [0.8168274  0.1831726 ]
 [0.00282471 0.99717529]]
# Set the ranges of the horizontal and vertical axes
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5

xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.2), np.arange(y_min, y_max, 0.2))
# Probability of class 1 at every grid point
Z = gnb.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
Z = Z.reshape(xx.shape)
# Draw the filled contours
plt.contourf(xx, yy, Z, cmap=plt.cm.summer, alpha=.8)
# Draw the scatter plot: training set opaque, test set semi-transparent
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.cool, edgecolor='k')
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=plt.cm.cool, edgecolor='k', alpha=0.6)
# Set the axis ranges
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
# Hide the axis ticks
plt.xticks(())
plt.yticks(())
# Display the image
plt.show()

  • In the figure, the dots are samples from the dataset (the semi-transparent ones come from the test set). The cyan region represents the first class and the red region the second. Between the two regions lies a band where the color changes gradually; the model considers the data points in this band "ambiguous".
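The probabilities behind that picture can also be inspected numerically. A minimal sketch, rebuilding the same data and split as above: it checks that each row of predict_proba sums to 1, that the most probable column matches gnb.predict, and flags test points whose top probability falls below a cutoff. The cutoff THRESHOLD = 0.7 is an arbitrary assumption for illustration, not a value from the book.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data and split as in the example above
X, y = make_blobs(n_samples=200, random_state=1, centers=2, cluster_std=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=68)
gnb = GaussianNB().fit(X_train, y_train)
proba = gnb.predict_proba(X_test)

# Each row is a probability distribution over gnb.classes_
print('rows sum to 1:', np.allclose(proba.sum(axis=1), 1.0))
# The column with the largest probability gives the predicted label
most_likely = gnb.classes_[proba.argmax(axis=1)]
print('argmax matches predict:', np.array_equal(most_likely, gnb.predict(X_test)))

# Hypothetical cutoff: below it, the model is treated as unsure
THRESHOLD = 0.7
confidence = proba.max(axis=1)
ambiguous = np.flatnonzero(confidence < THRESHOLD)
print('ambiguous test samples:', ambiguous)
print('their top probability:', np.round(confidence[ambiguous], 3))
```

With two classes the top probability is always at least 0.5, so the cutoff only has to separate "fairly sure" from "close to a coin flip".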

2. The decision function of a classification model (decision_function)

# Import the SVC model
from sklearn.svm import SVC
# Train the model on the training set
svc = SVC(gamma='auto').fit(X_train, y_train)
# Get the SVC decision function values for the test set
dec_func = svc.decision_function(X_test)
# Print the first five decision values
print(dec_func[:5])
[ 0.02082432  0.87852242  1.01696254 -0.30356558  0.95924836]
# Decision function value at every grid point
Z = svc.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
# Draw the filled contours
plt.contourf(xx, yy, Z, cmap=plt.cm.summer, alpha=.8)
# Draw the scatter plot: training set opaque, test set semi-transparent
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.cool, edgecolor='k')
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=plt.cm.cool, edgecolor='k', alpha=0.6)
# Set the axis ranges
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
# Set the figure title
plt.title('SVC decision_function')
# Hide the axis ticks
plt.xticks(())
plt.yticks(())
# Display the image
plt.show()

  • The classes are shown by the same cyan and red regions as before. The more clearly a data point sits inside the cyan region, the more confidently the model assigns it to class 1; the more clearly it sits inside the red region, the more confidently the model assigns it to class 2. Points in the gradient band are again ones the model considers "ambiguous".
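Both observations can be checked numerically. For a binary SVC, a positive decision_function value corresponds to svc.classes_[1] and a negative one to svc.classes_[0]; and in the standard SVM formulation, points with |f(x)| < 1 lie inside the margin, which roughly matches the gradient band in the plot. A sketch rebuilding the same data and split as above:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same data and split as in the examples above
X, y = make_blobs(n_samples=200, random_state=1, centers=2, cluster_std=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=68)
svc = SVC(gamma='auto').fit(X_train, y_train)
dec = svc.decision_function(X_test)

# Sign of the score -> class: positive means classes_[1], otherwise classes_[0]
from_sign = svc.classes_[(dec > 0).astype(int)]
print('sign matches predict:', np.array_equal(from_sign, svc.predict(X_test)))

# |f(x)| < 1: the sample lies inside the margin, close to the boundary
inside_margin = np.flatnonzero(np.abs(dec) < 1)
print('test samples inside the margin:', inside_margin)
print('their decision values:', np.round(dec[inside_margin], 3))
```

Unlike predict_proba, these scores are unbounded and do not sum to anything in particular; only their sign and relative size carry meaning.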
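# In scikit-learn, to change the scoring method when using the GridSearchCV class,
# you only need to change the scoring parameter.
# For example, to score a random forest classifier, modify it directly like this:
# set the scoring parameter to roc_auc
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
# param_grid is assumed to be a parameter grid defined beforehand
grid = GridSearchCV(RandomForestClassifier(), param_grid=param_grid, scoring='roc_auc')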
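A self-contained version of that snippet, assuming a small illustrative param_grid (the n_estimators and max_depth values are arbitrary choices, not from the book) on the same two-class make_blobs data:

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_blobs(n_samples=200, random_state=1, centers=2, cluster_std=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=68)

# Illustrative grid; these values are assumptions, not from the book
param_grid = {'n_estimators': [10, 50], 'max_depth': [2, None]}
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid=param_grid, scoring='roc_auc', cv=5)
grid.fit(X_train, y_train)

print('best parameters:', grid.best_params_)
print('best cross-validated ROC AUC: {:.3f}'.format(grid.best_score_))
# With scoring='roc_auc', grid.score also reports ROC AUC on the test set
print('test ROC AUC: {:.3f}'.format(grid.score(X_test, y_test)))
```

Because the best model is refit on the whole training set (refit=True by default), grid.score evaluates it with the same scorer that was used during the search.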

 

Summary:

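  The decision_function of SVC (support vector machine) and the predict_proba of GaussianNB (naive Bayes) have things in common, but also differ considerably. Both can be used for multiclass classification tasks.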
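To illustrate the multiclass point, a sketch on an assumed three-class make_blobs dataset (not from the book): predict_proba returns one probability per class and each row sums to 1, while SVC's decision_function, with its default decision_function_shape='ovr', returns one unbounded score per class.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Assumed three-class dataset, for illustration only
X, y = make_blobs(n_samples=300, random_state=1, centers=3, cluster_std=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

proba = GaussianNB().fit(X_train, y_train).predict_proba(X_test)
scores = SVC(gamma='auto').fit(X_train, y_train).decision_function(X_test)

# One column per class in both cases
print(proba.shape, scores.shape)
# Probabilities sum to 1 per row; decision scores do not have to
print(np.allclose(proba.sum(axis=1), 1.0))
print(scores[:3])
```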

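  We have covered cross-validation, grid search, and credibility evaluation of classification models; all of these methods help us evaluate a model and find relatively better parameters.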
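As a reminder of how cross-validation looks in code, a minimal sketch with cross_val_score on the two-class data from section 1; the choice of 5 folds is an assumption for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_blobs(n_samples=200, random_state=1, centers=2, cluster_std=5)
# 5-fold cross-validation; for a classifier the default score is accuracy
scores = cross_val_score(GaussianNB(), X, y, cv=5)
print('fold scores:', scores)
print('mean accuracy: {:.3f}'.format(scores.mean()))
```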

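  There is also the .score method for scoring a model: for classification models, the score returned by .score by default is the classification accuracy;

                for regression models, the score returned by .score by default is the R² score of the regression.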
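These defaults can be checked directly against sklearn.metrics. The regression half below uses make_regression and LinearRegression, which are not part of the original article and serve only as an illustration:

```python
from sklearn.datasets import make_blobs, make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Classification: .score is the accuracy
X, y = make_blobs(n_samples=200, random_state=1, centers=2, cluster_std=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=68)
gnb = GaussianNB().fit(X_train, y_train)
clf_score = gnb.score(X_test, y_test)
clf_acc = accuracy_score(y_test, gnb.predict(X_test))

# Regression: .score is R^2
Xr, yr = make_regression(n_samples=100, n_features=2, noise=10, random_state=0)
reg = LinearRegression().fit(Xr, yr)
reg_score = reg.score(Xr, yr)
reg_r2 = r2_score(yr, reg.predict(Xr))

print('classifier .score vs accuracy_score:', clf_score, clf_acc)
print('regressor .score vs r2_score:', reg_score, reg_r2)
```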

  Other methods for scoring a model: Precision, Recall, F1 score (f1-score), ROC (Receiver Operating Characteristic curve), and AUC (Area Under Curve).
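A sketch computing these metrics with sklearn.metrics for the GaussianNB model from section 1. Precision, recall, and F1 take hard labels, while roc_curve and roc_auc_score take a continuous score, here the probability of the positive class (label 1); a decision_function output would also work.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import (f1_score, precision_score, recall_score,
                             roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_blobs(n_samples=200, random_state=1, centers=2, cluster_std=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=68)
gnb = GaussianNB().fit(X_train, y_train)

y_pred = gnb.predict(X_test)
# Probability of the positive class (label 1) for ROC/AUC
pos_proba = gnb.predict_proba(X_test)[:, 1]

print('precision: {:.3f}'.format(precision_score(y_test, y_pred)))
print('recall:    {:.3f}'.format(recall_score(y_test, y_pred)))
print('f1-score:  {:.3f}'.format(f1_score(y_test, y_pred)))
# ROC curve: false positive rate vs true positive rate over all thresholds
fpr, tpr, thresholds = roc_curve(y_test, pos_proba)
auc = roc_auc_score(y_test, pos_proba)
print('AUC: {:.3f}'.format(auc))
```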

 

Article adapted from: 《深入浅出python机器学习》


Original post: www.cnblogs.com/weijiazheng/p/10966275.html