1. Predicted probabilities (predict_proba) of a classification model
# Predicted probabilities of a classification model
# import the data set generation tool
from sklearn.datasets import make_blobs
# import numpy
import numpy as np
# import the plotting tool
import matplotlib.pyplot as plt
# generate 200 samples in two classes, with a cluster standard deviation of 5
X, y = make_blobs(n_samples=200, random_state=1, centers=2, cluster_std=5)
# draw the scatter plot
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.cool, edgecolor='k')
# display the image
plt.show()
# import the data set splitting tool
from sklearn.model_selection import train_test_split
# import the Gaussian naive Bayes model
from sklearn.naive_bayes import GaussianNB
# split the data set into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=68)
# train the Gaussian naive Bayes model
gnb = GaussianNB()
gnb.fit(X_train, y_train)
# get the class probabilities predicted by Gaussian naive Bayes for the test set
predict_proba = gnb.predict_proba(X_test)
# print the shape of the result
print('Shape of predicted probabilities: {}'.format(predict_proba.shape))

Shape of predicted probabilities: (50, 2)
# print the predicted probabilities of the first 5 test samples
print(predict_proba[:5])

[[0.98849996 0.01150004]
 [0.0495985  0.9504015 ]
 [0.01648034 0.98351966]
 [0.8168274  0.1831726 ]
 [0.00282471 0.99717529]]
# set the ranges of the horizontal and vertical axes
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.2), np.arange(y_min, y_max, 0.2))
# probability of the second class at every grid point
Z = gnb.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
Z = Z.reshape(xx.shape)
# draw the filled contours
plt.contourf(xx, yy, Z, cmap=plt.cm.summer, alpha=.8)
# draw the scatter plots (test points are semi-transparent)
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.cool, edgecolor='k')
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=plt.cm.cool, edgecolor='k', alpha=0.6)
# set the ranges of the horizontal and vertical axes
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
# hide the axis ticks
plt.xticks(())
plt.yticks(())
# display the image
plt.show()
- In the figure, the dots are the samples of the data set (training points opaque, test points semi-transparent). The cyan region represents the first class and the red region the second. Between the two regions lies a gradient band: the model considers the data points in this band "ambiguous", because its predicted probabilities for the two classes are close to each other there.
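The link between predict_proba and the final prediction can be checked directly. The following is a minimal sketch that reuses the data set and split from above; the 0.2 margin used to flag "ambiguous" points is an arbitrary value chosen only for illustration, not something from the book.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# same data set and split as in the text
X, y = make_blobs(n_samples=200, random_state=1, centers=2, cluster_std=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=68)
gnb = GaussianNB().fit(X_train, y_train)

proba = gnb.predict_proba(X_test)
# each row holds one probability per class, so every row sums to 1
row_sums = proba.sum(axis=1)
# the predicted label is the class with the highest probability
labels_from_proba = gnb.classes_[np.argmax(proba, axis=1)]
# illustrative assumption: call a point "ambiguous" when the probability
# of the second class is within 0.2 of 0.5
ambiguous = np.abs(proba[:, 1] - 0.5) < 0.2

print(np.allclose(row_sums, 1.0))
print(np.array_equal(labels_from_proba, gnb.predict(X_test)))
print('ambiguous test points:', int(ambiguous.sum()))
```

Points flagged this way should fall inside the gradient band of the contour plot.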
2. Decision function (decision_function) of a classification model
# import the SVC model
from sklearn.svm import SVC
# train the model on the training set
svc = SVC(gamma='auto').fit(X_train, y_train)
# get the decision function values of the SVC for the test set
dec_func = svc.decision_function(X_test)
# print the first five decision function values
print(dec_func[:5])
[ 0.02082432 0.87852242 1.01696254 -0.30356558 0.95924836]
# decision function value at every grid point
Z = svc.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
# draw the filled contours
plt.contourf(xx, yy, Z, cmap=plt.cm.summer, alpha=.8)
# draw the scatter plots (test points are semi-transparent)
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.cool, edgecolor='k')
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=plt.cm.cool, edgecolor='k', alpha=0.6)
# set the ranges of the horizontal and vertical axes
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
# set the figure title
plt.title('SVC decision_function')
# hide the axis ticks
plt.xticks(())
plt.yticks(())
# display the image
plt.show()
- The classes are again shown as cyan and red regions. The more clearly a data point lies in the cyan region, the more confident the model is that the point belongs to class 1; conversely, points clearly in the red region are assigned to class 2. Points in the gradient band are, once more, the ones the model finds "ambiguous".
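For a two-class SVC, the sign of decision_function determines the predicted class, and the magnitude indicates how far the point is from the decision boundary. The sketch below checks this on the same data; it assumes the default SVC settings apart from gamma='auto', and that no value is exactly zero.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# same data set and split as in the text
X, y = make_blobs(n_samples=200, random_state=1, centers=2, cluster_std=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=68)
svc = SVC(gamma='auto').fit(X_train, y_train)

dec = svc.decision_function(X_test)
# positive values map to svc.classes_[1], negative values to svc.classes_[0]
labels_from_sign = svc.classes_[(dec > 0).astype(int)]

print(dec.shape)
print(np.array_equal(labels_from_sign, svc.predict(X_test)))
# the test points closest to the boundary are the "ambiguous" ones
print(np.sort(np.abs(dec))[:5])
```

Unlike predict_proba, these values are not bounded to the interval from 0 to 1 and do not sum to anything meaningful.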
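# In scikit-learn, when using the GridSearchCV class for grid search, changing the scoring method only requires changing the scoring parameter.
# For example, to score a random forest classifier, it can be changed directly like this
# (param_grid is assumed to have been defined beforehand)
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
# set the scoring parameter to roc_auc
grid = GridSearchCV(RandomForestClassifier(), param_grid=param_grid, scoring='roc_auc')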
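A complete run of such a grid search might look like the sketch below. The parameter grid, the fixed random_state, and cv=5 are hypothetical choices made only so that the example is self-contained; they do not come from the book.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# same synthetic two-class data set as in the text
X, y = make_blobs(n_samples=200, random_state=1, centers=2, cluster_std=5)

# hypothetical parameter grid, for illustration only
param_grid = {'n_estimators': [10, 50], 'max_depth': [2, 4]}

# score every parameter combination by ROC AUC instead of the default accuracy
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid=param_grid, scoring='roc_auc', cv=5)
grid.fit(X, y)

print('best parameters:', grid.best_params_)
print('best cross-validated ROC AUC: {:.3f}'.format(grid.best_score_))
```

Here best_score_ is the mean cross-validated value of whichever metric scoring names, so it is an AUC rather than an accuracy.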
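Summary:
The decision_function of SVC (support vector machine) and the predict_proba of GaussianNB (naive Bayes) have similarities but also significant differences. Both can be used in multiclass classification tasks.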
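The multiclass case can be illustrated with a small sketch. It assumes a three-class variant of the make_blobs data set (centers=3, which is not used in the text) and the default decision_function_shape='ovr' of SVC.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# assumption: a three-class variant of the data set used in the text
X, y = make_blobs(n_samples=200, random_state=1, centers=3, cluster_std=5)

gnb = GaussianNB().fit(X, y)
svc = SVC(gamma='auto').fit(X, y)

proba = gnb.predict_proba(X)       # one probability per class
dec = svc.decision_function(X)     # one score per class (one-vs-rest shape)

print(proba.shape, dec.shape)
# probabilities lie in [0, 1] and each row sums to 1
print(np.allclose(proba.sum(axis=1), 1.0))
# decision scores are not probabilities: they are unbounded and need not sum to 1
print(dec[:3])
```

Both methods return one column per class, which is the similarity; only predict_proba returns values that can be read as probabilities, which is the main difference.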
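We have used cross-validation, grid search, and the confidence estimates of classification models. All of these methods help us evaluate a model and find relatively good parameters.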
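There is also the .score method for scoring a model: for classification models, the score returned by .score by default is the classification accuracy of the model.
For regression models, the score returned by .score by default is the R-squared (coefficient of determination) of the regression.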
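Both defaults can be verified against the corresponding functions in sklearn.metrics. In the sketch below, the regression data set (make_regression) and the LinearRegression model are assumptions added for illustration, since the text only trains classifiers.

```python
import numpy as np
from sklearn.datasets import make_blobs, make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# classifier: .score should equal the accuracy
Xc, yc = make_blobs(n_samples=200, random_state=1, centers=2, cluster_std=5)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(Xc, yc, random_state=68)
clf = GaussianNB().fit(Xc_train, yc_train)
clf_score = clf.score(Xc_test, yc_test)
clf_accuracy = accuracy_score(yc_test, clf.predict(Xc_test))

# regressor (illustrative data and model): .score should equal R-squared
Xr, yr = make_regression(n_samples=200, n_features=2, noise=10, random_state=1)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=68)
reg = LinearRegression().fit(Xr_train, yr_train)
reg_score = reg.score(Xr_test, yr_test)
reg_r2 = r2_score(yr_test, reg.predict(Xr_test))

print(np.isclose(clf_score, clf_accuracy))
print(np.isclose(reg_score, reg_r2))
```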
Other ways to score a model: precision, recall, F1 score (f1-score), ROC (Receiver Operating Characteristic curve), AUC (Area Under Curve)
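These metrics are available in sklearn.metrics. The following sketch computes them for the Gaussian naive Bayes model from section 1; treating class 1 as the positive class is the library default for binary labels, not a choice made in the text.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# same data set, split, and model as in section 1
X, y = make_blobs(n_samples=200, random_state=1, centers=2, cluster_std=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=68)
gnb = GaussianNB().fit(X_train, y_train)

y_pred = gnb.predict(X_test)
# ROC AUC is computed from scores, here the probability of class 1
y_score = gnb.predict_proba(X_test)[:, 1]

precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_score)

print('precision: {:.3f}'.format(precision))
print('recall:    {:.3f}'.format(recall))
print('f1-score:  {:.3f}'.format(f1))
print('ROC AUC:   {:.3f}'.format(auc))
```

Precision, recall, and F1 are computed from hard predictions, while ROC AUC needs a continuous score such as predict_proba or decision_function.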
Adapted from: 《深入浅出python机器学习》