Cross-validation
In cross-validation, the training data is further divided into training and validation sets. For example, in the diagram below the training data is split into 4 parts, and one part is held out as the validation set. Training is then repeated 4 times, each time using a different part as the validation set, which yields 4 models and 4 results. The average of these results is taken as the final score. This procedure is known as 4-fold cross-validation.
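The splitting scheme above can be sketched in plain Python. This is a minimal illustration of how k-fold index splits work; `k_fold_indices` is a hypothetical helper written for this example, not a scikit-learn function (scikit-learn provides `sklearn.model_selection.KFold` for this purpose):

```python
def k_fold_indices(n_samples, k):
    """Split sample indices into k roughly equal folds.

    Each fold serves once as the validation set while the
    remaining k-1 folds together form the training set.
    """
    indices = list(range(n_samples))
    fold_size = n_samples // k
    splits = []
    for i in range(k):
        start = i * fold_size
        # the last fold absorbs any remainder
        end = start + fold_size if i < k - 1 else n_samples
        val = indices[start:end]
        train = indices[:start] + indices[end:]
        splits.append((train, val))
    return splits

# 8 samples, 4 folds: each group of 2 samples is the validation set exactly once
for train, val in k_fold_indices(8, 4):
    print('train:', train, 'val:', val)
```

A model would be fit on each `train` list and scored on the matching `val` list, and the 4 scores averaged.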
Grid search (hyperparameter search):
Usually, a model has parameters that must be specified manually (such as the K value in the K-nearest neighbors algorithm); these are called hyperparameters. Because manual tuning is tedious, we instead preset several candidate combinations of hyperparameter values. Each combination is evaluated with cross-validation, and finally the combination with the best score is selected to build the model.
Purpose: hyperparameter tuning.
API: sklearn.model_selection.GridSearchCV
As an example, we modify the K-nearest neighbors code from the previous article as follows:
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

def knn():
    """
    Iris classification
    :return: None
    """
    # Load the data set
    lr = load_iris()

    # Standardize the features
    std = StandardScaler()
    x = std.fit_transform(lr.data)

    # Split into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(x, lr.target, test_size=0.25)

    # Estimator
    knn = KNeighborsClassifier()

    # Candidate hyperparameter values to search
    param = {'n_neighbors': [3, 5, 10]}

    # Grid search with 10-fold cross-validation
    gc = GridSearchCV(knn, param_grid=param, cv=10)
    gc.fit(x_train, y_train)

    # Evaluate the results
    print('Accuracy on the test set:', gc.score(x_test, y_test))
    print('Best cross-validation score:', gc.best_score_)
    print('Best estimator selected:', gc.best_estimator_)
    print('Cross-validation results for each hyperparameter setting:', gc.cv_results_)

    return None

if __name__ == "__main__":
    knn()
From the output we can see that K = 10 gives the best result: the 10-fold cross-validation accuracy is 95.5%, slightly higher than the 94% obtained with K-nearest neighbors alone.