Cross-validation and grid search in machine learning

Cross-validation

The training data set is further split into a training portion and a validation portion. In the diagram below, for example, the training data is divided into 4 parts, with one part held out as the validation set. Training and evaluation are then repeated 4 times, each time using a different part as the validation set.

This produces 4 model results, and their average is taken as the final result. This procedure is known as 4-fold cross-validation.
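As a minimal sketch of the procedure above (using scikit-learn and the iris data as a stand-in for the diagram's training set; the `shuffle`/`random_state` settings are just for reproducibility):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
knn = KNeighborsClassifier(n_neighbors=5)

# KFold(n_splits=4) divides the data into 4 parts; each part is used
# once as the validation set while the other 3 are used for training.
scores = cross_val_score(
    knn, iris.data, iris.target,
    cv=KFold(n_splits=4, shuffle=True, random_state=0))

print('per-fold scores:', scores)        # one accuracy per fold
print('averaged result:', scores.mean())  # the final reported score
```

`cross_val_score` returns one score per fold; the mean of those scores is the "average taken as the final result" described above.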

 

Grid search (hyperparameter search):

Usually there are many parameters that must be specified by hand (for example, the value of K in the K-nearest-neighbor algorithm); these are called hyperparameters. Because tuning them manually is tedious, we instead pre-define several candidate combinations of hyperparameter values, evaluate each combination with cross-validation, and finally select the best-performing combination to build the model.

Purpose: hyperparameter tuning.

API: sklearn.model_selection.GridSearchCV

 

The K-nearest-neighbor example from the earlier article, modified as follows:

from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

def knn():
    """
    Iris classification
    :return: None
    """

    # Load and split the dataset
    lr = load_iris()

    # Standardize the features
    std = StandardScaler()
    x = std.fit_transform(lr.data)

    x_train, x_test, y_train, y_test = train_test_split(x, lr.target, test_size=0.25)

    # Estimator
    knn = KNeighborsClassifier()

    # Candidate hyperparameter values to search over
    param = {'n_neighbors': [3, 5, 10]}

    # Grid search with 10-fold cross-validation
    gc = GridSearchCV(knn, param_grid=param, cv=10)

    gc.fit(x_train, y_train)

    # Prediction accuracy
    print('Accuracy on the test set:', gc.score(x_test, y_test))
    print('Best cross-validation score:', gc.best_score_)
    print('Best estimator selected:', gc.best_estimator_)
    print('Cross-validation results for each hyperparameter setting:', gc.cv_results_)

    return None

if __name__ == "__main__":
    knn()

From the output we can see that K = 10 gives the best 10-fold cross-validation result, with an accuracy of 95.5%, slightly higher than the 94% obtained with the plain K-nearest-neighbor model.
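As a sketch of how such a conclusion is read off a fitted `GridSearchCV` (the `random_state` here is an added assumption for reproducibility, so the exact scores will differ from the article's):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0)

gc = GridSearchCV(KNeighborsClassifier(),
                  param_grid={'n_neighbors': [3, 5, 10]}, cv=10)
gc.fit(x_train, y_train)

# cv_results_['mean_test_score'] holds one averaged CV accuracy per
# candidate K; best_params_ names the winning combination.
for params, score in zip(gc.cv_results_['params'],
                         gc.cv_results_['mean_test_score']):
    print(params, round(score, 3))

print('best params:', gc.best_params_)
print('best CV score:', gc.best_score_)
```

`best_score_` is the cross-validation accuracy of the winning candidate, which is the number the article compares against the plain K-nearest-neighbor run.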

Origin www.cnblogs.com/GouQ/p/11871070.html