The road to machine learning: Python grid search with GridSearchCV for model tuning

 

git: https://github.com/linyi0604/MachineLearning

How do you determine which parameters a model should use?

K-fold cross-validation:
Divide the samples into k parts. In each round, hold out one part as test data and train on the remaining k-1 parts, so that training and testing are performed k times in total. Every sample is used for testing exactly once, which makes full use of the sample data when evaluating the model's performance.
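The splitting step described above can be sketched in a few lines of plain Python. This is only an illustration: the helper name `k_fold_indices` is hypothetical, and the samples are taken in order without shuffling. In practice scikit-learn's `KFold` in `sklearn.model_selection` provides this kind of split.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    # Spread the samples as evenly as possible:
    # the first n_samples % k folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]                    # held-out part
        train = indices[:start] + indices[start + size:]      # remaining k-1 parts
        yield train, test
        start += size


# 10 samples, 3 folds: each sample lands in the test part exactly once.
folds = list(k_fold_indices(n_samples=10, k=3))
for i, (train, test) in enumerate(folds):
    print(f"fold {i}: train={train} test={test}")
```

A model would be trained on `train` and scored on `test` in each round, and the k scores averaged.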


Grid search:
A brute-force enumeration method: list the candidate values for each model parameter, evaluate the model on every combination of those values, and keep the combination that scores best.
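As a rough sketch of the enumeration itself, the loop below tries every combination and keeps the best one. The helper `grid_search` and the scoring function `toy_score` are hypothetical stand-ins: a real search would return a cross-validated score instead of the artificial formula used here. The candidate values are the ones that the `np.logspace` calls in the listing further down produce.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)          # in practice: a cross-validated score
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# 4 gamma values x 3 C values = 12 combinations.
param_grid = {"gamma": [0.01, 0.1, 1.0, 10.0], "C": [0.1, 1.0, 10.0]}
n_combinations = len(list(product(*param_grid.values())))


def toy_score(params):
    # Artificial score, NOT a real model evaluation: it is simply
    # constructed so that its maximum sits at C=10.0, gamma=0.1.
    return -abs(params["C"] - 10.0) - abs(params["gamma"] - 0.1)


best_params, best_score = grid_search(param_grid, toy_score)
print(n_combinations, best_params)
```

The cost grows multiplicatively with each added parameter, which is why the grid is usually kept small.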


Python implementation:
from sklearn.datasets import fetch_20newsgroups
# train_test_split and GridSearchCV live in sklearn.model_selection;
# the old sklearn.cross_validation and sklearn.grid_search modules were removed in scikit-learn 0.20
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

'''
How do you determine which parameters a model should use?

K-fold cross-validation:
    Divide the samples into k parts
    In each round, use one part as test data and the rest as training data
    Train and test k times in total
    This makes full use of the sample data when evaluating the model's performance

Grid search:
    A brute-force enumeration search method
    Enumerate the candidate values of the model parameters
    Evaluate the model on every enumerated combination
    Find the best model parameters
'''

# Fetch the full 20 newsgroups dataset (downloaded over the network on first use)
news = fetch_20newsgroups(subset="all")
# Split the first 3000 samples into training data and test data
x_train, x_test, y_train, y_test = train_test_split(news.data[:3000],
                                                    news.target[:3000],
                                                    test_size=0.25,
                                                    random_state=33)

# Use a Pipeline to chain the TF-IDF vectorizer and the classifier
clf = Pipeline([("vect", TfidfVectorizer(stop_words="english", analyzer="word")), ("svc", SVC())])

# Two hyperparameters are searched: 4 values of svc__gamma and 3 values of svc__C, 12 combinations in total
# np.logspace(start, end, num) creates num values evenly spaced on a log scale from 10^start to 10^end
parameters = {"svc__gamma": np.logspace(-2, 1, 4), "svc__C": np.logspace(-1, 1, 3)}

# Grid search
# Create the grid search: 12 parameter combinations, 3-fold cross-validation
gs = GridSearchCV(clf, parameters, verbose=2, refit=True, cv=3)

# Run the grid search in a single process
# Note: fit() returns the fitted GridSearchCV object itself, not the elapsed time
time_ = gs.fit(x_train, y_train)
print(time_)
print(gs.best_params_, gs.best_score_)
# Print the accuracy of the best model on the test set
print(gs.score(x_test, y_test))
'''
Fitting 3 folds for each of 12 candidates, totalling 36 fits
[CV] svc__C=0.1, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=0.01 -   8.3s
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 8.3s remaining: 0.0s
[CV] svc__C=0.1, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=0.01 -   8.5s
[CV] svc__C=0.1, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=0.01 -   8.5s
[CV] svc__C=0.1, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=0.1 -   8.4s
[CV] svc__C=0.1, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=0.1 -   8.5s
[CV] svc__C=0.1, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=0.1 -   8.5s
[CV] svc__C=0.1, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=1.0 -   8.4s
[CV] svc__C=0.1, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=1.0 -   8.6s
[CV] svc__C=0.1, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=1.0 -   8.6s
[CV] svc__C=0.1, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=10.0 -   8.5s
[CV] svc__C=0.1, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=10.0 -   8.6s
[CV] svc__C=0.1, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=10.0 -   8.7s
[CV] svc__C=1.0, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=0.01 -   8.3s
[CV] svc__C=1.0, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=0.01 -   8.4s
[CV] svc__C=1.0, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=0.01 -   8.5s
[CV] svc__C=1.0, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=0.1 -   8.3s
[CV] svc__C=1.0, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=0.1 -   8.4s
[CV] svc__C=1.0, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=0.1 -   8.5s
[CV] svc__C=1.0, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=1.0 -   8.5s
[CV] svc__C=1.0, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=1.0 -   8.6s
[CV] svc__C=1.0, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=1.0 -   8.7s
[CV] svc__C=1.0, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=10.0 -   8.5s
[CV] svc__C=1.0, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=10.0 -   8.6s
[CV] svc__C=1.0, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=10.0 -   8.7s
[CV] svc__C=10.0, svc__gamma=0.01 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=0.01 -   8.4s
[CV] svc__C=10.0, svc__gamma=0.01 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=0.01 -   8.4s
[CV] svc__C=10.0, svc__gamma=0.01 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=0.01 -   8.7s
[CV] svc__C=10.0, svc__gamma=0.1 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=0.1 -   8.6s
[CV] svc__C=10.0, svc__gamma=0.1 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=0.1 -   8.6s
[CV] svc__C=10.0, svc__gamma=0.1 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=0.1 -   8.6s
[CV] svc__C=10.0, svc__gamma=1.0 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=1.0 -   8.5s
[CV] svc__C=10.0, svc__gamma=1.0 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=1.0 -   8.6s
[CV] svc__C=10.0, svc__gamma=1.0 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=1.0 -   9.3s
[CV] svc__C=10.0, svc__gamma=10.0 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=10.0 -   8.8s
[CV] svc__C=10.0, svc__gamma=10.0 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=10.0 -   8.9s
[CV] svc__C=10.0, svc__gamma=10.0 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=10.0 -   8.7s

12 hyperparameter combinations x 3-fold cross-validation = 36 fits in total, which took 5.2 minutes
[Parallel(n_jobs=1)]: Done 36 out of 36 | elapsed: 5.2min finished

Best parameters and best cross-validation score (mean over the 3 folds):
{'svc__C': 10.0, 'svc__gamma': 0.1} 0.79066666666666666
Accuracy of the best model on the test set:
0.8226666666666667

'''
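To see the score of each combination rather than only the best one, the fitted search object can be inspected through `cv_results_`. The sketch below is a reduced-scale variant, not the author's experiment: it swaps in the small iris dataset bundled with scikit-learn (so nothing is downloaded and it runs in seconds) and replaces the text vectorizer with a `StandardScaler`. The wrapper `run_search` is a hypothetical helper. The parameter grid and `cv=3` are the same as in the listing above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def run_search(n_jobs=1):
    """Run the 12-combination grid search on iris; n_jobs=-1 would use all CPU cores."""
    X, y = load_iris(return_X_y=True)
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=33)
    clf = Pipeline([("scale", StandardScaler()), ("svc", SVC())])
    parameters = {"svc__gamma": np.logspace(-2, 1, 4),
                  "svc__C": np.logspace(-1, 1, 3)}
    gs = GridSearchCV(clf, parameters, refit=True, cv=3, n_jobs=n_jobs)
    gs.fit(x_train, y_train)
    # refit=True means gs now wraps the best pipeline retrained on all training data
    return gs, gs.score(x_test, y_test)


gs, test_score = run_search(n_jobs=1)

# One entry per parameter combination, with its mean score over the 3 folds
for params, mean in zip(gs.cv_results_["params"], gs.cv_results_["mean_test_score"]):
    print(params, round(float(mean), 3))
print("best:", gs.best_params_, gs.best_score_, "test:", test_score)
```

Because the 36 fits are independent of each other, passing `n_jobs=-1` lets `GridSearchCV` run them in parallel. How much time that saves depends on the machine.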

 
