1. Generate a dataset with make_blobs, then preprocess it
```python
# Import the dataset generator
from sklearn.datasets import make_blobs
# Import the dataset splitting tool
from sklearn.model_selection import train_test_split
# Import the preprocessing tool
from sklearn.preprocessing import StandardScaler
# Import the multilayer perceptron neural network
from sklearn.neural_network import MLPClassifier
# Import the plotting tool
import matplotlib.pyplot as plt
```
```python
# Generate a dataset with 200 samples, 2 classes, and a standard deviation of 5
X, y = make_blobs(n_samples=200, centers=2, cluster_std=5)
# Split the dataset into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=38)
# Preprocess the data
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Print the shapes of the processed data
print('\n\n\n')
print('Code running results')
print('====================================\n')
print('Training set shape: {}'.format(X_train_scaled.shape))
print('Test set shape: {}'.format(X_test_scaled.shape))
print('\n====================================')
print('\n\n\n')
```
```
Code running results
====================================

Training set shape: (150, 2)
Test set shape: (50, 2)

====================================
```
```python
# Plot the raw training set
plt.scatter(X_train[:, 0], X_train[:, 1])
# Plot the preprocessed training set
plt.scatter(X_train_scaled[:, 0], X_train_scaled[:, 1], marker='^', edgecolor='k')
# Add a title to the figure
plt.title('Training set & Scaled training set')
# Show the figure
plt.show()
```
- Here you can see that after StandardScaler, the training data becomes more tightly "clustered" around the origin.
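To see what StandardScaler actually does to each feature, here is a minimal sketch on a tiny made-up array (not the make_blobs data above): after scaling, every column has mean 0 and standard deviation 1, which is why the points draw closer together around the origin.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# A tiny, hypothetical 2-feature sample, just for illustration
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# Each column now has mean 0 and standard deviation 1
print(X_scaled.mean(axis=0))  # close to [0. 0.]
print(X_scaled.std(axis=0))   # close to [1. 1.]
```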
```python
# Import grid search
from sklearn.model_selection import GridSearchCV
# Define the parameter grid for the model as a dictionary
params = {'hidden_layer_sizes': [(50,), (100,), (100, 100)],
          'alpha': [0.0001, 0.01, 0.1]}
# Build the grid search model
grid = GridSearchCV(MLPClassifier(max_iter=1600, random_state=38),
                    param_grid=params, cv=3)
# Fit the data
grid.fit(X_train_scaled, y_train)
# Print the results
print('\n\n\n')
print('Code running results')
print('====================================\n')
print('Best model score: {:.2f}'.format(grid.best_score_))
print('Best model parameters: {}'.format(grid.best_params_))
print('\n====================================')
print('\n\n\n')
```
```
Code running results
====================================

Best model score: 0.81
Best model parameters: {'alpha': 0.0001, 'hidden_layer_sizes': (50,)}

====================================
```
```python
# Print the model's score on the test set
print('\n\n\n')
print('Code running results')
print('====================================\n')
print('Test set score: {}'.format(grid.score(X_test_scaled, y_test)))
print('\n====================================')
print('\n\n\n')
```
```
Code running results
====================================

Test set score: 0.82

====================================
```
- This way the model gets a fairly high score, but on reflection the approach is wrong: during cross-validation the training set is split again into training folds and validation folds, yet when we preprocessed with StandardScaler we fitted it on the training folds and validation folds together. As a result, the cross-validation score is inaccurate.
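The leak described above can be made concrete with a small sketch (hypothetical data, not the dataset used in this article): a scaler fitted on the whole training set has different statistics from one fitted only on the training fold, so the validation fold has influenced the preprocessing it is later evaluated with.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(38)
X_train = rng.normal(size=(10, 2))  # pretend training set

# "Wrong" way: scaler fitted on the whole training set, so the
# validation fold (here, the last 5 rows) leaks into the statistics
scaler_all = StandardScaler().fit(X_train)

# What should happen inside cross-validation: fit on the training fold only
scaler_fold = StandardScaler().fit(X_train[:5])

# The fitted means differ, i.e. the validation fold affected the first scaler
print(scaler_all.mean_)
print(scaler_fold.mean_)
```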
2. Using the pipeline model (Pipeline)
```python
# Import the pipeline model
from sklearn.pipeline import Pipeline
# Build a pipeline model containing preprocessing and a neural network
pipeline = Pipeline([('scaler', StandardScaler()),
                     ('mlp', MLPClassifier(max_iter=1600, random_state=38))])
# Fit the pipeline model to the training set
pipeline.fit(X_train, y_train)
# Print the pipeline model's score
print('Score of the MLP model using the pipeline: {:.2f}'.format(
    pipeline.score(X_test, y_test)))
```
```
Score of the MLP model using the pipeline: 0.82
```
- The Pipeline model here contains two steps: one is StandardScaler, used for data preprocessing; the other is an MLP multilayer perceptron neural network with a maximum of 1600 iterations.
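Each of the two steps can also be retrieved by its name after the pipeline is built. A minimal sketch (standalone, reusing the same step names as above):

```python
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([('scaler', StandardScaler()),
                     ('mlp', MLPClassifier(max_iter=1600, random_state=38))])

# named_steps maps each step name to the estimator it wraps
print(pipeline.named_steps['mlp'].max_iter)  # 1600
```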
3. Grid search with the pipeline model
- GridSearchCV splits the data into training and validation sets. This is not the training/test split made by train_test_split; rather, the training set produced by train_test_split is split again into folds.
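This second-level split can be illustrated with KFold, which is essentially what GridSearchCV does internally with cv=3 (a sketch on a tiny made-up training set, not the one used in this article): each of the 3 folds serves once as the validation set while the rest is used for training.

```python
import numpy as np
from sklearn.model_selection import KFold

X_train = np.arange(12).reshape(6, 2)  # pretend training set of 6 samples

# cv=3 means the training set is split into 3 folds
kf = KFold(n_splits=3)
for train_idx, valid_idx in kf.split(X_train):
    # 4 samples for the training folds, 2 for the validation fold
    print(len(train_idx), len(valid_idx))
```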
```python
# Define the parameter grid (the mlp__ prefix tells the pipeline
# that these parameters belong to the mlp step)
params = {'mlp__hidden_layer_sizes': [(50,), (100,), (100, 100)],
          'mlp__alpha': [0.0001, 0.001, 0.01, 0.1]}
# Build a pipeline model containing preprocessing and a neural network
pipeline = Pipeline([('scaler', StandardScaler()),
                     ('mlp', MLPClassifier(max_iter=1600, random_state=38))])
# Put the pipeline model into the grid search
grid = GridSearchCV(pipeline, param_grid=params, cv=3)
# Fit the training set
grid.fit(X_train, y_train)
# Print the model's best cross-validation score, best parameters, and test set score
print('\n\n\n')
print('Code running results')
print('====================================\n')
print('Best cross-validation score: {:.2f}'.format(grid.best_score_))
print('Best model parameters: {}'.format(grid.best_params_))
print('Test set score: {}'.format(grid.score(X_test, y_test)))
print('\n====================================')
print('\n\n\n')
```
```
Code running results
====================================

Best cross-validation score: 0.80
Best model parameters: {'mlp__alpha': 0.0001, 'mlp__hidden_layer_sizes': (50,)}
Test set score: 0.82

====================================
```
- hidden_layer_sizes and alpha are given the prefix mlp__ because a pipeline can contain multiple algorithms, and the pipeline needs to know which algorithm each parameter should be passed to.
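The same `<step name>__<parameter name>` convention works anywhere pipeline parameters are addressed, not just in grid search. A minimal sketch (standalone, same step names as above) using get_params and set_params:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([('scaler', StandardScaler()),
                     ('mlp', MLPClassifier(max_iter=1600, random_state=38))])

# Set the alpha of the mlp step through the pipeline
pipeline.set_params(mlp__alpha=0.01)

# Read it back with the same double-underscore key
print(pipeline.get_params()['mlp__alpha'])  # 0.01
```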
```python
# Print the steps of the pipeline model
print('\n\n\n')
print('Code running results')
print('====================================\n')
print(pipeline.steps)
print('\n====================================')
print('\n\n\n')
```
```
Code running results
====================================

[('scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('mlp', MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08, hidden_layer_sizes=(100,), learning_rate='constant', learning_rate_init=0.001, max_iter=1600, momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5, random_state=38, shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1, verbose=False, warm_start=False))]

====================================
```
Summary:
Besides integrating multiple algorithms and keeping the code simple, the pipeline model also avoids the mistake of preprocessing the training and validation sets together. With a pipeline, grid search can first split the data into training and validation folds before each evaluation and then fit the preprocessing on the training fold only, which avoids an over-optimistic estimate of the model's performance.
Quoted from the book "Python Machine Learning in Plain Language".