K-fold cross-validation: the difference and connection between StratifiedKFold and KFold

A key step in training a neural network is evaluating the model's generalization ability. If a model performs poorly, it is either too complex and overfits (high variance), or too simple and underfits (high bias). To diagnose this, we need to learn two cross-validation techniques for evaluating a model's generalization ability: holdout cross-validation and k-fold cross-validation.

K-fold cross-validation computes the model's evaluation result as the average over the k folds, so using k-fold cross-validation to search for optimal parameters is more stable than the holdout method. Once the optimal parameters are found, they are used to retrain the model on the entire original dataset to obtain the final model.
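As an illustrative sketch of this procedure (the estimator and dataset here are arbitrary choices for the example, not from the original post), scikit-learn's cross_val_score fits the model k times and returns one score per fold, whose mean is the averaged estimate described above:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: fit the model 5 times, each time
# validating on the held-out fold, and collect the 5 scores.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print(scores)         # one accuracy value per fold
print(scores.mean())  # the averaged estimate used for model selection

Note that when the estimator is a classifier, an integer cv is interpreted by scikit-learn as StratifiedKFold rather than plain KFold; the difference between the two is the subject of this post.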

1 KFold 

class sklearn.model_selection.KFold(n_splits='warn', shuffle=False, random_state=None)

Provides train/test indices to split data into train/test sets. Splits the dataset into k consecutive folds (without shuffling by default).

 

Each fold is then used once as the validation set, while the remaining k-1 folds form the training set.

n_splits  int, default=3

Number of folds. Must be at least 2.

Changed in version 0.22: the default value of n_splits changed from 3 to 5 (version 0.20 began warning about this upcoming change, which is why the signature above shows 'warn').

shuffle  boolean, optional

Whether to shuffle the data before splitting it into folds.

random_state  int, RandomState instance or None, optional, default=None

If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator itself; if None, the random number generator is the RandomState instance used by np.random. Only used when shuffle == True.
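A minimal sketch of KFold with these parameters (the toy array is made up for illustration):

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(12).reshape(6, 2)   # 6 samples, 2 features

# 3 consecutive folds, no shuffling (the defaults described above)
kf = KFold(n_splits=3, shuffle=False)
for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)

With 6 samples and n_splits=3, each test fold contains 2 consecutive indices, and every sample appears in exactly one test fold.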

 

class sklearn.model_selection.StratifiedKFold(n_splits='warn', shuffle=False, random_state=None)


Both are k-fold cross-validation functions: each splits the data into k folds and iterates over them as described above.

The difference is that KFold splits purely by sample index and ignores the labels, while StratifiedKFold produces stratified folds, so that each fold contains approximately the same proportion of samples from each class as the complete dataset. For classification, especially with imbalanced classes, StratifiedKFold is therefore usually the safer choice, as shown in the sketch below.
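A minimal sketch of the difference on toy labels (the data here is made up for illustration):

import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.zeros((8, 1))                      # features are irrelevant here
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])    # two classes, 4 samples each

# KFold ignores y: without shuffling, the first test fold
# contains only class 0 and the second only class 1.
for train_idx, test_idx in KFold(n_splits=2).split(X):
    print("KFold test labels:          ", y[test_idx])

# StratifiedKFold uses y: every test fold keeps the 50/50
# class proportions of the full dataset.
for train_idx, test_idx in StratifiedKFold(n_splits=2).split(X, y):
    print("StratifiedKFold test labels:", y[test_idx])

Note that StratifiedKFold.split requires the labels y, since it needs them to balance the folds; KFold.split does not.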

 

