Key points of cross-validation

1. The purpose of cross-validation

Cross-validation is a method for building a model and validating its parameters; it can be used to estimate the model's predictive performance. It works by splitting the sample data into a training set and a test set: the training set is used to train the model, and the test set is used to evaluate it.

When the sample data set is small, cross-validation can be used to train candidate models and select the best one.

2. Simple cross-validation

Simple cross-validation randomly splits the sample data into a training set and a test set in a chosen proportion, trains the model on the training set, and validates the model and its parameters on the test set. The split is repeated several times. Each repetition produces a different training set and test set, so the trained model parameters also differ. The optimal model and parameters are then selected by evaluating the loss function.
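The repeated random split described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not a definitive implementation. The names `holdout_split` and `repeated_holdout`, the 30% test ratio, the toy data, and the two constant-prediction "models" (mean and median) are all assumptions made for the example.

```python
import random
from statistics import mean, median

def holdout_split(samples, test_ratio, rng):
    """Randomly split samples into (train, test) in the given proportion."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def mse(prediction, targets):
    """Mean squared error of a constant prediction against the targets."""
    return sum((t - prediction) ** 2 for t in targets) / len(targets)

def repeated_holdout(samples, fit, n_repeats=5, test_ratio=0.3, seed=0):
    """Average test loss over several independent random splits."""
    rng = random.Random(seed)  # same seed gives every candidate the same splits
    losses = []
    for _ in range(n_repeats):
        train, test = holdout_split(samples, test_ratio, rng)
        losses.append(mse(fit(train), test))
    return sum(losses) / len(losses)

# Hypothetical toy data and two candidate "models" that predict a constant.
samples = [2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 7.0, 30.0]
candidates = {"mean": mean, "median": median}
scores = {name: repeated_holdout(samples, fit) for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # candidate with the lowest average loss
print(scores, best)
```

Reusing the same seed for each candidate means every candidate is evaluated on identical splits, so differences in loss come from the models rather than from the random partitioning.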

3. K-fold cross-validation

K-fold cross-validation randomly divides the sample data into K parts. In each round, K-1 of the parts are used as the training set to train the model, and the remaining part is used as the test set. There are K ways to choose K-1 parts out of K, so this operation can be repeated at most K times. The optimal model and parameters are then selected by evaluating the loss function.
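A minimal sketch of the K-fold procedure, again in plain Python. The function names `k_fold_splits` and `k_fold_loss`, the mean-prediction model, and the toy data are assumptions for illustration only.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the K folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # K roughly equal parts
    for i in range(k):
        test_idx = folds[i]
        # The other K-1 parts together form the training set.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

def k_fold_loss(samples, k, seed=0):
    """Average squared error of a mean-prediction model over all K folds."""
    fold_losses = []
    for train_idx, test_idx in k_fold_splits(len(samples), k, seed):
        prediction = sum(samples[i] for i in train_idx) / len(train_idx)
        loss = sum((samples[i] - prediction) ** 2 for i in test_idx) / len(test_idx)
        fold_losses.append(loss)
    return sum(fold_losses) / len(fold_losses)

samples = [2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 7.0, 8.0]
splits = list(k_fold_splits(len(samples), k=5))
print(len(splits), k_fold_loss(samples, k=5))
```

Each sample lands in the test set exactly once across the K rounds, which is what distinguishes K-fold from repeated random splitting.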

4. Leave-one-out cross-validation

Leave-one-out cross-validation is a special case of K-fold cross-validation: if the sample data contains m samples, then K = m, i.e., the validation set in each round consists of a single sample. Obviously, unless the data is very scarce, one would not hold out just a single sample for validation, so this method is mainly used when sample data is very scarce. Because only one sample is excluded from training in each round, the training data stays as close as possible to the distribution of the original data.
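As a rough illustration of the K = m case, the sketch below holds out one sample per round and trains on the remaining m-1. The names `leave_one_out_splits` and `loo_mse`, the mean-prediction model, and the four-sample data set are hypothetical.

```python
def leave_one_out_splits(samples):
    """Yield (train, held_out) pairs; one round per sample, so K = m."""
    for i in range(len(samples)):
        yield samples[:i] + samples[i + 1:], samples[i]

def loo_mse(samples):
    """Average squared error of a mean-prediction model over all m rounds."""
    errors = []
    for train, held_out in leave_one_out_splits(samples):
        prediction = sum(train) / len(train)  # "train" on the other m-1 samples
        errors.append((held_out - prediction) ** 2)
    return sum(errors) / len(errors)

samples = [1.0, 2.0, 3.0, 4.0]
rounds = list(leave_one_out_splits(samples))
print(len(rounds), loo_mse(samples))
```

The number of rounds equals the number of samples, which is why the cost grows quickly and the method is usually reserved for very small data sets.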

5. How to choose

Comparing the three cross-validation methods above: if only a rough model is needed, simple cross-validation is sufficient; under normal circumstances, K-fold cross-validation is used; leave-one-out cross-validation is clearly intended for cases where the sample data is very scarce.

If the sample data is sufficient, it can be randomly divided into three parts: a training set, a validation set, and a test set. The training set is used to train the model. The validation set is used to assess the model's predictive quality and thereby select the best model and parameters. The test set is used to assess the generalization ability of the selected model and parameters, which leads to the final choice of the most appropriate model and parameters.
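A three-way split of this kind might look like the sketch below. The function name `three_way_split` and the 60/20/20 proportions are assumptions; the text does not prescribe specific ratios.

```python
import random

def three_way_split(samples, val_ratio=0.2, test_ratio=0.2, seed=0):
    """Randomly split samples into (train, validation, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_ratio)
    n_test = round(len(shuffled) * test_ratio)
    val = shuffled[:n_val]                    # used to select model and parameters
    test = shuffled[n_val:n_val + n_test]     # used once, to estimate generalization
    train = shuffled[n_val + n_test:]         # used to fit the model
    return train, val, test

train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

Keeping the test set untouched until the final evaluation is the point of the third partition: the validation set has already influenced model selection, so its loss is an optimistic estimate.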

Origin www.cnblogs.com/Ooman/p/11350130.html