Cross-validation
Brief
Validation measures the quality of a trained machine-learning model. Cross-validation is a commonly used method for model selection: part of the data set is held out and used to validate the model.
Common methods
Cross-validation is commonly divided into three kinds:
1. Simple cross-validation
The data set is split into two parts (sometimes three): for example, 70% as a training set and 30% as a validation set. Train candidate models with different parameters on the 70%, then evaluate each on the held-out (untrained-on) 30%, and choose the model that performs best.
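The split above can be sketched in plain Python (the function name `holdout_split` and the shuffle-then-cut approach are our own illustration, not from the source):

```python
import random

def holdout_split(data, train_frac=0.7, seed=0):
    """Randomly split `data` into a training set and a validation set.

    Shuffling first approximates random sampling, so both parts are
    drawn from the same distribution.
    """
    rng = random.Random(seed)
    shuffled = data[:]  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# 70% of ten samples go to training, the remaining 30% to validation.
train, val = holdout_split(list(range(10)))
```

In practice one would train each candidate model on `train` and compare their scores on `val`.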
2. S-fold cross-validation
Split the data set into S disjoint subsets of similar size. Use S-1 of the subsets to train the model and the remaining one for validation; repeat this S times, once per fold, and select the model that performs best overall.
[Note] The validation set is different in each round.
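The rotation of folds can be sketched as follows (the function name `s_fold_splits` and the striding scheme used to form the S subsets are our own illustration):

```python
def s_fold_splits(data, s):
    """Yield (train, validation) pairs for S-fold cross-validation."""
    # S disjoint subsets of similar size, built by striding through the data.
    folds = [data[i::s] for i in range(s)]
    for i in range(s):
        validation = folds[i]  # a different fold validates in each round
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation

# With 10 samples and S = 5, each round trains on 8 samples and validates on 2.
splits = list(s_fold_splits(list(range(10)), 5))
```

A model's cross-validation score is then the average of its validation scores over the S rounds.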
3. Leave-one-out cross-validation
This is actually a special case of S-fold cross-validation used when the data set is small (fewer than 100 samples, sometimes far fewer): set S = N, where N is the size of the data set, so each round leaves exactly one sample out for validation.
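Because S = N, the fold structure collapses to leaving out one sample per round; a minimal sketch (the function name `leave_one_out` is our own):

```python
def leave_one_out(data):
    """Yield (train, validation) pairs where each sample validates once."""
    for i in range(len(data)):
        validation = [data[i]]          # a single held-out sample
        train = data[:i] + data[i + 1:]  # all remaining N - 1 samples
        yield train, validation

# With N = 5 samples there are exactly 5 rounds.
splits = list(leave_one_out(list(range(5))))
```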
Details
Whenever we select these subsets, the selection (random sampling) must be independent and identically distributed, because the statistical theory of machine learning is developed within the i.i.d. framework.