[Machine learning essentials: cross-validation]

Foreword

When we train a machine learning model on data, we expect it to keep performing well on new data, so we need evaluation criteria that go beyond the training error.

To assess a model's generalization performance (its ability to predict unknown data) and to keep it from falling into the trap of overfitting, we artificially divide the original data into a training set and a test set: the former is used to train the model, the latter to evaluate its generalization performance.

Training set, validation and test sets

In supervised learning, the data set is often divided into two or three parts (the validation set is sometimes omitted): a training set (train set), a validation set (validation set), and a test set (test set).
The training set is used to train the model, the validation set is used to tune the parameters that control model complexity, and the test set is used to evaluate the model's generalization performance. In practice, however, we often simply split the data into a training set and a test set.
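The split above can be sketched with scikit-learn (assumed available here); the 80/20 ratio and the iris data set are illustrative choices, not rules:

```python
# Split a data set into training and test sets.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 150 samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(len(X_train), len(X_test))  # 120 30
```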

Categories of cross-validation

Cross-validation methods include simple cross-validation, K-fold cross-validation, and leave-one-out cross-validation.

1. Simple cross-validation

Simple cross-validation directly divides the data set into a training set and a validation set. The training set is used to train a model under each candidate parameter combination, the resulting models are then evaluated on the validation set, and the model with the smallest validation error is selected.
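A minimal sketch of this hold-out procedure, assuming scikit-learn; the classifier and the candidate values of `n_neighbors` are hypothetical choices for illustration:

```python
# Simple (hold-out) cross-validation: train one model per candidate
# parameter value, keep the one with the best validation accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# Split once into a training set and a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

best_k, best_score = None, -1.0
for k in (1, 3, 5, 7):  # candidate parameter combinations
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    score = model.score(X_val, y_val)  # accuracy on the validation set
    if score > best_score:
        best_k, best_score = k, score

print(best_k, round(best_score, 3))
```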

2. K-fold cross-validation

First, the sample data set is randomly divided into K disjoint subsets. Each subset in turn serves as the test set while the remaining K-1 subsets form the training set used to fit the model; finally, the model with the smallest average test error is selected. The principle is illustrated below:
K-fold cross-validation
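A sketch of the procedure with scikit-learn, using K = 5 and logistic regression as illustrative choices: each fold serves once as the test set, and the scores are averaged across folds.

```python
# 5-fold cross-validation: one accuracy score per fold.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)

print(len(scores), scores.mean())  # 5 folds, mean accuracy
```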

3. Leave-one-out method

When K equals the number of samples N in the data set, K-fold cross-validation becomes a special case: leave-one-out. Since each training set contains only one sample fewer than the original data set, the resulting evaluation is often more accurate. However, when the data set is large, many more models must be trained (N in total).

Because of this property, leave-one-out is usually applied when the amount of data is small.
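A sketch with scikit-learn's `LeaveOneOut` splitter: with N samples, N models are trained, each tested on the single held-out sample, which is why the method is reserved for small data sets.

```python
# Leave-one-out cross-validation: one model (and one score) per sample.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)  # N = 150 samples
loo = LeaveOneOut()
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=loo)

print(len(scores))  # one score per held-out sample: 150
```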


Origin blog.csdn.net/TOMOCAT/article/details/91348333