Training set, test set, validation set

First, the differences between the three
Training set (train set): the data samples used to fit the model.
Validation set (development set): a set of samples held out from the training process. It is used to tune the model's hyperparameters and to make a preliminary assessment of the model's capability.
In a neural network, for example, the validation set is used to find the optimal network depth (number of hidden layers), to decide when to stop the back-propagation algorithm, or to choose the number of neurons in each hidden layer.

Test set: used to evaluate the generalization ability of the final model. It must not be used as a basis for parameter tuning, feature selection, or any other algorithm-related choice.
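
The division of labor among the three sets can be sketched as follows. This is only a minimal illustration, not the original author's code: it swaps the neural network for a tiny k-nearest-neighbor classifier on synthetic 1-D data so that it runs with the standard library alone. The helper names (`knn_predict`, `accuracy`, `make_point`) and the candidate values of `k` are assumptions made for the example.

```python
import random

def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > k else 0

def accuracy(train, data, k):
    """Fraction of points in `data` classified correctly using `train` as memory."""
    correct = sum(knn_predict(train, x, k) == y for x, y in data)
    return correct / len(data)

# Synthetic data (an assumption for illustration): label is 1 when x > 0.5,
# and roughly 10% of the labels are flipped to simulate noise.
rng = random.Random(0)

def make_point():
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    return (x, y)

data = [make_point() for _ in range(300)]
train, val, test = data[:180], data[180:240], data[240:]  # 6:2:2 split

# The hyperparameter k is chosen using the validation set only.
candidate_ks = [1, 3, 5, 9, 15]
val_scores = {k: accuracy(train, val, k) for k in candidate_ks}
best_k = max(val_scores, key=val_scores.get)

# The test set is touched once, after every choice has been made.
test_score = accuracy(train, test, best_k)
print("best k:", best_k, "validation:", val_scores[best_k], "test:", test_score)
```

The point of the sketch is the order of operations: the test set plays no part in choosing `k`, so `test_score` is not biased by the selection step.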

A vivid metaphor:

    Training set ----------- the students' textbook; students acquire knowledge from the textbook content.

    Validation set ------------ homework; homework shows how well different students are learning and how fast they are progressing.

    Test set ----------- the exam; the exam questions have usually not been seen before, so they test the students' ability to apply what they have learned to new problems.

Conventionally, the three sets are split in a 6:2:2 ratio (training : validation : test); the validation set is not strictly required.
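
A 6:2:2 split might look like the sketch below. The function name `split_dataset`, its `ratios` and `seed` parameters, and the choice to shuffle before splitting are assumptions made for illustration; the original post does not prescribe an implementation.

```python
import random

def split_dataset(samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle the samples and split them into train, validation and test lists."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n = len(shuffled)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # the test set takes whatever remains
    return train, val, test

train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))
```

Shuffling first avoids a split that follows any ordering present in the raw data; for time-ordered data a chronological split may be more appropriate.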

 

Second, why a test set is needed
a) The training set is directly involved in fitting the model's parameters, so it obviously cannot be used to reflect the model's true ability (this keeps students who memorize the textbook by rote from getting the best grades, that is, it guards against over-fitting).

b) The validation set is involved in the manual tuning of hyperparameters, so it cannot be used to pass final judgment on the model either (a student who only drills practice questions cannot be regarded as a good learner).

c) So the real ability of a student (model) has to be examined through the final exam (test set).

However, judging the quality of a model on the basis of a single exam is clearly unreasonable, so the next step is to introduce cross-validation.
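
As a rough preview of the idea, k-fold cross-validation rotates which part of the data plays the validation role, so that the assessment does not depend on one particular split. The sketch below only generates the index partitions; the function name `k_fold_indices` and the use of contiguous, unshuffled folds are assumptions made for illustration.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 5))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validation:", val_idx)
```

Each sample serves as validation data exactly once, and the k validation scores are typically averaged.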

 

Origin www.cnblogs.com/Allen-rg/p/11547317.html