The difference between the test set and the validation set

Training data: used to build (fit) the model.
Validation data: optional; used to assist model building (for example, when tuning hyperparameters), and can be reused.
Test data: used to check the finished model. It is used only in the final test to evaluate the model's accuracy and must never be used during model building. If it is, the model ends up tuned to the test data, and the reported accuracy becomes an overly optimistic estimate (a form of overfitting).
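
As a rough illustration of the three roles above, the following is a minimal sketch of cutting one dataset into three disjoint subsets. The function name `train_val_test_split` and the 60/20/20 proportions are assumptions chosen for the example, not something the original answer prescribes.

```python
import random


def train_val_test_split(samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then cut the data into three disjoint subsets.

    The fractions are illustrative defaults (an assumption), not a rule.
    """
    if not 0 < val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n = len(shuffled)
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    test = shuffled[:n_test]                  # set aside; touched only once, at the end
    val = shuffled[n_test:n_test + n_val]     # may be reused while tuning
    train = shuffled[n_test + n_val:]         # used to fit the model
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

Because the three subsets never overlap, whatever is learned or tuned on the training and validation data cannot leak into the final test measurement.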

Author: Scofield
Link: https://www.zhihu.com/question/26588665/answer/161718839
Source: Zhihu
The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please indicate the source.


In fact, the main difference between the two is this: the validation set is used to choose the model's hyperparameters (such as the regularization coefficient or the number of hidden-layer nodes in an ANN), whereas the test set is used only to evaluate the model's accuracy (that is, its generalization ability).
 
For example, suppose we are building a BP (backpropagation) neural network and have no good way to determine the number of hidden-layer nodes in advance. The usual approach is to fix the number of nodes at some value, train the corresponding parameters (the weights) on the training set, and then measure the model's error on the validation set (also called the cross-validation set). We then change the number of nodes and repeat the process until we find the value that gives the smallest validation error. That value can be taken as the optimal number of nodes; in other words, this hyperparameter is obtained from the validation set. The test set is used only after all parameters have been determined, to judge the learned model by its test error; equivalently, it evaluates the model's generalization ability. The validation set, therefore, is mainly used for model tuning.
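
The loop described above can be sketched in a few lines. To keep the example dependency-free, it swaps in a k-nearest-neighbour classifier and tunes `k` in place of a BP network's hidden-node count; the synthetic data, the candidate values, and the helper names (`knn_predict`, `error_rate`, `make_point`) are all assumptions made for illustration.

```python
import random


def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def error_rate(train, data, k):
    """Fraction of points in `data` that are misclassified."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in data)
    return wrong / len(data)


rng = random.Random(0)


def make_point():
    """Synthetic sample: label is 1 when x > 0.5, flipped 10% of the time."""
    x = rng.uniform(0, 1)
    label = 1 if x > 0.5 else 0
    if rng.random() < 0.1:
        label = 1 - label
    return (x, label)


points = [make_point() for _ in range(300)]
train, val, test = points[:180], points[180:240], points[240:]

# Step 1: try each candidate hyperparameter value, scoring it on the validation set.
candidates = [1, 3, 5, 9, 15]  # odd values avoid tied votes
val_errors = {k: error_rate(train, val, k) for k in candidates}

# Step 2: keep the value with the smallest validation error.
best_k = min(val_errors, key=val_errors.get)

# Step 3: only now touch the test set, once, to estimate generalization error.
test_error = error_rate(train, test, best_k)
print("validation errors:", val_errors)
print("chosen k:", best_k, "test error:", test_error)
```

The test set appears only in the last step, after `best_k` is fixed. If it were used in step 1 instead, the chosen value would be fitted to the test data, and the reported test error would no longer be an honest estimate.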

