How should the training set, validation set and test set be split?

In deep learning, there is no fixed standard ratio for the training set, validation set and test set; the split is usually chosen according to the specific dataset and task.
However, here are some common recommendations:

1. Training Set: It makes up most of the overall dataset, typically between 60% and 80%. A larger training set helps the model learn the characteristics and patterns of the data, which generally improves its performance.

2. Validation Set: The validation set is used to tune the model's hyperparameters and to choose between candidate models, which helps detect overfitting to the training data. It is usually recommended to set aside a small part of the data that would otherwise be used for training as the validation set, generally between 10% and 20%.

3. Test Set: The test set is used for the final evaluation of the model's performance and generalization ability, and serves as an estimate of how well the model will predict in practical applications. It is usually recommended to hold out an independent part of the data, not used for training or tuning, as the test set, generally between 10% and 20%.
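The ratios above can be sketched as a simple shuffled index split. This is only a minimal illustration using the Python standard library: the function name `split_indices`, the 70/15/15 default and the seed value are assumptions chosen for the example, not a prescribed method.

```python
import random

def split_indices(n_samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle indices 0..n_samples-1 and cut them into train/val/test parts."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    indices = list(range(n_samples))
    # A local RNG with a fixed seed keeps the split reproducible.
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting indices rather than the data itself keeps the sketch independent of how the samples are stored; the index lists can then be used to select rows from any array or list.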

Note that the above ratios are for reference only, and the right split in practice depends on factors such as dataset size, task complexity, and data distribution. For smaller datasets, techniques such as cross-validation may be needed to assess model performance more reliably, because a single small validation set gives a noisy estimate.
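As a rough sketch of the cross-validation idea mentioned above, the following generates k-fold train/validation index pairs with the standard library only. The helper name `k_fold_indices`, the choice of k = 5 and the sample count are assumptions made for illustration.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute samples so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

folds = list(k_fold_indices(23, k=5))
for fold_number, (train, val) in enumerate(folds):
    print(fold_number, len(train), len(val))
```

Each sample is used for validation exactly once and for training in the other k - 1 folds, so the averaged score uses all of the available data. A separate test set should still be held out before running cross-validation.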

When splitting the dataset, make sure the three sets are independent of each other, meaning no sample appears in more than one of them, and that each set reflects the diversity and true distribution of the data. Keeping the same splitting method and proportions across experiments also makes the results comparable and the evaluation more trustworthy.
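One way to keep each set representative of the class distribution is a stratified split, and a fixed random seed keeps the split identical across runs. The sketch below is an assumption-laden illustration: the function name `stratified_split`, the toy labels and the seed are hypothetical, and it assumes a classification task whose labels can be sorted.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.7, val_frac=0.15, seed=0):
    """Split sample indices so each class keeps roughly the same proportions."""
    rng = random.Random(seed)  # fixed seed: the same split on every run
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, val, test = [], [], []
    # Sorting the class labels makes the iteration order deterministic.
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_train = round(len(members) * train_frac)
        n_val = round(len(members) * val_frac)
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]
    return train, val, test

# Toy imbalanced dataset: 80 samples of one class, 20 of another.
labels = ["cat"] * 80 + ["dog"] * 20
train_idx, val_idx, test_idx = stratified_split(labels, seed=42)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the split is done per class, the minority class appears in all three sets in about the same proportion as in the full dataset, and no index is assigned to more than one set.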

Origin blog.csdn.net/qq_43308156/article/details/130750550