Proportions of the training, validation, and test sets

When the amount of data is relatively small, a 7:3 split between training data and test data can be used, or a 6:2:2 split among training data, test data, and validation data.

(The "Watermelon book" describes the common practice as using about 2/3 to 4/5 of the samples for training and the remaining samples for testing.)

When the amount of data is very large, a 98:1:1 split among training data, test data, and validation data may be used.
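As a rough illustration of the ratios above, here is a minimal sketch of a ratio-based split using only the Python standard library. The function name `split_dataset`, its parameters, and the fixed seed are assumptions made for this example, not part of any particular library.

```python
import random


def split_dataset(samples, ratios=(6, 2, 2), seed=0):
    """Shuffle the samples, then cut them into consecutive parts sized by `ratios`.

    `split_dataset` is a hypothetical helper written for this example.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)  # shuffle so each part is a random subset

    total = sum(ratios)
    parts, start, cumulative = [], 0, 0
    for ratio in ratios:
        cumulative += ratio
        # Cut at the cumulative boundary so the parts always cover every sample.
        end = round(len(items) * cumulative / total)
        parts.append(items[start:end])
        start = end
    return parts


# 6:2:2 split of 1000 samples into three parts
train, part_b, part_c = split_dataset(range(1000), ratios=(6, 2, 2))
print(len(train), len(part_b), len(part_c))

# 7:3 split of the same samples into two parts
train, test = split_dataset(range(1000), ratios=(7, 3))
print(len(train), len(test))
```

The same call works for a 98:1:1 split by passing `ratios=(98, 1, 1)`. Cutting at cumulative boundaries, rather than rounding each part separately, ensures that no sample is dropped or duplicated.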

In the traditional machine learning era (datasets on the order of ten thousand samples), the allocation ratio is generally 6:2:2.

In the era of big data, this ratio is no longer as useful. With a dataset of one million samples, even taking just 1% as test data still gives ten thousand samples, which is sufficient. The rest of the data can then be used for training. So the common ratio can reach 98:1:1, or even 99.5:0.3:0.2 and the like.
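The arithmetic behind this argument can be checked with a short sketch. The helper name `split_sizes` is an assumption for this example, and the ratio 99.5:0.3:0.2 is written as 995:3:2 to keep the arithmetic in whole numbers.

```python
def split_sizes(n_samples, ratios):
    """Return the approximate number of samples in each part for the given ratios.

    `split_sizes` is a hypothetical helper written for this example. Each part
    is rounded independently, so the counts are approximate for sizes that do
    not divide evenly.
    """
    total = sum(ratios)
    return [round(n_samples * ratio / total) for ratio in ratios]


cases = [
    (10_000, (6, 2, 2)),       # traditional scale, 6:2:2
    (1_000_000, (98, 1, 1)),   # one million samples, 98:1:1
    (1_000_000, (995, 3, 2)),  # one million samples, 99.5:0.3:0.2
]
for n_samples, ratios in cases:
    print(n_samples, ratios, split_sizes(n_samples, ratios))
```

At one million samples, a 1% share is the same ten thousand samples that would make up an entire dataset at the traditional scale, which is why the smaller shares remain adequate.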

Origin www.cnblogs.com/tectal/p/11113063.html