1.6 Size of the Dev and Test Sets - Deep Learning Course 3, "Structuring Machine Learning Projects" - Professor Andrew Ng (Stanford)

Size of Dev and Test Sets

In the last video, you saw why your dev set and test set must come from the same distribution, but how large should they be? In the era of deep learning, the guidelines for setting up dev and test sets are changing. Let's take a look at some best practices.


You may have heard a rule of thumb in machine learning: take all the data you have and split it 70/30 into a training set and a test set. Or, if you also need a dev set, split it 60/20/20 into training, dev, and test sets. In the early days of machine learning this was quite reasonable, because data sets were much smaller. So if you had 100 samples in total, a 70/30 or 60/20/20 split was a sensible rule of thumb, and it is still reasonable if you have a few thousand or ten thousand samples.

But in modern machine learning, we are used to working with much larger data sets. Say you have 1 million training examples; then a different split may be more reasonable: 98% training set, 1% dev set, and 1% test set (we use D and T to abbreviate the dev set and test set). With 1 million examples, 1% is 10,000 examples, which may be plenty for a dev set or a test set. So in the modern deep learning era, when data sets are much larger, it is reasonable to use far less than 20% or 30% of the data for the dev and test sets. And because deep learning algorithms have such a big appetite for data, on problems with massive data sets a much higher proportion of the data goes to the training set. So what about the test set?
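
The split itself is just index bookkeeping. Below is a minimal sketch (not from the video) of carving one large data pool into a 98/1/1 train/dev/test split with NumPy; the names `X` and `y` and the default fractions are illustrative assumptions.

```python
import numpy as np

def split_dataset(X, y, dev_frac=0.01, test_frac=0.01, seed=0):
    """Shuffle the data and carve off small dev and test sets.

    With 1,000,000 examples, dev_frac = test_frac = 0.01 gives
    10,000 dev examples and 10,000 test examples (a 98/1/1 split).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # random shuffle of indices

    n_dev = int(len(X) * dev_frac)
    n_test = int(len(X) * test_frac)

    dev_idx = idx[:n_dev]
    test_idx = idx[n_dev:n_dev + n_test]
    train_idx = idx[n_dev + n_test:]       # everything else is training data

    return ((X[train_idx], y[train_idx]),
            (X[dev_idx], y[dev_idx]),
            (X[test_idx], y[test_idx]))
```

Because all three sets are drawn from the same shuffled pool, this also respects the previous video's advice that the dev and test sets come from the same distribution.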

Remember, the purpose of the test set is, after you have finished developing the system, to help you evaluate how good the final system is. The guideline is to make your test set big enough to give high confidence in the overall performance of the system. So unless you need a very precise measure of the final production system's performance, the test set generally does not need millions of examples. For your application, maybe 10,000 examples gives you enough confidence in the performance estimate, or maybe 100,000, or whatever it is; this number can be far smaller than, say, 30% of the overall data set. It depends on how much data you have.
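
The video does not give a formula for "big enough", but one common back-of-the-envelope check (my addition, not something stated in the lecture) is the normal approximation to the binomial standard error of an accuracy estimate. It shows why roughly 10,000 test examples can already pin accuracy down to well under one percentage point.

```python
import math

def accuracy_margin_of_error(accuracy, n, z=1.96):
    """Approximate 95% margin of error for an accuracy estimate
    measured on a test set of n examples (normal approximation
    to the binomial)."""
    return z * math.sqrt(accuracy * (1 - accuracy) / n)

# With ~90% accuracy, 10,000 test examples give an estimate within
# roughly +/-0.6%; 100,000 examples tighten it to about +/-0.2%.
print(accuracy_margin_of_error(0.90, 10_000))   # ~0.0059
print(accuracy_margin_of_error(0.90, 100_000))  # ~0.0019
```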


For some applications, you may not need a high-confidence estimate of system performance; maybe all you need is a training set and a dev set, and I think not having a separate test set can be OK. In fact, in practice some people split their data into just a training set and a "test set", and then iterate on that test set. So there is no real test set there: they have a training set and a dev set, but no test set. If you are actually tuning to this set, it is better called a dev set.

However, in the history of machine learning, not everyone has used these terms consistently; sometimes what people call a test set should really be regarded as a dev set. If all you have is data to train on and data to tune on, and you plan to deploy the final system without worrying too much about measuring its actual performance, I think that is fine too. Just call them the training set and the dev set, and be clear that you have no test set. Is this a bit unusual? I definitely do not recommend omitting the test set when building a system, because having a separate test set gives me peace of mind: you can use this untouched set of data to get an unbiased measure of the system's performance. But if your dev set is very large, so that you will not overfit it too badly, having only a training set and a dev set is not completely unreasonable. Still, I generally do not recommend it.

To sum up, in the era of big data the old 70/30 rule of thumb no longer applies. The trend now is to put the large majority of the data into the training set and only a small amount into the dev and test sets, especially when you have a very large data set. The old rule of thumb was really about making the dev set large enough for its purpose, which is to help you evaluate different ideas and decide whether A or B is better. And the purpose of the test set is to give you a confident, unbiased estimate of how well your final system performs, so you only need a test set big enough for that evaluation, which may be far less than 30% of the total data.

So I hope this video gives you some guidance and suggestions on how to set up dev and test sets in the deep learning era. Next: sometimes in the middle of working on a machine learning problem you may need to change your evaluation metric, or change your dev and test sets; we will talk about when you should do that.

Course blackboard (lecture screenshots)


Source: blog.csdn.net/weixin_36815313/article/details/105491763