1.10 Understanding Human-level Performance - Course 3 of Deep Learning, "Structuring Machine Learning Projects" - Professor Andrew Ng, Stanford

Understanding Human-level Performance

The term "human-level performance" is often used loosely in papers, but here I'll give you a more precise definition, one that can help you drive progress on your machine learning project. Recall from the last video that we used human-level error to estimate Bayes error: the theoretical minimum error rate, the lowest value any function could ever achieve, now or in the future. Keep that in mind as we look at a medical image classification example.


Suppose you want to look at radiology images like this one and make a diagnostic classification. Suppose an ordinary, untrained human achieves 3% error on this task. A typical doctor, perhaps a typical radiologist, achieves 1% error. An experienced doctor does better, with 0.7% error. And a team of experienced doctors, that is, a group of experienced doctors who all examine the image and then discuss and debate it, reaches a consensus error of 0.5%. So the question I want to ask you is: how should you define human-level error? Is it 3%, 1%, 0.7%, or 0.5%?

Feel free to pause the video and think about it. To answer this question, keep in mind that one of the most useful ways to think about human-level error is as a proxy or estimate for Bayes error.

Here I'll give the answer directly: if you want a proxy or estimate for Bayes error, then the right choice is 0.5%, the error rate a team of experienced doctors can reach after discussion and debate. We know Bayes error is less than or equal to 0.5%, because some system, namely this team of doctors, achieves 0.5%, so by definition the optimal error rate must be 0.5% or lower. We don't know how much lower; perhaps an even larger team of even more experienced doctors could do a bit better, so the optimum may be slightly below 0.5%. But we do know the optimal error cannot be higher than 0.5%, so in this context I can use 0.5% as my estimate of Bayes error. I'll therefore define human-level performance as 0.5%, at least if you're using human-level error to analyze bias and variance, as in the last video.

Now, for publishing a research paper or deploying a system, the appropriate definition of human-level error may be different: you might use 1%, because as long as you surpass the performance of a typical doctor, your system has reached a practically useful level. Outperforming a single radiologist may mean the system is worth deploying in some settings.

The main point of this video is that when you define human-level error, you need to be clear about your goal. If the goal is to show you can surpass a single human, which would justify deploying your system in certain settings, then that definition (a single typical doctor, 1%) may be appropriate. But if your goal is a proxy for Bayes error, then this definition (a team of experienced doctors, 0.5%) is the appropriate one.


To see why this matters, let's look at an error analysis example. Suppose, in the medical image diagnosis example, that your training error is 5% and your dev error is 6%. From the previous slide, human-level performance, which I'm treating as a proxy for Bayes error, could be 1%, 0.7%, or 0.5%, depending on whether you define it as the performance of a single typical doctor, a single experienced doctor, or a team of experienced doctors. Recall also the definitions from the previous video: the difference between the Bayes error estimate and the training error measures the avoidable bias, while the difference between the training error and the dev error measures or estimates how serious your learning algorithm's variance problem is.

So in this first example, whatever choice you make, the avoidable bias is about 4%: it falls between 4% if you take 1% as human level and 4.5% if you take 0.5%. Meanwhile the gap between training error and dev error is 1%. So in this example, however you define human-level error, whether as a single typical doctor, a single experienced doctor, or a team of experienced doctors, the avoidable bias of 4% to 4.5% is clearly much larger than the 1% variance. In this case you should focus on bias-reducing techniques, such as training a bigger network.
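To make the arithmetic concrete, here is a minimal Python sketch (my illustration, not code from the lecture; the function name `diagnose` is invented) that computes avoidable bias and variance from the three error rates and suggests where to focus:

```python
def diagnose(bayes_estimate, train_error, dev_error):
    """Estimate avoidable bias and variance from error rates given in percent."""
    avoidable_bias = round(train_error - bayes_estimate, 3)  # gap to the Bayes error estimate
    variance = round(dev_error - train_error, 3)             # train -> dev generalization gap
    focus = "bias" if avoidable_bias > variance else "variance"
    return avoidable_bias, variance, focus

# First example: training error 5%, dev error 6%.
# With the team of experienced doctors (0.5%) as the Bayes error estimate:
print(diagnose(0.5, 5.0, 6.0))  # -> (4.5, 1.0, 'bias'): focus on reducing bias
```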


Now let's look at a second example: your training error is 1% and your dev error is 5%. Here it doesn't really matter, it's more of an academic question, whether human-level performance is 1%, 0.7%, or 0.5%. Whichever definition you use, the measured avoidable bias, the gap between human level and training error, is somewhere between 0% and 0.5%, right? Meanwhile the gap between training error and dev error is 4%, which is larger than the avoidable bias under any definition. So the analysis suggests you should focus mainly on variance-reducing tools, such as regularization or getting a bigger training set.
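Running the same hypothetical helper on this second example shows that the conclusion is insensitive to which human-level definition you pick:

```python
# Second example: training error 1%, dev error 5% (continuing the sketch above).
for bayes in (1.0, 0.7, 0.5):
    print(bayes, diagnose(bayes, 1.0, 5.0))
# Avoidable bias is only 0%-0.5% under every definition, while variance is 4%,
# so the recommendation is to focus on variance either way.
```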

So when does the choice of human-level error definition really matter?

Suppose your training error is 0.7%, so you're already doing quite well, and your dev error is 0.8%. In this case it matters that you use 0.5% as your Bayes error estimate, because then the measured avoidable bias is 0.2%, twice the 0.1% you measure for the variance problem. This suggests that both bias and variance are issues, but that the avoidable bias problem is the more serious of the two. Recall from the last slide that 0.5% is our best estimate of Bayes error, since a team of human doctors achieves it. If you instead used 0.7% as your Bayes proxy, the measured avoidable bias would be essentially 0%, and you might conclude there is nothing to gain on the training set, when in fact you should try to do better on it.
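The same sketch makes the sensitivity explicit: a slightly wrong Bayes estimate flips the recommendation.

```python
# Third example: training error 0.7%, dev error 0.8%.
print(diagnose(0.5, 0.7, 0.8))  # -> (0.2, 0.1, 'bias'): avoidable bias dominates
print(diagnose(0.7, 0.7, 0.8))  # -> (0.0, 0.1, 'variance'): bias looks like zero, which misleads you
```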

I hope this gives you a sense of why progress on a machine learning problem becomes harder and harder as you approach human-level performance.


In this last example, once your training error is close to 0.7%, unless you estimate Bayes error very carefully, you may not know how far you are from it, and therefore how much avoidable bias you should still try to eliminate. If all you know is that a single typical doctor achieves 1% error, it may be hard to tell whether you should keep trying to fit the training set better. This problem only arises once your algorithm is already doing quite well: only once you're at 0.7% or 0.8% error, close to human level.

In the two examples on the left, where you're far from human level, it's much easier to point the optimization target at bias or at variance. This explains why, as you approach human level, it becomes harder to tell whether the problem is bias or variance, and why progress on a machine learning project is harder to come by once you're already doing very well.


To summarize: if you want to understand bias and variance on a task that humans do well, you can estimate human-level error and use it as an estimate of Bayes error. The gap between your training error and the Bayes error estimate tells you how large the avoidable bias problem is, while the gap between your training error and dev error tells you how large the variance problem is, that is, how well your algorithm generalizes from the training set to the dev set.
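In symbols, using human-level error as the stand-in for Bayes error, the two diagnostics from this video are:

$$\text{avoidable bias} \approx \text{training error} - \text{human-level error (Bayes estimate)}$$

$$\text{variance} = \text{dev error} - \text{training error}$$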

The major difference between what I'm describing today and the earlier course is that there you compared training error against 0% and used that gap to estimate bias directly. In this video, the analysis is more nuanced: we don't assume you should reach 0% error, because Bayes error is sometimes nonzero, and it is sometimes essentially impossible to push the error below a certain threshold. In the earlier course we measured training error, looked at how far it was above 0%, and used that gap as the bias estimate. That works fine for problems where Bayes error is nearly 0%: for cat recognition, for instance, human performance is close to perfect, so Bayes error is close to zero too, and you can get away with it. But when the data is noisy, say, speech recognition on audio with very loud background noise, where it's sometimes impossible to hear what was said and transcribe it correctly, you need a better estimate of Bayes error. That in turn helps you better estimate avoidable bias and variance, so you can make better decisions about whether to pursue a bias-reduction strategy or a variance-reduction strategy.

To recap: a rough estimate of human-level performance gives you an estimate of Bayes error, which lets you decide more quickly whether you should focus on reducing your algorithm's bias or its variance. This decision procedure usually works well until your system's performance starts to surpass humans, at which point your estimate of Bayes error is no longer accurate, but even then these techniques can still help you make clear decisions.

Now, one of the exciting developments in deep learning is that for more and more tasks, our systems can actually surpass humans. In the next video, let's keep talking about what happens when you go beyond human-level performance.


Origin: blog.csdn.net/weixin_36815313/article/details/105495299