1.4 Satisficing and Optimizing Metrics - Deep Learning Course 3 "Structuring Machine Learning Projects" - Professor Andrew Ng, Stanford

Satisficing and Optimizing Metrics

It is not always easy to combine everything you care about into a single real-number evaluation metric. In those cases, I've found it useful to set up satisficing and optimizing metrics. Let me tell you what that means.


Suppose you have decided that you care about the classification accuracy of your cat classifier, measured by F1 score or some other accuracy metric. But besides accuracy, you also care about running time: how long it takes to classify one image. Classifier A takes 80 milliseconds, B takes 95 milliseconds, and C takes 1,500 milliseconds, meaning it takes 1.5 seconds to classify an image.


One thing you could do is combine accuracy and running time into an overall evaluation metric. For example, the overall cost could be cost = accuracy − 0.5 × runningTime. But this combination may feel too contrived: using a formula like this, a linear weighted sum of the two values, just to merge accuracy and running time.
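As a sketch of this weighted-sum approach, the running times (80, 95, and 1,500 ms) come from the example above, while the accuracy values are assumed here purely for illustration:

```python
# Combine accuracy and running time into one overall cost:
#   cost = accuracy - 0.5 * runningTime  (time in seconds)
# Running times are from the lecture; accuracies are assumed examples.
classifiers = {
    "A": {"accuracy": 0.90, "running_time_ms": 80},
    "B": {"accuracy": 0.92, "running_time_ms": 95},
    "C": {"accuracy": 0.95, "running_time_ms": 1500},
}

def cost(metrics, weight=0.5):
    # Convert milliseconds to seconds so both terms are on a similar scale.
    return metrics["accuracy"] - weight * (metrics["running_time_ms"] / 1000)

best = max(classifiers, key=lambda name: cost(classifiers[name]))
print(best)  # B: 0.92 - 0.0475 beats A's 0.86 and C's 0.20
```

Note that the 0.5 weight is arbitrary, which is exactly why the lecture calls this combination too contrived.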

Alternatively, you might choose the classifier that maximizes accuracy subject to a runtime requirement: the time needed to classify an image must be at most 100 milliseconds. In this case, we say accuracy is an optimizing metric, because you want to maximize it, to be as accurate as possible; running time is what we call a satisficing metric, meaning it just has to be good enough, in this case under 100 milliseconds. Once it meets that threshold, you don't care how much better it gets, or at least you don't care very much. This is a fairly reasonable way to trade off, or combine, accuracy and running time. In practice, as long as the running time is under 100 milliseconds, your users won't care whether it is 100 milliseconds or 50 milliseconds, or even faster.


By defining optimizing and satisficing metrics, you get a clear rule for choosing the "best" classifier. In this case it is classifier B, because it has the best accuracy among all classifiers whose running time is under 100 milliseconds.


More generally, if you have N metrics to consider, it is sometimes reasonable to pick one of them as the optimizing metric, which you try to do as well on as possible, and treat the remaining N−1 as satisficing metrics: they just have to clear some threshold, for example a running time faster than 100 milliseconds. Once a metric reaches its threshold, you don't care how far past it it goes, but it must reach that threshold.
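The general 1-optimizing, (N−1)-satisficing pattern can be sketched as follows. The metric names, thresholds, and values here are illustrative assumptions, not from the lecture:

```python
# General pattern: one optimizing metric plus N-1 satisficing metrics.
# A candidate is feasible only if every satisficing metric is within its
# threshold; among feasible candidates, maximize the optimizing metric.
def pick_best(candidates, optimizing, satisficing_thresholds):
    feasible = [
        c for c in candidates
        if all(c[m] <= t for m, t in satisficing_thresholds.items())
    ]
    if not feasible:
        return None  # no candidate clears every threshold
    return max(feasible, key=lambda c: c[optimizing])

# Hypothetical models with an extra satisficing metric (memory use).
models = [
    {"name": "A", "accuracy": 0.90, "runtime_ms": 80, "memory_mb": 40},
    {"name": "B", "accuracy": 0.92, "runtime_ms": 95, "memory_mb": 60},
    {"name": "C", "accuracy": 0.95, "runtime_ms": 1500, "memory_mb": 300},
]
best = pick_best(models, "accuracy",
                 {"runtime_ms": 100, "memory_mb": 128})
print(best["name"])  # B
```

Adding more satisficing metrics only shrinks the feasible set; the selection rule itself stays the same.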


Here is another example. Suppose you are building a system to detect wake words, also called trigger words, for voice-controlled devices. For the Amazon Echo you say "Alexa", you use "Okay Google" to wake up Google devices, for Apple devices you say "Hey Siri", and for some Baidu devices we use "Hello Baidu" to wake them up.

These are the wake words that rouse a voice-controlled device so it listens for what you say next. So you might care about the accuracy of your trigger-word detection system: when someone says one of the trigger words, how likely is it that your device actually wakes up?

You might also care about the number of false positives: when no one has said the trigger word, how likely is the device to wake up at random? In this case, a reasonable way to combine these two metrics might be to maximize accuracy, so that when someone says the wake word the probability of the device waking up is as high as possible, subject to having at most one false positive every 24 hours. That way your device wakes up at most once a day when no one actually said the trigger word. Here, accuracy is the optimizing metric and false positives per 24 hours is the satisficing metric: you just need at most one false positive every 24 hours.


To sum up: if you have multiple metrics to consider, pick one as the optimizing metric, which you try to do as well on as possible, and make the rest satisficing metrics that just have to reach their thresholds. This gives you a fully automatic way to look at multiple candidates and pick the "best" one. These evaluation metrics must be computed on a training set, development set, or test set. So one more thing you need to do is set up a training set, a development set, and a test set. In the next video, I want to share some guidelines on how to set up training, dev, and test sets. See you in the next video.

Course PPT


Origin blog.csdn.net/weixin_36815313/article/details/105490600