1.2 Orthogonalization - Lesson 3 of the deep learning course "Structuring Machine Learning Projects" - Professor Andrew Ng (Stanford)

Orthogonalization

One of the challenges in building a machine learning system is that there are so many things you can try and change, including, for example, a great many hyperparameters you could tune. I have noticed that the most effective machine learning practitioners share one trait: they think very clearly, and they know exactly what to tune in order to achieve a particular effect. This process is called orthogonalization. Let me tell you what that means.

[Image: an old TV set with many adjustment knobs]

This is a picture of an old TV set with many knobs for adjusting different properties of the picture. On these old TVs there might be one knob to adjust the vertical height of the image, another to adjust its width, perhaps another to adjust the trapezoidal (keystone) angle, another to shift the image left or right, another to adjust its rotation, and so on. TV designers spent a great deal of time designing the circuits, usually analog circuits in those days, to make sure each knob had one relatively clear function: one knob adjusts the height, one adjusts the width, one adjusts the trapezoid angle, and so on.

In contrast, imagine a single knob that adjusts 0.1 × image height + 0.3 × image width − 1.7 × trapezoid angle + 0.8 × horizontal position, all at once. If you turn that one knob, the height, width, trapezoid angle, and horizontal position of the image all change simultaneously. With a knob like that, it is almost impossible to tune the TV so that the image sits centered in the display area.

So in this context, orthogonalization means that the TV designer builds the knobs so that each one adjusts exactly one property. That makes it much easier to tune the TV and get the image centered.
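The difference between orthogonal and coupled knobs can be sketched numerically. In the toy code below (pure Python; the 0.1 / 0.3 / −1.7 / 0.8 coefficients are the ones from the lecture, everything else is illustrative), turning a single orthogonal knob changes one property, while turning the mixing knob changes all four at once:

```python
# Toy illustration: orthogonal knobs vs. a single mixing knob.
# Properties, in order: [height, width, trapezoid angle, horizontal position]

def apply_knobs(knob_deltas, design):
    """Each property change is a weighted sum of knob movements."""
    return [sum(w * d for w, d in zip(row, knob_deltas)) for row in design]

# Orthogonal design: each knob moves exactly one property.
orthogonal = [[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]]

# Mixing design: the first knob moves every property at once,
# with the coefficients from the lecture: 0.1, 0.3, -1.7, 0.8.
mixing = [[0.1, 0, 0, 0],
          [0.3, 1, 0, 0],
          [-1.7, 0, 1, 0],
          [0.8, 0, 0, 1]]

# Turn only the first knob by 1 unit:
print(apply_knobs([1, 0, 0, 0], orthogonal))  # only the height changes
print(apply_knobs([1, 0, 0, 0], mixing))      # all four properties change
```

With the orthogonal design, centering the image is a sequence of independent one-knob fixes; with the mixing design, every correction disturbs the properties you had already set.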

Here is another example of orthogonalization. Think about driving: a car has three main controls. The first is the steering wheel, which determines how much you turn left or right; the other two are the throttle and the brakes, which control your speed. Because one control handles direction and the other two handle speed, it is easy to understand how each action with each control will affect the car's motion.

[Image: car controls (steering wheel, throttle, brakes)]

Imagine instead that someone built a car controlled by a gamepad, where one axis of the controller sets 0.3 × steering angle − speed, and the other axis sets 2 × steering angle + 0.9 × speed. In theory, by fiddling with these two controls you could reach any angle and speed you want, but it is much harder than if one control sets the steering angle by itself and another sets the speed by itself.

So the concept of orthogonalization is that you want one dimension that controls only the steering angle, and another dimension that controls only your speed. In this driving example, the knob that controls only the steering angle is the steering wheel, and the controls for speed are the throttle and brakes. If instead you had one control that mixed the two, affecting both your steering angle and your speed at the same time, it would be very hard to keep the car moving at the speed and angle you want. Orthogonal means at 90 degrees to each other. If you design orthogonal controls, ideally aligned with the actual properties you want to control, tuning becomes much easier: you can adjust the steering angle independently, and the throttle and brakes independently, and make the car move the way you want.

So what does this have to do with machine learning? To build a supervised learning system, you usually need to tune the knobs of your system to ensure four things.

First, you usually have to make sure the system performs well at least on the training set: performance on the training set must pass some assessment and reach an acceptable level. For some applications this may mean reaching human-level performance, but that depends on your application; we will say more about comparing with human-level performance next week. Second, after doing well on the training set, you want the system to perform well on the development set. Third, you then want it to perform well on the test set. Finally, you want the system, as measured by the cost function on the test set, to perform satisfactorily in actual use; for example, you want the users of your cat picture application to be satisfied.
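These four criteria form an ordered chain: each one is checked only after the previous one holds, and each has its own set of knobs. A hypothetical sketch of that diagnostic chain (the stage and knob names below are illustrative labels, not a fixed API):

```python
# Ordered chain of assumptions in supervised learning, with typical knobs
# for each stage (names are illustrative).
CHAIN = [
    ("fits training set well", ["bigger network", "better optimizer (e.g. Adam)"]),
    ("fits dev set well", ["regularization", "bigger training set"]),
    ("fits test set well", ["bigger dev set"]),
    ("performs well in the real world", ["change dev set distribution",
                                         "change cost function"]),
]

def diagnose(results):
    """Return the first failing stage and its knobs.
    `results` is a list of booleans, one per stage, in order."""
    for ok, (stage, knobs) in zip(results, CHAIN):
        if not ok:
            return stage, knobs
    return None, []

# Example: trains well, but underperforms on the dev set.
stage, knobs = diagnose([True, False, True, True])
print(stage)   # the first failing criterion
print(knobs)   # the knobs associated with exactly that criterion
```

The point of the sketch is the orthogonalization discipline itself: each failure mode maps to its own knobs, and you reach for a stage's knobs only when that stage is the one that fails.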

[Image: the chain of assumptions in machine learning, with knobs for each stage]

Let's return to the TV tuning example. If your TV image is too wide or too narrow, you want a single knob to fix that. You don't want to carefully fiddle with five different knobs that each also affect other properties of the image; you want one knob that changes the width of the TV image.

Similarly, if your algorithm does not fit the training set well on the cost function, you want a knob, or a specific set of knobs (yes, I drew this to represent the knob), that you can use to make your algorithm fit the training set well. The knobs you might use here include training a bigger network, or switching to a better optimization algorithm, such as the Adam optimization algorithm, and so on. We will discuss some other options this week and next week.
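As a concrete example of the optimizer knob, here is a minimal, self-contained sketch of the standard Adam update rule applied to a toy one-dimensional problem (the quadratic objective and the hyperparameter values are illustrative):

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (standard Adam formulas)."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy objective f(theta) = theta^2, so the gradient is 2 * theta.
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 5001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(theta)  # close to the minimum at 0
```

The adaptive per-parameter step size is what often makes Adam a better "fit the training set" knob than plain gradient descent when learning rates are hard to hand-tune.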

In contrast, if you find that the algorithm fits the development set poorly, you want a separate, independent set of knobs for that (yes, this is another knob that I drew a bit scruffily). Say your algorithm does well on the training set but not on the development set; then you have a set of knobs around regularization that you can tune to try to satisfy this second criterion. The TV analogy is that, having adjusted the width, if the height of the image is wrong you need a different knob to adjust the height, and you hope that knob affects the width as little as possible. Getting a bigger training set is another available knob; it helps your learning algorithm generalize better to the development set. Now the height and width of the TV image are both adjusted.
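As a concrete example of the regularization knob, L2 regularization adds a penalty proportional to the squared weights to the cost, which discourages large weights and typically narrows the gap between training and development set performance. A minimal sketch (the weight and data-loss values are made up for illustration):

```python
def l2_cost(data_loss, weights, lam):
    """Total cost = data loss + (lam / 2) * sum of squared weights."""
    return data_loss + 0.5 * lam * sum(w * w for w in weights)

weights = [0.5, -1.2, 2.0]
print(l2_cost(0.30, weights, lam=0.0))  # no regularization: just the data loss
print(l2_cost(0.30, weights, lam=0.1))  # penalized cost is strictly higher
```

Tuning `lam` trades training set fit against generalization, but (unlike early stopping, discussed below) it is a single dedicated knob for the train-to-dev gap.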

What if the system fails the third criterion: it does well on the development set but not on the test set? Then the knob you need is probably a bigger development set, because doing well on the development set but not on the test set likely means you have overfit the development set, and you need to step back and use a bigger development set.

Finally, if the system does well on the test set but does not give the users of your cat picture application a good experience, that means you need to go back and change either the development set or the cost function. Because if doing well on the test set under a given cost function does not correspond to doing well in the real world, then either your development set distribution is set incorrectly, or the metric your cost function measures is wrong.

We will go through these cases one by one soon, and introduce the specific knobs in detail later this week and next week. So don't worry if you can't follow all the details yet; I just want you to have a sense of this orthogonalization process. You should be very clear about which of these four problems you have, and what different things you can adjust to try to solve exactly that problem.

[Image: the four criteria with their corresponding knobs, including early stopping]

When I train a neural network, I generally don't use early stopping. It's not a bad technique, and many people use it. But personally I find it hard to reason about with early stopping, because this knob affects two things at once: if you stop early, the fit to the training set is not as good, and at the same time early stopping is used to improve development set performance. So this knob is less orthogonal; it affects two things simultaneously, just like a knob that changes both the width and the height of the TV image. That doesn't mean you shouldn't use it; it's fine if you want to. But if you have more orthogonal controls, such as the other methods I've written here, then tuning the network becomes much simpler.
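Early stopping is usually implemented as a patience loop over development set error. The sketch below (with made-up toy error curves) shows why it is not an orthogonal knob: the same stopping decision that curbs overfitting on the dev set also cuts training short, leaving the training set fit worse than it could be:

```python
def early_stopping(dev_errors, patience=2):
    """Return the epoch to stop at: stop after `patience` consecutive
    epochs with no improvement in dev error; return the best epoch."""
    best_err, best_epoch, waited = float("inf"), 0, 0
    for epoch, err in enumerate(dev_errors):
        if err < best_err:
            best_err, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Toy curves: train error keeps falling, dev error turns back up (overfitting).
train = [0.50, 0.30, 0.20, 0.12, 0.08, 0.05, 0.03]
dev   = [0.55, 0.38, 0.30, 0.28, 0.31, 0.34, 0.37]

stop = early_stopping(dev)
print(stop)         # stops at the dev-error minimum
print(train[stop])  # training error at the stopping point, higher than
                    # the 0.03 it would reach with full training:
                    # one knob moved both the train fit and dev performance
```

This is the coupling described above: one decision simultaneously turns the "fit the training set" knob down and the "fit the dev set" knob up.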

So I hope this gives you some idea of what orthogonalization means. It's just like tuning the TV image: if the image is too wide, I adjust this knob (the width knob); if it's too tall, I adjust that knob (the height knob); if it's too trapezoidal, I adjust this knob (the trapezoid angle knob). That's the ideal.

In machine learning, it helps if you can look at your system and say exactly which part is wrong: it isn't doing well on the training set, or on the development set, or on the test set, or it does well on the test set but not in the real world. We figure out exactly what is going wrong, and then we have the corresponding knob, or set of knobs, that addresses precisely the problem limiting the performance of the machine learning system.

That is what we will talk about this week and next week: how to diagnose the bottleneck in your system's performance, and the specific set of knobs you can use to tune your system and improve particular aspects of it. Let's start going through this process in detail.

Course PPT

[Slide images]


Origin blog.csdn.net/weixin_36815313/article/details/105488918