Linear regression with one variable - Cost function intuition II

Summary: This article is the verbatim transcript of Lesson 9, "Cost function intuition II", from Chapter 2, "Linear regression with one variable", of Andrew Ng's Machine Learning course. I transcribed it word by word while watching the videos so that I could refer back to it later, and I am sharing it here for others. If you find any mistakes, corrections are very welcome and sincerely appreciated. I hope it is also helpful for your studies.

In this video (article), let's delve deeper and get an even better intuition about what the cost function is doing. This video (article) assumes that you're familiar with contour plots. If you are not familiar with contour plots or contour figures, some of the illustrations in this video (article) may or may not make sense to you, but that's okay. And if you end up skipping this video (article), or if some of it doesn't quite make sense because you haven't seen contour plots before, that's okay too, and you will still understand the rest of this course without those parts.

Here's our problem formulation as usual, with the hypothesis, parameters, cost function, and our optimization objective. Unlike the last video (article), though, I'm going to keep both of my parameters, \theta_{0} and \theta_{1}, as we generate our visualizations for the cost function.

So, same as last time, we want to understand the hypothesis h and cost function J. So, here's my training set of housing prices (red crosses), and let's make some hypothesis like that one; this is not a particularly good hypothesis. But if I set \theta_{0}=50 and \theta_{1}=0.06, then I end up with this hypothesis down here (h_{\theta}(x)=50+0.06x), and that corresponds to that straight line (the black line). Now, given these values of \theta_{0} and \theta_{1}, we want to plot the corresponding cost function on the right. What we did last time, when we only had \theta_{1}, was draw plots that look like this as a function of \theta_{1}. But now we have two parameters, \theta_{0} and \theta_{1}, and so the plot gets a little more complicated. It turns out that when we had only one parameter, the plot we drew was this sort of bowl-shaped function.
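To make the computation concrete, here is a minimal Python/NumPy sketch of the squared-error cost J(\theta_{0},\theta_{1}) = \frac{1}{2m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})^{2}, evaluated at the hypothesis above. The training set values are hypothetical placeholders, not the actual data from the lecture.

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) = (1 / (2m)) * sum((h(x_i) - y_i)^2)."""
    m = len(y)
    predictions = theta0 + theta1 * x        # h_theta(x) for every training example
    return np.sum((predictions - y) ** 2) / (2 * m)

# Hypothetical training set (house size in square feet, price in $1000s);
# these numbers are placeholders, not the lecture's actual data.
x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])

# Cost of the hypothesis discussed above, h(x) = 50 + 0.06 * x
print(compute_cost(50.0, 0.06, x, y))
```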

Now, when we have two parameters, it turns out the cost function also has a similar sort of bowl shape. And in fact, depending on your training set, you might get a cost function that maybe looks something like this. So, this is a 3-D surface plot, where the axes are labeled \theta_{0} and \theta_{1}. As you vary \theta_{0} and \theta_{1}, the two parameters, you get different values of the cost function J(\theta_{0},\theta_{1}), and the height of this surface above a particular point (\theta_{0}, \theta_{1}), that is, the vertical axis, indicates the value of J(\theta_{0},\theta_{1}). And you can see it sort of has this bowl-like shape.
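A surface like this can be sketched with matplotlib. The snippet below is only an illustration; it reuses compute_cost and the hypothetical x, y from the sketch above, and the grid ranges for \theta_{0} and \theta_{1} are chosen just to frame a bowl, not to match the lecture's figure.

```python
import numpy as np
import matplotlib.pyplot as plt

# Grid of (theta0, theta1) values; the ranges are illustrative only
theta0_vals = np.linspace(-200.0, 1200.0, 100)
theta1_vals = np.linspace(-0.5, 0.5, 100)

# Evaluate J at every grid point (rows vary theta1, columns vary theta0)
J_vals = np.array([[compute_cost(t0, t1, x, y) for t0 in theta0_vals]
                   for t1 in theta1_vals])

T0, T1 = np.meshgrid(theta0_vals, theta1_vals)
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(T0, T1, J_vals, cmap="viridis")
ax.set_xlabel("theta_0")
ax.set_ylabel("theta_1")
ax.set_zlabel("J(theta_0, theta_1)")
plt.show()
```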


Let me show you the same figure in 3D, with axes \theta_{0}, \theta_{1}, and a vertical axis J(\theta_{0},\theta_{1}). And if I rotate this plot around (three snapshots are selected for your reference), you kind of get a sense, I hope, of this bowl-shaped surface, as that's what the cost function J looks like.

Now, for the purpose of illustration in the rest of this video (article), I'm not actually going to use these sorts of 3D surfaces to show you the cost function J. Instead, I'm going to use contour plots, or what are also called contour figures (I guess they mean the same thing), to show you these surfaces.

So, here's an example of a contour figure, shown on the right, where the axes are \theta_{0} and \theta_{1}. And what each of these ovals, each of these ellipses, shows is a set of points that take on the same value for J(\theta_{0},\theta_{1}). So concretely, for example, let's take that point and that point and that point. All three of these points that I drew in magenta have the same value for J(\theta_{0},\theta_{1}); these are the \theta_{0}, \theta_{1} axes, but those three points share the same value for J(\theta_{0},\theta_{1}). And if you haven't seen contour plots much before, imagine, if you will, a bowl-shaped function that's coming out of my screen, so that the minimum, the bottom of the bowl, is this point right there, the middle of these concentric ellipses. And imagine a bowl shape that sort of grows out of my screen like this, so that each of these ellipses has the same height above my screen, and the minimum of the bowl is right down there. And so, the contour figure is maybe a more convenient way to visualize my function J. So, let's look at some examples. Over here, I have a particular point (the red cross), right? And this is with \theta_{0}\approx 800 and \theta_{1}\approx -0.15. And so, this point in red corresponds to one pair of values (\theta_{0}, \theta_{1}), and it corresponds, in fact, to that hypothesis, right? With \theta_{0}\approx 800, where the line intersects the vertical axis is around 800, and its slope is about -0.15. Now, this line is really not such a good fit to the data, right? This hypothesis h(x), with these values of (\theta_{0}, \theta_{1}), is really not such a good fit to the data. And so, you find that its cost is a value that's out here (the red cross), pretty far from the minimum, right? This is a pretty high cost, because this is just not that good a fit to the data. Let's look at some more examples.
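The same grid of J values can also be drawn as a contour figure. Again, this is just a sketch reusing theta0_vals, theta1_vals, and J_vals from the surface-plot snippet above, with logarithmically spaced contour levels so the ellipses stay visible near the minimum, and a red cross marking the (\theta_{0}\approx 800, \theta_{1}\approx -0.15) point discussed in the text.

```python
# Contour figure of the same cost surface (reusing theta0_vals, theta1_vals, J_vals)
plt.contour(theta0_vals, theta1_vals, J_vals, levels=np.logspace(0, 6, 20))
plt.plot(800, -0.15, "rx")   # the poorly fitting hypothesis discussed in the text
plt.xlabel("theta_0")
plt.ylabel("theta_1")
plt.show()
```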

Now, here's a different hypothesis that's, you know, still not a great fit to the data, but maybe slightly better. So here, that's my point, and those are my parameters (\theta_{0}, \theta_{1}): my \theta_{0}\approx 360, and my value for \theta_{1}=0. So, you know, let's write that out. Let's take \theta_{0}\approx 360, \theta_{1}=0. And this pair of parameters corresponds to that hypothesis, corresponds to a flat line, that is, h_{\theta}(x)=360+0\cdot x. So, that's my hypothesis. And this hypothesis again has some cost, and that cost is, you know, plotted as the height of the J function at that point (the red cross).

Let's look at just a couple more examples. Here's one more: you know, at this value of \theta_{0} and that value of \theta_{1}, we end up with this hypothesis h(x). And again, it's not a great fit to the data, and it's actually further away from the minimum.

Last example: this is actually not quite at the minimum, but it's pretty close to the minimum. So, this is not such a bad fit to the data, where for a particular value of \theta_{0}, whatever that value is, and a particular value of \theta_{1}, we get a particular h(x). And this is not quite at the minimum, but it's pretty close. And so, the sum of squared errors, that is, the sum of the squared distances between my training examples and my hypothesis, really the sum of the squares of all of these vertical distances, is pretty close to the minimum, even though it's not quite the minimum. So, with these figures, I hope that gives you a better understanding of how values of the cost function J correspond to different hypotheses, as well as how better hypotheses correspond to points that are closer to the minimum of the cost function J.

Now, of course, what we really want is an efficient algorithm, an efficient piece of software, for automatically finding the values of \theta_{0} and \theta_{1} that minimize the cost function J, right? And what we don't want to do is, you know, write software to plot out this figure and then try to manually read off the numbers; that is not a good way to do it. And in fact, we'll see later, when we look at more complicated examples with more parameters, that we'll have higher-dimensional figures, and it turns out, as we'll see later in this course, that there are examples where this figure cannot really be plotted, because it becomes much harder to visualize. So, what we want is software that finds the values of \theta_{0} and \theta_{1} that minimize this function. And in the next video (article), we'll start to talk about an algorithm for automatically finding the values of \theta_{0} and \theta_{1} that minimize the cost function J.
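The algorithm the course introduces for this is the subject of the next video (article). Purely as an illustration that the minimizing pair can be found by software rather than read off a plot, here is a hedged sketch using scipy.optimize.minimize (a general-purpose optimizer, not the course's algorithm), again reusing compute_cost, x, and y from the first snippet.

```python
import numpy as np
from scipy.optimize import minimize

# General-purpose numerical minimization of J over (theta0, theta1).
# This stands in for the algorithm introduced in the next video; it only
# illustrates that the minimizer can be found automatically by software.
result = minimize(lambda theta: compute_cost(theta[0], theta[1], x, y),
                  x0=np.zeros(2))
print(result.x)   # the (theta0, theta1) pair found to minimize J on this data
```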

<end>
