Linear Regression with multiple variables - Multiple features

Abstract: This article is the transcript of lesson 28, "Multiple Features", in Chapter 5, "Linear Regression with Multiple Variables", of Andrew Ng's Machine Learning course. I took it down while watching the videos and edited it to make it more concise and easier to read, for later reference, and I'm sharing it here. If you spot any mistakes, corrections are very welcome and sincerely appreciated. I hope it is also helpful for your own study.

In this video (article), we'll start to talk about a new version of linear regression that's more powerful: one that works with multiple variables, or with multiple features. Here's what I mean.

In the original version of linear regression that we developed, we had a single feature x, the size of the house, and we wanted to use that to predict y, the price of the house, and our hypothesis took the form h_{\theta }(x)=\theta _{0}+\theta _{1}x.

What if we had not only the size of the house as a feature, or as a variable with which to try to predict the price, but also knew the number of bedrooms, the number of floors, and the age of the home in years? It seems that this would give us a lot more information with which to predict the price.

  • I'm going to use the variables x_{1}, x_{2} and so on to denote my four features, and I'm going to continue to use y to denote the output variable price that we're trying to predict.
  • I'm going to use lower case "n" to denote the number of features. So in this example, we have n=4.
  • We were using m to denote the number of training examples. So if we have 47 rows in this table, then m is 47: the number of rows in the table, or the number of training examples.
  • I'm also going to use x^{(i)} to denote the input features of the i^{th} training example.
    • As a concrete example, let's say x^{(2)} is going to be a vector of the features for my second training example. And so x^{(2)} here is going to be a vector \begin{bmatrix} 1416\\ 3\\ 2\\ 40 \end{bmatrix} since those are my four features that I have to try to predict the price of the second house.
    • Note that in this notation, the superscript (2) is not x to the power of 2. Instead, it's an index into my training set which says look at the second row of this table, this refers to my second training example.
  • I'm also going to use x^{(i)}_{j} to denote the value of feature number j in the i^{th} training example. So concretely, x^{(2)}_{3} will refer to feature number 3 in the 2nd training example, which is equal to 2. (A short code sketch after this list illustrates this indexing.)
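
To make the indexing notation concrete, here is a minimal NumPy sketch (not part of the lecture) that stores such a table of training examples as a matrix and reads off m, n, x^{(2)}, and x^{(2)}_{3}. Only the second row's values, [1416, 3, 2, 40], are quoted in this transcript; the other rows and the prices are placeholders for illustration.

```python
import numpy as np

# Columns: size (sq ft), number of bedrooms, number of floors, age (years).
# Only the second row's values appear in the transcript; the rest are placeholders.
X = np.array([
    [2104, 5, 1, 45],   # x^(1)  (placeholder values)
    [1416, 3, 2, 40],   # x^(2)  (values quoted above)
    [1534, 3, 2, 30],   # x^(3)  (placeholder values)
    [ 852, 2, 1, 36],   # x^(4)  (placeholder values)
])
y = np.array([460, 232, 315, 178])   # prices in $1000s (placeholders)

m, n = X.shape       # m = number of training examples, n = number of features
x_2 = X[1]           # x^(2): feature vector of the 2nd example (0-based row index 1)
x_2_3 = X[1, 2]      # x^(2)_3: feature 3 of the 2nd example -> 2 (number of floors)

print(m, n)          # 4 4
print(x_2)           # [1416    3    2   40]
print(x_2_3)         # 2
```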

Now that we have multiple features, let's talk about what the form of the hypothesis should be.

  • Previously this (h_{\theta }(x)=\theta _{0}+\theta _{1}x) was the form of our hypothesis, where x was our single feature.
  • Now that we have multiple features, a form of the hypothesis in linear regression is going to be  h_{\theta }(x)=\theta _{0}+\theta _{1}x_{1}+\theta _{2}x_{2}+\theta _{3}x_{3}+\theta _{4}x_{4}.
    • Concretely, for a particular setting of our parameters, we may have h_{\theta }(x)=80+0.1x_{1}+0.01x_{2}+3x_{3}-2x_{4}. This would be one example of a hypothesis. Remember, the hypothesis is trying to predict the price of the house in thousands of dollars, so this says that the base price of a house is maybe 80,000, plus another 0.1x_{1}, that is, an extra hundred dollars per square foot. Then the price goes up a little bit for each additional bedroom the house has, since x_{2} is the number of bedrooms; it goes up further for each additional floor, since x_{3} is the number of floors; and the price goes down a little bit with each additional year of the age of the house. (A short code sketch after this list evaluates this particular hypothesis.)
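
As a quick check of the arithmetic, here is a minimal sketch (not from the lecture) that evaluates this particular hypothesis for the x^{(2)} house from the table above.

```python
# h_theta(x) = 80 + 0.1*x1 + 0.01*x2 + 3*x3 - 2*x4, with the price in $1000s.
def h(x1, x2, x3, x4):
    """Predicted price in thousands of dollars for one house."""
    return 80 + 0.1 * x1 + 0.01 * x2 + 3 * x3 - 2 * x4

# Size 1416 sq ft, 3 bedrooms, 2 floors, 40 years old (the x^(2) example above).
print(h(1416, 3, 2, 40))   # ~147.63, i.e. a predicted price of about $147,630
```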

If we have n features, then the hypothesis takes the form h_{\theta }(x)=\theta _{0}+\theta _{1}x_{1}+\theta _{2}x_{2}+...+\theta _{n}x_{n}. I'm going to introduce a little bit of notation to simplify this equation.

For convenience of notation,

  • Let me define x_{0}=1. Concretely, this means that for every example i, x^{(i)}_{0}=1. You can think of this as defining an additional zeroth feature. So whereas previously I had n features x_{1}, x_{2}, ..., x_{n}, I'm now defining an additional zeroth feature that always takes on the value of one. So now my feature vector x becomes an (n+1)-dimensional vector that is zero-indexed.
  • I'm also going to think of my parameters as a vector \begin{bmatrix} \theta _{0}\\ \theta _{1}\\ \theta _{2}\\ ...\\ \theta _{n} \end{bmatrix}. This is another zero-indexed, (n+1)-dimensional vector.
  • My hypothesis can now be written \theta _{0}x_{0}+\theta _{1}x_{1}+...+\theta _{n}x_{n}. And this equation is the same as this one on top because x_{0}=1. And the neat thing is I can now take this form of the hypothesis and write this as \theta ^{T}x.
    • If you write out what \theta ^{T}x is, \theta ^{T} is \begin{bmatrix} \theta _{0} & \theta _{1} & ... & \theta _{n} \end{bmatrix}, which is actually a 1\times (n+1) matrix, also called a row vector. And you take that and multiply it with the vector x, which is \begin{bmatrix} x_{0}\\ x_{1}\\ x_{2}\\ ...\\ x_{n} \end{bmatrix}. And so the inner product \theta ^{T}x is just equal to \theta _{0}x_{0}+\theta _{1}x_{1}+...+\theta _{n}x_{n}. This gives us a convenient way to write the form of the hypothesis as just the inner product between our parameter vector \theta and our feature vector x. And it is this little bit of notation that lets us write the hypothesis in this compact form. So that is the form of the hypothesis when we have multiple features. (A short code sketch after this list writes this inner product out in code.)
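
Here is a minimal sketch (not from the lecture) of the same prediction written as the inner product \theta ^{T}x, after prepending the extra feature x_{0}=1; the parameter and feature values are the example ones used earlier.

```python
import numpy as np

theta = np.array([80, 0.1, 0.01, 3, -2])         # [theta_0, theta_1, ..., theta_4]
x = np.concatenate(([1.0], [1416, 3, 2, 40]))    # [x_0 = 1, x_1, ..., x_4]

# theta^T x: the inner product of the parameter vector and the feature vector.
print(theta @ x)          # ~147.63, same as the unvectorized sum (up to rounding)
print(np.dot(theta, x))   # an equivalent way to write the same inner product
```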

And just to give this another name, this is also called multivariate linear regression. And the term multivariate, that's just maybe a fancy term for saying that we have multiple features, or multiple variables with which to try to predict the value y.

<end>
