Linear Regression with multiple variables - Normal equation and non-invertibility

Abstract: This article is the transcript of Lecture 34, "Normal equation and non-invertibility", in Chapter 5, "Linear Regression with multiple variables", of Andrew Ng's Machine Learning course. I wrote it down while watching the video and then edited it to make it more concise and easier to read, so that it can be consulted later. I am sharing it here in the hope that it helps others with their studies; if you find any mistakes, corrections are welcome and sincerely appreciated.

In this video/article, I want to talk about the normal equation and non-invertibility. This is a somewhat more advanced concept, but it is something I've often been asked about, so I wanted to cover it here; since it is somewhat advanced, feel free to consider this optional material. It describes a phenomenon you may run into that is useful to understand, but even if you don't understand it, you should still be able to get the normal equation and linear regression to work okay.

Here is the issue:

For those of you who are somewhat more familiar with linear algebra, a question some students have asked me is: when computing \theta =(X^{T}X)^{-1}X^{T}y, what if the matrix X^{T}X is non-invertible? If you know a bit more linear algebra, you may know that only some matrices are invertible; matrices that do not have an inverse are called singular or degenerate matrices. The problem of X^{T}X being non-invertible should happen pretty rarely, and in Octave, if you implement this to compute \theta, it turns out that it will actually do the right thing. Getting a little technical here, Octave has two functions for inverting matrices: one is called pinv(), and the other is called inv(). The difference between them is somewhat technical: one computes the pseudo-inverse, the other the inverse. You can show mathematically that as long as you use the pinv() function, this will compute the value of \theta you want, even if X^{T}X is non-invertible. What exactly distinguishes pinv() from inv() involves somewhat advanced numerical-computing concepts that I don't want to go into, but in this optional video I will try to give you a little intuition about what it means for X^{T}X to be non-invertible, for those of you who know a bit more linear algebra and might be interested.
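As a concrete illustration, here is a minimal Octave sketch of this computation; the data values and variable names (X, y, theta_pinv, theta_inv) are made up for the example, and the only point is the difference between pinv() and inv():

```octave
% X is the m x (n+1) design matrix (bias column plus features), y the m x 1 target vector.
X = [1 2104; 1 1416; 1 1534; 1 852];   % made-up design matrix with one feature
y = [460; 232; 315; 178];              % made-up target values

theta_pinv = pinv(X' * X) * X' * y;    % pseudo-inverse: gives a sensible theta even if X'*X is singular
theta_inv  = inv(X' * X)  * X' * y;    % ordinary inverse: only valid when X'*X is invertible
```

When X^{T}X is invertible, the two results agree; when it is singular, inv() warns and produces useless values, while pinv() still returns the \theta you want.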

I'm not going to prove this mathematically, but if X^{T}X is non-invertible, there are two common causes:

The first cause is redundant features in your learning problem. Concretely, suppose you are trying to predict housing prices, x_{1} is the size of a house in square feet, and x_{2} is the size of the same house in square meters. Since 1 meter is equal to 3.28 feet (rounded to two decimals), your two features will always satisfy the constraint x_{1}=(3.28)^{2}x_{2}. This is somewhat advanced linear algebra, but if your two features are related by a linear equation like this, you can actually show that X^{T}X will be non-invertible.
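As a hedged illustration of this case (the house sizes and prices below are made up, and the variable names are my own), the following Octave snippet builds a design matrix whose two feature columns are linearly dependent in exactly this way:

```octave
% Redundant features: x1 in square feet, x2 in square meters for the same houses,
% so x1 = 3.28^2 * x2 and the two columns are linearly dependent.
x1 = [2104; 1416; 1534; 852];      % house sizes in square feet (made-up values)
x2 = x1 / 3.28^2;                  % the same sizes expressed in square meters
X  = [ones(4,1) x1 x2];            % design matrix: bias column plus the two features
y  = [460; 232; 315; 178];         % made-up prices

rank(X)                            % 2 rather than 3, so the 3x3 matrix X'*X is singular
theta = pinv(X' * X) * X' * y;     % pinv() still returns a usable (minimum-norm) solution
```

Dropping either x_{1} or x_{2} restores full rank, and X^{T}X becomes invertible again.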

The second thing that can cause X^{T}X to be non-invertible is running a learning algorithm with a lot of features; concretely, when m is less than or equal to n. For example, imagine you have m = 10 training examples but n = 100 features. You are then trying to fit a parameter vector \theta that is n+1 dimensional, i.e. 101-dimensional, from just 10 training examples. This sometimes works, but it is not always a good idea, because with only 10 examples you might not have enough data to fit 100 or 101 parameters; we will see later in this course why this is too little data for so many parameters. (A small sketch of this situation appears at the end of this section.) What we commonly do when m is less than or equal to n is either delete some features or use a technique called regularization, which we will also talk about later in this course; regularization lets you fit a lot of parameters, using a lot of features, even with a relatively small training set.

To summarize, if you ever find that X^{T}X is singular, or equivalently non-invertible, I recommend the following. First, look at your features and check whether any are redundant, like x_{1} and x_{2} above being linearly dependent, i.e. linear functions of each other. If you do have redundant features, deleting one of them will solve the non-invertibility problem (you really don't need both); keep deleting redundant features until none remain. If your features are not redundant, check whether you have too many features; in that case, either delete some of them, if you can bear to use fewer features, or consider using regularization, which we will cover later.

So, that's it for the normal equation and what it means when the matrix X^{T}X is non-invertible. This is a problem you will hopefully run into pretty rarely, and if you implement the computation in Octave using the pinv() function, the pseudo-inverse from its linear algebra library, the implementation will do the right thing even when X^{T}X is non-invertible. Since that should happen pretty rarely anyway, this should not be a problem for most implementations of linear regression.
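Here is the promised sketch of the m ≤ n case; the data are random numbers purely for illustration:

```octave
% m <= n: 10 training examples but 100 features, so theta is 101-dimensional.
m = 10;  n = 100;
X = [ones(m,1) randn(m,n)];        % m x (n+1) design matrix filled with random data
y = randn(m,1);                    % made-up targets

rank(X)                            % at most m = 10, far below n+1 = 101, so X'*X is singular
theta = pinv(X' * X) * X' * y;     % pinv() still computes a theta; inv() would not give a usable answer
```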

<end>

Reposted from blog.csdn.net/edward_wang1/article/details/103930738