Linear Algebra review - Matrix-vector multiplication

Abstract: This article is the transcript of Lesson 16, "Matrix-vector multiplication", from Chapter 3, "Linear Algebra Review", of Andrew Ng's Machine Learning course. I transcribed it word by word while watching the videos and then edited it for concision and readability, for future reference. I'm sharing it here in the hope that it helps others with their studies. If there are any errors, corrections are welcome and sincerely appreciated!

In this video (article), let's just start talking about how to multiply together two matrices. We'll start with a special case of matrix vector multiplication, of multiplying a matrix together with a vector. Let's start with an example.

Here's a matrix, and here's a vector, and let's say we want to multiply this matrix with this vector. What's the result? Let me just work through this example, and then we can step back and look at just what the steps were. It turns out the result of this multiplication process is going to be, itself, a vector. To get the first element of this vector, I am going to take these two numbers (\begin{bmatrix} 1\\ 5 \end{bmatrix}) and multiply them with the first row of the matrix (\begin{bmatrix} 1 & 3 \end{bmatrix}), and add up the corresponding products. The first element is 1\times 1+3\times 5=16. Similarly, the second element is 4\times 1+0\times 5=4. And the third element is 2\times 1+1\times 5=7. It turns out that the result of multiplying a 3\times 2 matrix by a 2\times 1 matrix is going to be a 3\times 1 matrix. So, I realize that I did that pretty quickly, and you're probably not sure that you can repeat this process yourself, but let's look in more detail at what just happened, and what this process of multiplying a matrix by a vector looks like.
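The example above is easy to check on a computer. The course itself uses Octave; the following is just an illustrative sketch in Python using NumPy:

```python
import numpy as np

# The 3x2 matrix and the 2-element vector from the example above
A = np.array([[1, 3],
              [4, 0],
              [2, 1]])
x = np.array([1, 5])

# @ performs the matrix-vector product
y = A @ x
print(y.tolist())  # [16, 4, 7]
```

The 3x2 matrix times the 2x1 vector indeed yields a 3x1 result.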

Here is the detail of how to multiply a matrix by a vector. Let's say I have a matrix A, and want to multiply it by a vector x. The result is going to be some vector y. So, the matrix A is an m\times n dimensional matrix, and we are going to multiply it by an n\times 1 matrix, in other words an n-dimensional vector. It turns out that this "n" here (the number of columns of A) has to match this "n" here (the number of rows of x). In other words, the number of columns in the matrix A has to match the number of rows in the vector x. And the result of this product is going to be an m-dimensional vector y. So, how do you compute this vector y? It turns out the process is: to get y_{i}, multiply the elements of A's i^{th} row with the corresponding elements of the vector x, and add them up. So, here's what I mean. In order to get the first number of y, we're gonna take the first row of the matrix A, and multiply its entries one at a time with the elements of the vector x. Then when we want to get the second element of y, let's say this element (green), the way we do that is we take the second row of A, and repeat the whole thing. And you keep going to get the third element and so on, until you get down to the last row. So, that's the procedure.
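The procedure just described can be written out as explicit loops. This is a sketch (the built-in matrix product in Octave or NumPy hides exactly these loops), with the row index i and column index j matching the description above:

```python
def mat_vec(A, x):
    """Multiply an m x n matrix A (given as a list of rows) by an n-vector x."""
    m, n = len(A), len(x)
    y = [0.0] * m
    for i in range(m):       # one pass per row of A, producing y_i
        for j in range(n):   # multiply row i with x element by element, add up
            y[i] += A[i][j] * x[j]
    return y

print(mat_vec([[1, 3], [4, 0], [2, 1]], [1, 5]))  # [16.0, 4.0, 7.0]
```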

Let's do one more example. So, let's look at the dimensions. Here, this is a 3\times 4 dimensional matrix. This is a four-dimensional vector, or a 4\times 1 matrix, and so the result of this, is going to be a three-dimensional vector. So, for the first element, I have 1\times 1+2\times3+1\times2+5\times1=14. And for the second element, it's 0\times1+3\times3+0\times2+4\times1=13. And finally, for the last element, it is -1\times1-2\times3+0\times2+0\times1=-7. So my final answer is this vector \begin{bmatrix} 14\\ 13\\ -7 \end{bmatrix}. And, as promised, the result is a 3\times 1 matrix. So, that's how you multiply a matrix and a vector.
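This second example can also be verified with NumPy (again just an illustration; the course uses Octave). Note how the shapes confirm the dimension rule: a 3\times 4 matrix times a 4-element vector gives a 3-element vector.

```python
import numpy as np

A = np.array([[1, 2, 1, 5],
              [0, 3, 0, 4],
              [-1, -2, 0, 0]])   # 3x4 matrix
x = np.array([1, 3, 2, 1])       # 4-element vector

y = A @ x
print(y.tolist())  # [14, 13, -7]
```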

Finally, let me show you a neat trick. Let's say we have a set of four houses, with these four sizes. And let's say I have a hypothesis for predicting the price of a house. Let's say I want to compute h_{\theta }(x) for each of my four houses here. It turns out there's a neat way of posing this, applying this hypothesis to all of my houses at the same time.

It turns out there is a neat way to pose this as a matrix-vector multiplication. I am going to construct a matrix as follows. My matrix is going to be \begin{bmatrix} 1 & 2104\\ 1 & 1416\\ 1 & 1534\\ 1 & 852\\ \end{bmatrix}. I'm going to construct a vector as well, and my vector is going to be this vector of two elements, \begin{bmatrix} -40\\ 0.25 \end{bmatrix}. That is, these two coefficients \theta _{0} and \theta _{1}. And what I'm going to do is take the matrix and that vector and multiply them together; that \times is the multiplication symbol. So, what do I get? Well, this is a 4\times 2 matrix and this is a 2\times 1 matrix, so the outcome is going to be a 4\times 1 vector. So, let me write out its four elements, my four real numbers here. Now it turns out the first element of this result is going to be -40\times 1+0.25\times 2104. And this first element, of course, is h applied to 2104. So it's really the predicted price of my first house. Well, how about the second element? I'm gonna take this (\begin{bmatrix} 1 & 1416 \end{bmatrix}) and multiply it by my vector. And so that's gonna be -40\times 1+0.25\times 1416. And so this is going to be h_{\theta }(1416). And so on for the third and the fourth elements of this 4\times 1 vector.
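The house-price construction above can be written out directly. A sketch in NumPy (the variable names X and theta are my own choices, not from the lecture): the first column of ones pairs with \theta _{0}, and the column of house sizes pairs with \theta _{1}.

```python
import numpy as np

# Each row is one house: a 1 (for theta_0) followed by the house size
X = np.array([[1, 2104],
              [1, 1416],
              [1, 1534],
              [1, 852]])
theta = np.array([-40.0, 0.25])  # [theta_0, theta_1]

# One matrix-vector product gives the predicted price of every house
predictions = X @ theta
print(predictions.tolist())  # [486.0, 314.0, 343.5, 173.0]
```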

And the neat thing about this is that when you're actually implementing this in software, when you have four houses and you want to use your hypothesis to predict the prices y of all these four houses, you can write this in one line of code. When we talk about Octave, a programming language, later, you can actually write this in one line of code. You write prediction equals, you know, my data matrix times parameters (prediction = DataMatrix * parameters), right? Where DataMatrix is this thing here (\begin{bmatrix} 1 & 2104\\ 1 & 1416\\ 1 & 1534\\ 1 & 852\\ \end{bmatrix}) and parameters is this thing here (\begin{bmatrix} -40\\ 0.25 \end{bmatrix}), and this times (*) is a matrix-vector multiplication. If you just implement this line of code, assuming you have an appropriate library to do matrix-vector multiplication, then the variable prediction becomes this 4\times 1 dimensional vector on the right, which just gives you all the predicted prices.

Your alternative to doing this as a matrix-vector multiplication would be to write something like, you know, for i equals 1 to 4, right? And if you have a thousand houses, it would be for i equals 1 to a thousand or whatever. And then you have to write prediction of i equals, and then do a bunch of work in there. It turns out that when you have a large number of houses, if you're trying to predict the prices of not just four but maybe a thousand houses, then when you implement this on the computer, writing it the first way is better in any of the various languages. This is true not only for Octave, but for C++, Java, Python, and other high-level languages as well. Writing code in the style on the left allows you to not only simplify the code, because now you're just writing one line of code rather than a for loop with a bunch of things inside it.
But for subtle reasons that we'll see later, it also turns out to be more computationally efficient to predict the prices of all your houses the way on the left rather than the way on the right, where you write your own for loop. I'll say more about this later when we talk about vectorization, but by posing the prediction this way, you get not only a simpler piece of code, but a more efficient one.
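The two styles can be compared side by side. A sketch in NumPy, scaled up to a thousand houses with made-up sizes (the house sizes here are randomly generated for illustration, not real data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_houses = 1000
sizes = rng.uniform(500, 3000, n_houses)        # hypothetical house sizes
X = np.column_stack([np.ones(n_houses), sizes])  # 1000x2 data matrix
theta = np.array([-40.0, 0.25])

# Style on the left: one vectorized matrix-vector multiplication
pred_vec = X @ theta

# Style on the right: an explicit for loop, one house at a time
pred_loop = np.empty(n_houses)
for i in range(n_houses):
    pred_loop[i] = theta[0] * X[i, 0] + theta[1] * X[i, 1]

# Both compute the same predictions; the vectorized form is the one
# that optimized linear-algebra libraries can make fast.
assert np.allclose(pred_vec, pred_loop)
```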

So, that's it for matrix-vector multiplication. We'll make good use of these sorts of operations as we develop linear regression further in later modules. But in the next video (article), we're going to take this and generalize it to the case of matrix-matrix multiplication.

<end>



Reposted from blog.csdn.net/edward_wang1/article/details/103111393