[Ch05-01] Solving Multivariable Linear Regression with the Normal Equation

This post is part of a series; the original is maintained by the author on GitHub: https://aka.ms/beginnerAI .
Don't hesitate to click the star — the more stars, the harder the author works.

5.1 The Normal Equation Solution

In English this method is called Normal Equations.

For linear regression problems, the least squares approach mentioned earlier not only solves the single-variable case but can also solve multiple linear regression.

For multiple linear regression, the normal equation can be used to obtain a closed-form analytical solution. It solves problems of the form described by the following formula:

\[y=a_0+a_1x_1+a_2x_2+\dots+a_kx_k \tag{1}\]

5.1.1 Simple Derivation Method

When fitting a function (regression), we assume the hypothesis function H has the following form:

\[h(w,b) = b + x_1 w_1+x_2 w_2+...+x_n w_n \tag{2}\]

Let \(b = w_0\), then:

\[h(w) = w_0 + x_1 \cdot w_1 + x_2 \cdot w_2+...+ x_n \cdot w_n\tag{3}\]

In Equation 3, x stands for the n feature values of a single sample. If we compute over m samples, we get the following matrix form:

\[H(w) = X \cdot W \tag{4}\]

The shapes of the matrices X and W in Equation 4 are as follows:

\[ X = \begin{pmatrix} 1 & x_{1,1} & x_{1,2} & \dots & x_{1,n} \\ 1 & x_{2,1} & x_{2,2} & \dots & x_{2,n} \\ \dots \\ 1 & x_{m,1} & x_{m,2} & \dots & x_{m,n} \end{pmatrix} \tag{5} \]

\[ W= \begin{pmatrix} w_0 \\ w_1 \\ \dots \\ w_n \end{pmatrix} \tag{6} \]

We then expect the output of the hypothesis function to match the true values, that is:

\[H(w) = X \cdot W = Y \tag{7}\]

where the shape of Y is as follows:

\[ Y= \begin{pmatrix} y_1 \\ y_2 \\ \dots \\ y_m \end{pmatrix} \tag{8} \]

Intuitively, W = Y / X, but all three quantities are matrices, and matrix division does not exist; we would instead need the inverse of X and multiply Y by it. The problem is that only square matrices can have inverses, and X is not necessarily square. So we first turn the left-hand side into a square matrix, whose inverse may then exist, by multiplying both sides of the equation by the transpose of X:

\[X^T X W = X^T Y \tag{9}\]

where \(X^T\) is the transpose of X. \(X^T X\) is guaranteed to be square; assuming its inverse exists, we move it to the right-hand side of the equation:

\[W = (X^T X)^{-1}{X^T Y} \tag{10}\]

At this point, the normal equation gives us W directly.
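As a minimal illustration of Equation (10) — using a tiny invented data set, not this chapter's house-price samples — W can be computed directly with NumPy:

```Python
import numpy as np

# Invented toy data: 5 samples with 2 features each (purely illustrative)
X_raw = np.array([[1.0, 2.0],
                  [2.0, 1.0],
                  [3.0, 4.0],
                  [4.0, 3.0],
                  [5.0, 5.0]])
Y = np.array([[8.0], [7.0], [15.0], [14.0], [20.0]])

# prepend a column of ones so that w_0 plays the role of the bias b (Equation 5)
X = np.column_stack((np.ones((X_raw.shape[0], 1)), X_raw))

# Equation (10): W = (X^T X)^(-1) X^T Y
W = np.linalg.inv(X.T @ X) @ X.T @ Y
print(W)   # W[0] is the bias b, W[1:] are the weights
```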

5.1.2 Complex Derivation Method

We still use the mean square error loss function:

\[J(w,b) = \sum (z_i - y_i)^2 \tag{11}\]

Treat the constant b as a feature whose value is always 1, substitute z = XW into the formula, and write it in matrix form:

\[J(w) = \sum (x_i w_i -y_i)^2=(XW - Y)^T \cdot (XW - Y) \tag{12}\]

Take the derivative with respect to w and set it to 0; the solution W is where the minimum is attained:

\[ \begin{aligned} {\partial J(w) \over \partial w} &= {\partial \over \partial w}[(XW - Y)^T \cdot (XW - Y)] \\ &={\partial \over \partial w}[(W^TX^T - Y^T) \cdot (XW - Y)] \\ &={\partial \over \partial w}[(W^TX^TXW -W^TX^TY - Y^TXW + Y^TY)] \end{aligned} \tag{13} \]

Taking the derivative term by term:

The first term gives: \(2X^TXW\)

The second and third terms each give: \(X^TY\)

The fourth term gives: 0

Setting the derivative to zero:

\[{\partial J(w) \over \partial w} = 2X^TXW - 2X^TY = 0 \tag{14}\]
\[X^TXW = X^TY \tag{15}\]
\[W = (X^TX)^{-1}X^TY \tag{16}\]

This is the same conclusion as Equation 10.

The basic formulas used in the derivation above can be found in Equations 60-69 of Chapter 0.
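For readers who don't have Chapter 0 at hand, a quick recap of the standard matrix-derivative identities used above (with the gradient written as a column vector):

\[ {\partial (W^TAW) \over \partial W}=2AW \ (A \text{ symmetric}), \quad {\partial (W^Tb) \over \partial W}={\partial (b^TW) \over \partial W}=b, \quad {\partial c \over \partial W}=0 \]

Substituting \(A=X^TX\), \(b=X^TY\) and \(c=Y^TY\) reproduces the four term-by-term results listed above.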

The inverse \((X^TX)^{-1}\) may fail to exist for the following reasons:

  1. Redundant features, for example \(x_2 = x_1^2\), i.e. the relationship between a square's side length and its area; such quantities should not coexist as two separate features
  2. Too many features, for example when the number of features n is larger than the number of samples m

Neither of these two issues is present in our particular example; the sketch below shows what the singular case looks like.
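Here is a small sketch on invented data (not from this chapter) where the third column is an exact multiple of the second, so \(X^TX\) has no inverse; the Moore-Penrose pseudo-inverse np.linalg.pinv can serve as a fallback:

```Python
import numpy as np

# Invented toy data: the second feature is exactly twice the first (redundant)
x1 = np.array([[1.0], [2.0], [3.0], [4.0]])
X = np.column_stack((np.ones((4, 1)), x1, 2.0 * x1))
Y = np.array([[3.0], [5.0], [7.0], [9.0]])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))   # 2 rather than 3: X^T X is singular

try:
    W = np.linalg.inv(XtX) @ X.T @ Y
except np.linalg.LinAlgError:
    # with an exactly singular matrix, inv raises LinAlgError (a nearly singular one
    # may instead yield huge, unstable values); pinv always returns a solution
    W = np.linalg.pinv(XtX) @ X.T @ Y
print(W)
```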

5.1.3 Code Implementation

We now plug the sample data from Table 5-1 into the equation. According to Equation (5), we should build the following X and Y matrices:

\[ X = \begin{pmatrix} 1 & 10.06 & 60 \\ 1 & 15.47 & 74 \\ 1 & 18.66 & 46 \\ 1 & 5.20 & 77 \\ \dots \\ \end{pmatrix} \tag{17} \]

\[ Y= \begin{pmatrix} 302.86 \\ 393.04 \\ 270.67 \\ 450.59 \\ \dots \\ \end{pmatrix} \tag{18} \]

According to equation (10):

\[W = (X^T X)^{-1}{X^T Y} \tag{10}\]

  1. X is a 1000x3 matrix, so the transpose of X is 3x1000, and \(X^TX\) yields a 3x3 matrix
  2. \((X^TX)^{-1}\) is also 3x3
  3. Multiplying by \(X^T\), i.e. (3x3) x (3x1000), gives a 3x1000 matrix
  4. Multiplying by Y, which is 1000x1, gives (3x1000) x (1000x1) = 3x1. This is the solution W, which packs the bias value b and the two weight values w1 and w2 into a single vector; the code below carries out these steps

```Python
import numpy as np
# SimpleDataReader is the data-loading helper class shipped with this chapter's
# sample code (the import path may differ depending on where you placed it)
from HelperClass.SimpleDataReader import *

if __name__ == '__main__':
    reader = SimpleDataReader()
    reader.ReadData()
    X,Y = reader.GetWholeTrainSamples()
    num_example = X.shape[0]
    # prepend a column of ones so that w_0 acts as the bias b (Equation 5)
    one = np.ones((num_example,1))
    x = np.column_stack((one, (X[0:num_example,:])))
    # Equation (10): W = (X^T X)^(-1) X^T Y
    a = np.dot(x.T, x)
    # converting to np.matrix is optional; np.linalg.inv also accepts plain arrays
    b = np.asmatrix(a)
    c = np.linalg.inv(b)
    d = np.dot(c, x.T)
    e = np.dot(d, Y)
    #print(e)
    # unpack the 3x1 result: bias first, then the two weights
    b=e[0,0]
    w1=e[1,0]
    w2=e[2,0]
    print("w1=", w1)
    print("w2=", w2)
    print("b=", b)
    # inference on a new sample with x1=15, x2=93
    z = w1 * 15 + w2 * 93 + b
    print("z=",z)
```

5.1.4 Running Results

```
w1= -2.0184092853092226
w2= 5.055333475112755
b= 46.235258613837644
z= 486.1051325196855
```

We have obtained two weight values and one bias value, and the resulting price prediction is z = 486.1.

At this point we have the analytical solution. We can use it as the reference answer to verify the results of the neural network we train later.
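As a side note (not part of the original sample code): forming \((X^TX)^{-1}\) explicitly can be numerically fragile when \(X^TX\) is ill-conditioned, and np.linalg.lstsq solves the same least-squares problem more robustly. A self-contained check on invented data:

```Python
import numpy as np

# Invented data: 100 samples, 2 features, known coefficients plus a little noise
rng = np.random.default_rng(0)
X = np.column_stack((np.ones((100, 1)), rng.random((100, 2))))
Y = X @ np.array([[3.0], [1.5], [-2.0]]) + rng.normal(0.0, 0.01, (100, 1))

W_normal = np.linalg.inv(X.T @ X) @ X.T @ Y           # Equation (10)
W_lstsq, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)  # SVD-based solver
print(np.allclose(W_normal, W_lstsq))                 # True: both give the same W
```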

Code location

ch05, Level1

Origin: www.cnblogs.com/woodyh5/p/12028487.html