The idea of the least squares method

The least squares method fits a function by minimizing the sum of the squared differences between the observed values and the values predicted by the fitted function. In machine learning, and especially in regression models, the least squares method appears everywhere.

The least squares method and the problem it solves

The general form of the least squares method is given by the following formula:

\[\text{objective function} = \sum(\text{observed value} - \text{predicted value})^2 \]

The observed values are the sample data, and the predicted values are produced by the fitted function. The objective function is what machine learning calls the loss function. Our goal is to find the fitted model at which the objective function is minimized.
Take linear regression as an example, with m samples and a single feature:

\[(x_{1},y_{1}),(x_{2},y_{2}),...(x_{m},y_{m}) \]

Since there is only one feature, assume the fitted function is:

\[h_{\theta}(x) = \theta_0 + \theta_1x \]

The fitted function for this single feature has two parameters, \(\theta_0\) and \(\theta_1\), that need to be determined. The corresponding objective function is:

\[J(\theta_0,\theta_1)= \sum_{i=1}^{m}(y_{i}-h_{\theta}(x_{i}))^2 \]
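To make the objective function concrete, here is a minimal sketch in Python that evaluates \(J(\theta_0,\theta_1)\) for one candidate pair of parameters; the toy data and the parameter values are purely hypothetical.

```python
import numpy as np

# Hypothetical toy sample: one feature x and observed outputs y.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

def objective(theta0, theta1, x, y):
    """Sum of squared residuals between observed y and fitted h_theta(x)."""
    predictions = theta0 + theta1 * x       # h_theta(x) = theta_0 + theta_1 * x
    return np.sum((y - predictions) ** 2)   # J(theta_0, theta_1)

# Evaluate J at one candidate (theta_0, theta_1); least squares seeks the minimizer.
print(objective(0.0, 2.0, x, y))
```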

What does the least squares method do? It finds the \(\theta_0\) and \(\theta_1\) that make \(J(\theta_0,\theta_1)\) as small as possible. So how does the least squares method minimize \(J(\theta_0,\theta_1)\)?

Algebraic solution of the least squares method

To minimize \(J(\theta_0,\theta_1)\), take the partial derivatives with respect to \(\theta_0\) and \(\theta_1\), set them to zero, and obtain a system of two equations in \(\theta_0\) and \(\theta_1\). Solving this system gives the values of \(\theta_0\) and \(\theta_1\).
Taking the partial derivative of \(J(\theta_0,\theta_1)\) with respect to \(\theta_0\) and setting it to zero gives:

\[\sum_{i=1}^m(y_{i}-\theta_0 - \theta_1x_{i}) = 0 \tag{1} \]

Taking the partial derivative of \(J(\theta_0,\theta_1)\) with respect to \(\theta_1\) and setting it to zero gives:

\[\sum_{i=1}^m(y_{i}-\theta_0 - \theta_1x_{i})x_{i} = 0 \tag{2} \]

Expanding equation (1) gives the following expression:

\[m\sum_{i=1}^m\frac{y_i}{m}-m\theta_0-m\theta_1\sum_{i=1}^m\frac{x_i}{m}=0 \]

Let \(\vec x = \sum_{i=1}^m\frac{x_i}{m},\ \vec y = \sum_{i=1}^m\frac{y_i}{m}\); then equation (1) becomes:

\[m\vec y - m\theta_0 - m\theta_1\vec x=0 \]

\[=>\theta_0= \vec y - \theta_1\vec x \tag{3} \]

Substituting equation (3) into (2) gives:

\(\sum_{i=1}^my_ix_i-\theta_0\sum_{i=1}^mx_i-\theta_1\sum_{i=1}^mx_i^2=0\)
=>\(\sum_{i=1}^my_ix_i-(\vec y - \theta_1\vec x)\sum_{i=1}^mx_i-\theta_1\sum_{i=1}^mx_i^2=0\)
=>\(\theta_1=\frac{\sum_{i=1}^my_i x_i - \vec y\sum_{i=1}^mx_i}{\sum_{i=1}^mx_i^2-\vec x \sum_{i=1}^mx_i}\)
=>\(\theta_1=\frac{\sum_{i=1}^my_i x_i - \vec y\sum_{i=1}^mx_i - m\vec y\vec x +m\vec y\vec x}{\sum_{i=1}^mx_i^2-2\vec x \sum_{i=1}^mx_i+\vec x\sum_{i=1}^mx_i}\)
=>\(\theta_1=\frac{\sum_{i=1}^m(y_ix_i-\vec yx_i-y_i\vec x+\vec y\vec x)}{\sum_{i=1}^m(x_i^2-2\vec xx_i+\vec x^2)}\)
=>\(\theta_1=\frac{\sum_{i=1}^m(x_i-\vec x)(y_i-\vec y)}{\sum_{i=1}^m(x_i-\vec x)^2}\)

\(\theta_0\) can then be obtained from \(\theta_0 = \vec y - \theta_1\vec x\).
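The closed-form formulas for \(\theta_1\) and \(\theta_0\) translate directly into code. A minimal sketch, reusing the same hypothetical toy data as in the earlier example:

```python
import numpy as np

# Hypothetical toy sample (same as in the earlier sketch).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

x_bar, y_bar = x.mean(), y.mean()   # the sample means used in the derivation

# theta_1 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
theta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# theta_0 = y_bar - theta_1 * x_bar, from equation (3)
theta0 = y_bar - theta1 * x_bar

print(theta0, theta1)   # fitted intercept and slope
```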

Matrix solution of the least squares method

Suppose the hypothesis function \(h_\theta(x_0, x_1, ..., x_{n-1}) = \theta_0x_0 + \theta_1x_1 + ... + \theta_{n-1}x_{n-1}\) is written in matrix form as:

\[h_\theta(x) = X\theta \]

Here the hypothesis function \(h_\theta(x)\) is an m x 1 vector and \(\theta\) is an n x 1 vector that contains the n model parameters of the algebraic method. X is an m x n matrix, where m is the number of samples and n is the number of features per sample.
The loss function is defined as \(J(\theta) = \frac{1}{2}(X\theta-Y)^T(X\theta-Y)\), where Y is the vector of sample outputs, of dimension m x 1. The factor \(\frac{1}{2}\) is there mainly so that the coefficient becomes 1 after differentiation, which simplifies the calculation.
Following the principle of least squares, take the derivative of this loss function with respect to the vector \(\theta\) and set it to zero, which gives:

\[\frac{\partial J(\theta)}{\partial\theta} =X^T(X\theta-Y)=0 \]

This step uses the chain rule together with two matrix-derivative identities:

Equation 1: \(\frac{\partial}{\partial X}(X^TX) = 2X\), where X is a vector.
Equation 2: \(\nabla_Xf(AX+B) = A^T\nabla_Yf\), where \(Y = AX+B\) and \(f(Y)\) is a scalar.
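Both identities, and the resulting gradient \(X^T(X\theta-Y)\), can be checked numerically. A minimal sketch with randomly generated toy data, comparing the analytic gradient against a central finite-difference approximation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))      # hypothetical 5 x 2 design matrix
Y = rng.normal(size=5)           # hypothetical output vector
theta = rng.normal(size=2)       # an arbitrary parameter vector

def J(theta):
    """Loss J(theta) = 1/2 (X theta - Y)^T (X theta - Y)."""
    r = X @ theta - Y
    return 0.5 * r @ r

analytic = X.T @ (X @ theta - Y)   # gradient from the matrix identities

# Central finite differences along each coordinate direction.
eps = 1e-6
numeric = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
                    for e in np.eye(2)])

print(np.allclose(analytic, numeric))   # True: the two gradients agree
```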

Rearranging the equation \(X^T(X\theta-Y)=0\) gives:

\[X^TX\theta = X^TY \]

Multiplying both sides by \((X^TX)^{-1}\) gives:

\[\theta = (X^TX)^{-1}X^TY \]
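A minimal sketch of this matrix solution with NumPy, again on hypothetical toy data; the design matrix gets a leading column of ones so that \(x_0 = 1\) carries the intercept \(\theta_0\):

```python
import numpy as np

# Hypothetical toy sample with a single feature.
x = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.1, 3.9, 6.2, 8.1])

# m x n design matrix: first column of ones for the intercept, second column is x.
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equations X^T X theta = X^T Y.
# Solving the linear system is numerically safer than forming (X^T X)^{-1} explicitly.
theta = np.linalg.solve(X.T @ X, X.T @ Y)
print(theta)   # [theta_0, theta_1], matching the algebraic solution

# For reference, NumPy's built-in least squares solver gives the same result:
theta_ref, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(theta_ref)
```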

Source: www.cnblogs.com/whiteBear/p/12614592.html