[Ch04-01] Solving the linear regression problem with least squares

This blog series is maintained by the original author on GitHub: https://aka.ms/beginnerAI .
Please don't be stingy with the star button; the more stars, the harder the author works.

4.1 The least squares method

4.1.1 History

The least squares method (also called "least square") finds the best-fitting function for a set of data by minimizing the sum of the squared errors. With least squares, unknown values can be computed easily, and the sum of the squared errors between these computed values and the actual data is minimized. Least squares can also be used for curve fitting. Some other optimization problems can likewise be expressed in least squares form by minimizing an energy.

In 1801, the Italian astronomer Giuseppe Piazzi discovered the first asteroid, Ceres. After 40 days of tracking observations, Piazzi lost sight of Ceres as it moved behind the sun. Scientists around the world then used Piazzi's observations to search for Ceres, but most of them found nothing from their calculations. Gauss, then 24 years old, also computed an orbit for Ceres, and the astronomer Heinrich Olbers rediscovered Ceres based on the orbit Gauss had calculated.

Gauss published his use of the least squares method in 1809 in his book "Theory of the Motion of Celestial Bodies". The French scientist Legendre had independently invented the least squares method in 1806, but it remained little known at the time. Legendre and Gauss later disputed who had first established the least squares method.

In 1829, Gauss provided a proof that the least squares method is optimal compared with other estimation methods; this result is known as the Gauss-Markov theorem.

4.1.2 Mathematical principle

Linear regression tries to learn:

\[z(x_i)=w \cdot x_i+b \tag{1}\]

Such that:

\[z(x_i) \simeq y_i \tag{2}\]

where \(x_i\) is the feature value of a sample, \(y_i\) is the label value of the sample, and \(z_i\) is the model's predicted value.

How do we learn w and b? The mean squared error (MSE) is a common measure for regression tasks:
\[ J = \sum_{i=1}^m(z(x_i)-y_i)^2 = \sum_{i=1}^m(y_i-wx_i-b)^2 \tag{3} \]

\(J\) is called the loss function. In effect, we are trying to find a straight line that minimizes the sum of the squared residuals from all sample points to the line.

Figure 4-3 How the mean squared error function evaluates a fit

In Figure 4-3, the circular dots are the sample points and the straight line is the current fitting result. As shown on the left, computing the perpendicular distance from a sample point to the line requires finding the foot of the perpendicular from the line's slope and then computing the distance, which is slow. In engineering practice we usually use the approach on the right, i.e. the vertical distance from the sample point to the line, because it is very easy to compute: a single subtraction is enough.

Suppose our preliminary calculation gives the line shown as dashed. Is this line appropriate? We compute the distance from each sample point to this line and add these values up (they are all positive, so they cannot cancel each other out); the total becomes the error.

Because the points in the figure above do not lie on a single straight line, no straight line can pass through all of them at once. So all we can do is keep changing the angle and position of the red line so that the overall error is as small as possible (it cannot reach 0), which means the overall deviation is minimized; the final line is then the result we want. A small sketch of this evaluation step follows below.
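
To make this concrete, here is a minimal sketch (the data and candidate lines are made up for illustration; X and Y are assumed to be NumPy arrays) that evaluates the total squared error of Equation 3 for a candidate line:

import numpy as np

def total_error(X, Y, w, b):
    # Equation 3: sum of squared vertical distances from the samples to the line z = w*x + b
    Z = w * X + b
    return np.sum((Y - Z) ** 2)

# Illustrative samples and two candidate lines; the smaller error indicates the better fit
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([3.1, 4.9, 7.2, 8.8])
print(total_error(X, Y, w=2.0, b=1.0))   # candidate line z = 2x + 1, error 0.1
print(total_error(X, Y, w=1.0, b=2.0))   # candidate line z = x + 2, much larger error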

To find the minimum of the error, take the partial derivatives of J with respect to w and b, set the derivatives to zero (where the extremum is reached), and solve for the optimal w and b.

The derivation is as follows:

\[ \begin{aligned} {\partial{J} \over \partial{w}} &={\partial{(\sum_{i=1}^m(y_i-wx_i-b)^2)} \over \partial{w}} \\ &= 2\sum_{i=1}^m(y_i-wx_i-b)(-x_i) \end{aligned} \tag{4} \]

Setting Equation 4 to 0:

\[ \sum_{i=1}^m(y_i-wx_i-b)x_i=0 \tag{5} \]

\[ \begin{aligned} {\partial{J} \over \partial{b}} &={\partial{(\sum_{i=1}^m(y_i-wx_i-b)^2)} \over \partial{b}} \\ &=2\sum_{i=1}^m(y_i-wx_i-b)(-1) \end{aligned} \tag{6} \]

Setting Equation 6 to 0:

\[ \sum_{i=1}^m(y_i-wx_i-b)=0 \tag{7} \]

From Equation 7 (assuming there are m samples):

\[ \sum_{i=1}^m b = m \cdot b = \sum_{i=1}^m{y_i} - w\sum_{i=1}^m{x_i} \tag{8} \]

Dividing both sides by m:

\[ b = {1 \over m}(\sum_{i=1}^m{y_i} - w\sum_{i=1}^m{x_i})=\bar y-w \bar x \tag{9} \]

where:

\[ \bar y = {1 \over m}\sum_{i=1}^m y_i, \bar x={1 \over m}\sum_{i=1}^m x_i \tag{10} \]

Substituting Equation 9 into Equation 5:

\[ \sum_{i=1}^m(y_i-wx_i-\bar y + w \bar x)x_i=0 \]

\[ \sum_{i=1}^m(x_i y_i-wx^2_i-x_i \bar y + w \bar x x_i)=0 \]

\[ \sum_{i=1}^m(x_iy_i-x_i \bar y)-w\sum_{i=1}^m(x^2_i - \bar x x_i) = 0 \]

\[ w = {\sum_{i=1}^m(x_iy_i-x_i \bar y) \over \sum_{i=1}^m(x^2_i - \bar x x_i)} \tag{11} \]

Substituting Equation 10 into Equation 11:

\[ w={\sum_{i=1}^m (x_i \cdot y_i) - \sum_{i=1}^m x_i \cdot {1 \over m} \sum_{i=1}^m y_i \over \sum_{i=1}^m x^2_i - \sum_{i=1}^m x_i \cdot {1 \over m}\sum_{i=1}^m x_i} \tag{12} \]

Multiplying both the numerator and the denominator by m:

\[ w={m\sum_{i=1}^m x_i y_i - \sum_{i=1}^m x_i \sum_{i=1}^m y_i \over m\sum_{i=1}^m x^2_i - (\sum_{i=1}^m x_i)^2} \tag{13} \]

\[ b=\frac{1}{m}\sum_{i=1}^m(y_i-wx_i) \tag{14} \]
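
As a quick sanity check of Equations 13 and 14 (with three made-up sample points, not from the original text), take \((1,2), (2,4), (3,6)\), so that \(m=3\), \(\sum x_i=6\), \(\sum y_i=12\), \(\sum x_iy_i=28\), \(\sum x^2_i=14\):

\[ w = {3 \cdot 28 - 6 \cdot 12 \over 3 \cdot 14 - 6^2} = {12 \over 6} = 2, \qquad b = \frac{1}{3}(12 - 2 \cdot 6) = 0 \]

which recovers the line \(z = 2x\) that passes exactly through all three points.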

In fact, Equation 13 has many variants; you will see different versions in different articles, which is often confusing. For example, the following two formulas are also correct solutions:

\[ w = {\sum_{i=1}^m y_i(x_i-\bar x) \over \sum_{i=1}^m x^2_i - (\sum_{i=1}^m x_i)^2/m} \tag{15} \]

\[ w = {\sum_{i=1}^m x_i(y_i-\bar y) \over \sum_{i=1}^m x^2_i - \bar x \sum_{i=1}^m x_i} \tag{16} \]

If you substitute Equation 10 into the two equations above, you should arrive at the same answer as Equation 13, although it takes some algebraic skill. For example, many people do not know the following identity:

\[ \begin{aligned} \sum_{i=1}^m (x_i \bar y) &= \bar y \sum_{i=1}^m x_i =\frac{1}{m}(\sum_{i=1}^m y_i) (\sum_{i=1}^m x_i) \\ &=\frac{1}{m}(\sum_{i=1}^m x_i) (\sum_{i=1}^m y_i)= \bar x \sum_{i=1}^m y_i \\ &=\sum_{i=1}^m (y_i \bar x) \end{aligned} \tag{17} \]
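
If in doubt, Equation 17 can also be checked numerically with a few lines of Python (a throwaway sketch assuming NumPy; the data is random and the variable names are arbitrary):

import numpy as np

X = np.random.rand(10)
Y = np.random.rand(10)
m = len(X)

# The expressions in Equation 17 all evaluate to the same number
print(np.sum(X * Y.mean()))        # sum of x_i * y_bar
print(np.sum(X) * np.sum(Y) / m)   # (1/m) * sum(x_i) * sum(y_i)
print(np.sum(Y * X.mean()))        # sum of y_i * x_bar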

4.1.3 Code implementation

We use the following Python code to implement the calculation process described above:

Calculating the value of w

# According to Equation 15
def method1(X,Y,m):
    x_mean = X.mean()
    p = sum(Y*(X-x_mean))
    q = sum(X*X) - sum(X)*sum(X)/m
    w = p/q
    return w

# According to Equation 16
def method2(X,Y,m):
    x_mean = X.mean()
    y_mean = Y.mean()
    p = sum(X*(Y-y_mean))
    q = sum(X*X) - x_mean*sum(X)
    w = p/q
    return w

# According to Equation 13
def method3(X,Y,m):
    p = m*sum(X*Y) - sum(X)*sum(Y)
    q = m*sum(X*X) - sum(X)*sum(X)
    w = p/q
    return w

Thanks to the NumPy helper library (X and Y above are NumPy arrays and m is the number of samples), we do not need to implement basic functions such as sum() and mean() by hand.

Calculating the value of b

# According to Equation 14
def calculate_b_1(X,Y,w,m):
    b = sum(Y-w*X)/m
    return b

# According to Equation 9
def calculate_b_2(X,Y,w):
    b = Y.mean() - w * X.mean()
    return b
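
The functions above can be exercised with a small driver such as the sketch below. The sample data here is generated for illustration (roughly y = 2x + 3 with noise) and is not the actual ch04 Level1 data file, so the exact numbers will differ from those shown in the next section:

import numpy as np

if __name__ == '__main__':
    # Illustrative data roughly following y = 2x + 3 with a little noise
    m = 100
    X = np.random.rand(m)
    Y = 2.0 * X + 3.0 + np.random.normal(0, 0.1, m)

    w1 = method1(X, Y, m)
    w2 = method2(X, Y, m)
    w3 = method3(X, Y, m)
    b1 = calculate_b_1(X, Y, w1, m)
    b2 = calculate_b_2(X, Y, w2)
    b3 = calculate_b_2(X, Y, w3)
    print("w1=%f, b1=%f" % (w1, b1))
    print("w2=%f, b2=%f" % (w2, b2))
    print("w3=%f, b3=%f" % (w3, b3))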

4.1.4 Running results

The several methods give consistent final results and can serve as cross-validation of one another:

w1=2.056827, b1=2.965434
w2=2.056827, b2=2.965434
w3=2.056827, b3=2.965434

Code location

ch04, Level1
