Machine Learning Notes: Linear Regression - The Least Squares Method (Summary)

Table of Contents

1 Introduction to the Least Squares Method

2 History of the Least Squares Method

3 Basic Form

4 Univariate Linear Regression

5 Multivariate Linear Regression

References


 

1 Introduction to the Least Squares Method

The least squares method (also known as the method of least squares) is a mathematical optimization technique. It finds the best-fitting function for a set of data by minimizing the sum of squared errors. Using the least squares method, unknown values can be estimated easily, and the sum of squared differences between these estimates and the actual data is kept to a minimum. The least squares method can also be used for curve fitting. Some other optimization problems, such as minimizing energy or maximizing entropy, can also be expressed in least squares form. (From Baidu Encyclopedia)

Note: the least squares method is widely used and is not limited to linear regression.

 


2 History of the Least Squares Method

In 1801, the Italian astronomer Giuseppe Piazzi discovered the first asteroid, Ceres. After 40 days of follow-up observations, Ceres passed behind the sun and Piazzi lost track of its position. Scientists around the world then used Piazzi's observations to search for Ceres, but most of them failed to locate it from their calculations. Gauss, then 24 years old, also calculated the orbit of Ceres, and the German astronomer Heinrich Olbers rediscovered Ceres based on the orbit Gauss had computed.

Gauss published his least squares method in 1809 in his book Theory of the Motion of Celestial Bodies.

The French scientist Legendre independently invented the least squares method in 1806, but his work was little known at the time.

Legendre and Gauss later disputed who had first established the least squares method.

In 1829, Gauss provided a proof that the least squares method is optimal compared with a class of other estimation methods; this result is known as the Gauss-Markov theorem.

The least squares method was not only the most important statistical method of the 19th century; it can also be called the soul of mathematical statistics. Several major branches of mathematical statistics, such as correlation analysis, regression analysis, analysis of variance, and linear model theory, take the least squares method as their theoretical basis. As the American statistician S. M. Stigler put it, "the least squares method is to mathematical statistics what calculus is to mathematics." The least squares method is the most basic way to estimate regression parameters, so studying it and its applications is of great significance for the study of statistics. (From Baidu Encyclopedia)

 


3 Basic Form

Given an example described by d attributes, {\bf{x}} = ({x_1};{x_2}; \ldots ;{x_d}), where {x_i} is the value of {\bf{x}} on the i-th attribute, a linear model tries to learn a prediction function that is a linear combination of the attributes, i.e.,

                                                                   f({\bf{x}}) = {\omega _1}{x_1} + {\omega _2}{x_2} + \ldots + {\omega _d}{x_d} + b

This is usually written more concisely in vector form:

                                                                                     f({\bf{x}}) = {{\bf{\omega }}^{\rm{T}}}{\bf{x}} + b

where {\bf{\omega }} = ({\omega _1};{\omega _2}; \ldots ;{\omega _d}).
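
As a quick illustration of the vector form above, the following minimal NumPy sketch evaluates f({\bf{x}}) = {{\bf{\omega }}^{\rm{T}}}{\bf{x}} + b for a single example; the weight, bias, and attribute values are made-up numbers used only for illustration.

import numpy as np

# Made-up parameters of a linear model with d = 3 attributes.
w = np.array([0.5, -1.2, 2.0])   # weight vector (omega_1, omega_2, omega_3)
b = 0.7                          # bias term

# One made-up example with d = 3 attribute values.
x = np.array([1.0, 0.0, 3.0])

# f(x) = omega^T x + b, i.e. a dot product plus the bias.
f_x = np.dot(w, x) + b
print(f_x)   # 0.5*1.0 + (-1.2)*0.0 + 2.0*3.0 + 0.7 = 7.2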

Intuitively, suppose a set of points lies roughly along a straight line, but the equation of the line is not known in advance and only the points are given; the line's function is then determined from these points. Our task is to obtain this function from some known data. Of course, this picture describes univariate linear regression; multivariate linear regression is similar.

In regression analysis, if only one independent variable and one dependent variable are involved and their relationship can be approximated by a straight line, the analysis is called univariate (simple) linear regression. If the regression involves two or more independent variables and the relationship between the dependent variable and the independent variables is linear, it is called multivariate linear regression. In two-dimensional space a linear model is a straight line, in three-dimensional space it is a plane, and in higher-dimensional space it is a hyperplane.

 


4 Univariate Linear Regression

Given a data set:

                                                          D = \{ ({x_1},{y_1}),({x_2},{y_2}), \ldots ,({x_m},{y_m})\} = \{ ({x_i},{y_i})\} _{i = 1}^m

where {x_i} \in \mathbb{R} and {y_i} \in \mathbb{R}.

Linear regression tries to learn:

                                                                          f({x_i}) = \omega {x_i} + b, such that f({x_i}) \approx {y_i}

In essence, the goal is to find a function such that the data points lie as close as possible to it. So how do we determine the parameters \omega and b?

Clearly, the key is how to measure the gap between f({x_i}) and {y_i}; we use the mean squared error for this. The problem of minimizing the gap therefore becomes one of minimizing the mean squared error, namely:

                                                    ({\omega ^ * },{b^ * }) = \mathop {\arg \min }\limits_{(\omega ,b)} \sum\limits_{i = 1}^m {{{\left( {f({x_i}) - {y_i}} \right)}^2}} = \mathop {\arg \min }\limits_{(\omega ,b)} \sum\limits_{i = 1}^m {{{\left( {{y_i} - \omega {x_i} - b} \right)}^2}}

where {\omega ^ * } and {b^ * } are the solutions for \omega and b, respectively.
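
To make the objective concrete, here is a minimal sketch that evaluates the sum of squared errors \sum\limits_{i = 1}^m {{{\left( {{y_i} - \omega {x_i} - b} \right)}^2}} for a candidate (\omega, b); the data and parameter values are made up purely for illustration.

import numpy as np

# Made-up univariate data, roughly following y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

def squared_error(w, b, x, y):
    """Sum over i of (y_i - w*x_i - b)^2."""
    residuals = y - w * x - b
    return np.sum(residuals ** 2)

print(squared_error(2.0, 0.0, x, y))   # small value: this candidate fits the data well
print(squared_error(1.0, 0.0, x, y))   # much larger value: this candidate fits poorly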

The geometric meaning of the mean squared error is the familiar Euclidean distance. The approach of solving a model by minimizing the mean squared error is called the least squares method. In linear regression, the least squares method tries to find a straight line that minimizes the sum of the Euclidean distances from all samples to the line.

The process of solving for \omega and b so that {E_{(\omega ,b)}} = \sum\limits_{i = 1}^m {{{\left( {{y_i} - \omega {x_i} - b} \right)}^2}} is minimized is called least squares "parameter estimation" of the linear regression model. To find the minimum, we only need to differentiate {E_{(\omega ,b)}}. Taking the derivative with respect to \omega and b, respectively, gives:

                                                                    \frac{{\partial {E_{(\omega ,b)}}}}{{\partial \omega }} = 2\left( {\omega \sum\limits_{i = 1}^m {x_i^2 - \sum\limits_{i = 1}^m {\left( {{y_i} - b} \right){x_i}} } } \right)

                                                                          \frac{{\partial {E_{(\omega ,b)}}}}{{\partial b}} = 2\left( {mb - \sum\limits_{i = 1}^m {\left( {{y_i} - \omega {x_i}} \right)} } \right)

Setting the derivatives to zero gives the optimal solution. Letting \frac{{\partial {E_{(\omega ,b)}}}}{{\partial \omega }} = 0 and \frac{{\partial {E_{(\omega ,b)}}}}{{\partial b}} = 0 and solving, the closed-form optimal solutions for \omega and b are:

                                                                             \omega = \frac{{\sum\limits_{i = 1}^m {{y_i}({x_i} - \bar x)} }}{{\sum\limits_{i = 1}^m {x_i^2 - \frac{1}{m}{{\left( {\sum\limits_{i = 1}^m {{x_i}} } \right)}^2}} }}

                                                                                  b = \frac{1}{m}\sum\limits_{i = 1}^m {({y_i} - \omega {x_i})}

where \bar x = \frac{1}{m}\sum\limits_{i = 1}^m {{x_i}} is the mean of x.
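
The closed-form expressions above translate directly into code. The sketch below implements them on a small made-up data set and cross-checks the result against NumPy's polyfit; all data values are illustrative assumptions.

import numpy as np

# Made-up data, generated roughly along y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.9, 5.1, 7.2, 8.8, 11.1])

m = len(x)
x_bar = x.mean()

# Closed-form least squares solution for univariate linear regression.
w = np.sum(y * (x - x_bar)) / (np.sum(x ** 2) - np.sum(x) ** 2 / m)
b = np.mean(y - w * x)

print(w, b)                 # close to slope 2 and intercept 1
print(np.polyfit(x, y, 1))  # NumPy's degree-1 polynomial fit recovers the same line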

 


5 Multivariate Linear Regression

A more general case is multivariate linear regression. Consider the data set:

                                                        D = \{ ({{\bf{x}}_1},{y_1}),({{\bf{x}}_2},{y_2}), \ldots ,({{\bf{x}}_m},{y_m})\} = \{ ({{\bf{x}}_i},{y_i})\} _{i = 1}^m

where {{\bf{x}}_i} \in {\mathbb{R}^d} and {y_i} \in \mathbb{R}.

We now try to learn:

                                                                         f({{\bf{x}}_i}) = {{\bf{\omega }}^{\text{T}}}{{\bf{x}}_i} + b, such that f({{\bf{x}}_i}) \approx {y_i}

This is called multiple linear regression (multivariate linear regression).

Similarly, the parameters can be estimated with the least squares method. For ease of discussion, we absorb \omega and b into a single vector {\bf{\hat \omega }} = \left( {{\bf{\omega }};b} \right).

Accordingly, the data set D = \{ ({{\bf{x}}_1},{y_1}),({{\bf{x}}_2},{y_2}), \ldots ,({{\bf{x}}_m},{y_m})\} is represented as an m \times \left( {d + 1} \right) matrix {\bf{X}}, where each row corresponds to one example: the first d elements of a row are the example's d attribute values, and the last element is a constant set to 1, namely:

                                                           {\bf{X}} = \left( {\begin{array}{*{20}{c}} {{x_{11}}}&{{x_{12}}}& \cdots &{{x_{1d}}}&1\\ {{x_{21}}}&{{x_{22}}}& \cdots &{{x_{2d}}}&1\\ \vdots & \vdots & \ddots & \vdots & \vdots \\ {{x_{m1}}}&{{x_{m2}}}& \cdots &{{x_{md}}}&1 \end{array}} \right) = \left( {\begin{array}{*{20}{c}} {x_1^{\rm{T}}}&1\\ {x_2^{\rm{T}}}&1\\ \vdots & \vdots \\ {x_m^{\rm{T}}}&1 \end{array}} \right)
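
In code, this augmented matrix is simply the attribute matrix with a constant column of ones appended. A minimal sketch with made-up data (m = 4 examples, d = 2 attributes):

import numpy as np

# Made-up attribute matrix: m = 4 examples, d = 2 attributes each.
X_attrs = np.array([[1.0, 2.0],
                    [2.0, 0.5],
                    [3.0, 1.5],
                    [4.0, 3.0]])

m = X_attrs.shape[0]
X = np.hstack([X_attrs, np.ones((m, 1))])   # append the constant-1 column

print(X.shape)   # (4, 3), i.e. m x (d + 1)
print(X)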

Writing the labels in vector form as {\bf{y}} = \left( {{y_1};{y_2}; \ldots ;{y_m}} \right), we similarly minimize the mean squared error, namely:

                                                                      {{\bf{\hat \omega }}^ * } = \mathop {\arg \min }\limits_{{\bf{\hat \omega }}} {\left( {{\bf{y}} - {\bf{X\hat \omega }}} \right)^{\rm{T}}}\left( {{\bf{y}} - {\bf{X\hat \omega }}} \right)

Let {E_{{\bf{\hat \omega }}}} = {\left( {{\bf{y}} - {\bf{X\hat \omega }}} \right)^{\rm{T}}}\left( {{\bf{y}} - {\bf{X\hat \omega }}} \right). To find the minimum, we only need to differentiate {E_{{\bf{\hat \omega }}}}. Taking the derivative of {E_{{\bf{\hat \omega }}}} with respect to {\bf{\hat \omega }} = \left( {{\bf{\omega }};b} \right) gives:

                                                                                 \frac{{\partial {E_{{\bf{\hat \omega }}}}}}{{\partial {\bf{\hat \omega }}}} = 2{{\bf{X}}^{\rm{T}}}\left( {{\bf{X\hat \omega }} - {\bf{y}}} \right)

Setting \frac{{\partial {E_{{\bf{\hat \omega }}}}}}{{\partial {\bf{\hat \omega }}}} = 0 gives the optimal solution; when {{\bf{X}}^{\rm{T}}}{\bf{X}} is invertible, this yields {{\bf{\hat \omega }}^ * } = {\left( {{{\bf{X}}^{\rm{T}}}{\bf{X}}} \right)^{ - 1}}{{\bf{X}}^{\rm{T}}}{\bf{y}}. The derivation involves computing a matrix inverse and is considerably more involved than the univariate case, so it is not elaborated further here.
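
As a concrete sketch of this step, assuming {{\bf{X}}^{\rm{T}}}{\bf{X}} is invertible for the made-up data below, {\bf{\hat \omega }} can be obtained by solving the normal equations {{\bf{X}}^{\rm{T}}}{\bf{X\hat \omega }} = {{\bf{X}}^{\rm{T}}}{\bf{y}}; in practice np.linalg.lstsq is the numerically safer route.

import numpy as np

# Made-up data: m = 5 examples, d = 2 attributes, generated near y = 2*x1 - 1*x2 + 3.
X_attrs = np.array([[1.0, 2.0],
                    [2.0, 1.0],
                    [3.0, 4.0],
                    [4.0, 0.5],
                    [5.0, 3.0]])
y = np.array([3.1, 5.9, 5.2, 10.4, 10.1])

m = X_attrs.shape[0]
X = np.hstack([X_attrs, np.ones((m, 1))])   # append the constant-1 column, as above

# Solve the normal equations X^T X w_hat = X^T y (valid when X^T X is invertible).
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)   # approximately [2, -1, 3] = (omega_1, omega_2, b)

# Numerically more robust alternative giving the same least squares solution.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_lstsq)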

 


References

[1] https://www.cnblogs.com/wangkundentisy/p/7505487.html

[2] Zhou Zhihua. Machine Learning. Beijing: Tsinghua University Press, January 2016.

 
