Table of Contents
1 Introduction to the least squares method
2 History of the least squares method
3 Basic form
4 Univariate linear regression
5 Multivariate linear regression
The least squares method (also known as the method of least squares) is a mathematical optimization technique. It finds the best-fitting function for a set of data by minimizing the sum of the squared errors. Using least squares, unknown parameters can be computed easily so that the squared error between the fitted values and the observed data is minimized. The least squares method can also be used for curve fitting, and some other optimization problems, such as minimizing energy or maximizing entropy, can be expressed in least squares form. (From Baidu Encyclopedia)
Note: the least squares method is widely applicable and is not limited to linear regression.
2 History of the least squares method
In 1801, the Italian astronomer Giuseppe Piazzi discovered the first asteroid, Ceres. After 40 days of follow-up observation, Ceres moved behind the sun and Piazzi lost track of its position. Scientists around the world then used Piazzi's observations to search for Ceres, but most of them failed to locate it from their calculations. Gauss, then 24 years old, also computed the orbit of Ceres, and the German astronomer Heinrich Olbers rediscovered Ceres based on Gauss's calculated orbit.
Gauss published the least squares method in his 1809 book Theoria Motus Corporum Coelestium (Theory of the Motion of the Heavenly Bodies).
The French scientist Legendre had independently invented the least squares method in 1806, but it was little known to the world at the time.
Legendre and Gauss later disputed over who had first founded the least squares method.
In 1829, Gauss proved that the least squares estimator is optimal, in the sense of being stronger than other methods under certain conditions; this result is now known as the Gauss–Markov theorem.
The least squares method was not only the most important statistical method of the 19th century; it can also be called the soul of mathematical statistics. Several major branches of mathematical statistics, such as correlation and regression analysis, analysis of variance, and linear model theory, take the least squares method as their theoretical basis. As the American statistician S. M. Stigler put it, "the least squares method is to mathematical statistics what calculus is to mathematics." Least squares is the most basic method for estimating regression parameters, so studying the least squares method and its applications is of great significance for statistics. (From Baidu Encyclopedia)
3 Basic form
Given an example described by $d$ attributes, $\mathbf{x} = (x_1; x_2; \ldots; x_d)$, where $x_i$ is the value of $\mathbf{x}$ on the $i$-th attribute, a linear model tries to learn a prediction function that is a linear combination of the attributes, i.e.,

$$f(\mathbf{x}) = w_1 x_1 + w_2 x_2 + \cdots + w_d x_d + b$$

which is generally written in the more concise vector form

$$f(\mathbf{x}) = \mathbf{w}^{\mathrm{T}} \mathbf{x} + b$$

where $\mathbf{w} = (w_1; w_2; \ldots; w_d)$.
This is easy to understand intuitively: suppose a number of points lie roughly along a straight line whose equation is not known in advance. Given only these points, the task is to determine the line, i.e., to solve for the function from the known data. This is just univariate linear regression; multivariate linear regression is similar.
In regression analysis, if there is only one independent variable and one dependent variable, and the relationship between the two can be approximated by a straight line, the analysis is called univariate (simple) linear regression analysis. If the regression includes two or more independent variables, and the relationship between the dependent variable and the independent variables is linear, it is called multivariate linear regression analysis. In two-dimensional space a linear model is a straight line; in three-dimensional space it is a plane; in higher-dimensional space it is a hyperplane.
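The linear model above can be sketched in a few lines of code. This is a minimal illustration, not from the original article; the function name `linear_model` and the numbers are made up for the example:

```python
import numpy as np

def linear_model(w, b, x):
    """Evaluate f(x) = w^T x + b for a single example x."""
    return np.dot(w, x) + b

# Example with d = 3 attributes (illustrative numbers only)
w = np.array([2.0, -1.0, 0.5])
b = 3.0
x = np.array([1.0, 4.0, 2.0])
print(linear_model(w, b, x))  # 2*1 - 1*4 + 0.5*2 + 3 = 2.0
```

With $d = 1$ this reduces to the straight line $f(x) = wx + b$ discussed in the next section.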
4 Univariate linear regression
Given a data set

$$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$$

where $x_i \in \mathbb{R}$ and $y_i \in \mathbb{R}$,

univariate linear regression tries to learn

$$f(x_i) = w x_i + b$$

such that $f(x_i) \simeq y_i$.
In effect, we seek a function such that the data points lie as close to it as possible. So how do we determine the parameters $w$ and $b$?

Clearly, the key is how to measure the gap between $f(x)$ and $y$; we use the mean squared error. The problem of minimizing the gap thus becomes one of minimizing the squared error, namely:

$$(w^*, b^*) = \underset{(w,b)}{\arg\min} \sum_{i=1}^{m} \left(f(x_i) - y_i\right)^2 = \underset{(w,b)}{\arg\min} \sum_{i=1}^{m} \left(y_i - w x_i - b\right)^2$$

where $w^*$ and $b^*$ denote the solutions for $w$ and $b$, respectively.
The mean squared error has a clear geometric meaning: it corresponds to the commonly used Euclidean distance. The model-solving method based on minimizing the mean squared error is called the least squares method. In linear regression, the least squares method tries to find a straight line that minimizes the sum of the squared Euclidean distances from all samples to the line.
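The objective being minimized can be computed directly. A minimal sketch (the function name `sum_squared_error` and the sample data are illustrative assumptions, not from the article):

```python
import numpy as np

def sum_squared_error(w, b, x, y):
    """Sum of squared errors of the line f(x) = w*x + b on data (x, y)."""
    residuals = y - (w * x + b)
    return np.sum(residuals ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])          # points on the line y = 2x
print(sum_squared_error(2.0, 0.0, x, y))  # perfect fit -> 0.0
print(sum_squared_error(1.0, 0.0, x, y))  # worse line -> 14.0
```

Least squares picks the $(w, b)$ that makes this quantity as small as possible.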
The process of solving for the $w$ and $b$ that minimize $E(w,b) = \sum_{i=1}^{m} (y_i - w x_i - b)^2$ is called "parameter estimation" of the linear regression model. To find the minimum, we only need to differentiate. Taking the derivatives of $E(w,b)$ with respect to $w$ and $b$ gives:

$$\frac{\partial E(w,b)}{\partial w} = 2\left(w \sum_{i=1}^{m} x_i^2 - \sum_{i=1}^{m} (y_i - b)\, x_i\right)$$

$$\frac{\partial E(w,b)}{\partial b} = 2\left(m b - \sum_{i=1}^{m} (y_i - w x_i)\right)$$

Setting both derivatives to zero yields the optimal solutions in closed form:

$$w = \frac{\sum_{i=1}^{m} y_i (x_i - \bar{x})}{\sum_{i=1}^{m} x_i^2 - \frac{1}{m}\left(\sum_{i=1}^{m} x_i\right)^2}$$

$$b = \frac{1}{m} \sum_{i=1}^{m} (y_i - w x_i)$$

where $\bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i$ is the mean of the $x_i$.
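The closed-form solutions above translate directly into code. This is a sketch for illustration (the function name `fit_univariate` and the test data are assumptions); the points are generated exactly on a line, so the fit should recover its slope and intercept:

```python
import numpy as np

def fit_univariate(x, y):
    """Closed-form least squares fit of f(x) = w*x + b."""
    m = len(x)
    x_bar = np.mean(x)
    w = np.sum(y * (x - x_bar)) / (np.sum(x ** 2) - np.sum(x) ** 2 / m)
    b = np.mean(y - w * x)
    return w, b

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0              # points exactly on the line y = 2x + 1
w, b = fit_univariate(x, y)
print(w, b)                    # -> 2.0 1.0
```

On noisy data the recovered line will not pass through every point, but it still minimizes the sum of squared errors among all straight lines.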
5 Multivariate linear regression
A more general case is multivariate linear regression. Consider the data set

$$D = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_m, y_m)\}$$

where $\mathbf{x}_i = (x_{i1}; x_{i2}; \ldots; x_{id}) \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$.

At this point we try to learn

$$f(\mathbf{x}_i) = \mathbf{w}^{\mathrm{T}} \mathbf{x}_i + b$$

such that $f(\mathbf{x}_i) \simeq y_i$.
This is called multivariate linear regression.
Similarly, the parameters can be estimated with the least squares method. For ease of discussion, we absorb $\mathbf{w}$ and $b$ into a single vector $\hat{\mathbf{w}} = (\mathbf{w}; b)$.
Correspondingly, the data set is represented as an $m \times (d+1)$ matrix $\mathbf{X}$, where each row corresponds to one example: the first $d$ elements of a row are that example's $d$ attribute values, and the last element is a constant set to 1, namely:

$$\mathbf{X} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1d} & 1 \\ x_{21} & x_{22} & \cdots & x_{2d} & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{md} & 1 \end{pmatrix}$$
Writing the labels in vector form as well, $\mathbf{y} = (y_1; y_2; \ldots; y_m)$, and again minimizing the squared error, we have:

$$\hat{\mathbf{w}}^* = \underset{\hat{\mathbf{w}}}{\arg\min}\; (\mathbf{y} - \mathbf{X}\hat{\mathbf{w}})^{\mathrm{T}} (\mathbf{y} - \mathbf{X}\hat{\mathbf{w}})$$
Let $E_{\hat{\mathbf{w}}} = (\mathbf{y} - \mathbf{X}\hat{\mathbf{w}})^{\mathrm{T}} (\mathbf{y} - \mathbf{X}\hat{\mathbf{w}})$. To find the minimum, we again only need to differentiate. Taking the derivative with respect to $\hat{\mathbf{w}}$ gives:

$$\frac{\partial E_{\hat{\mathbf{w}}}}{\partial \hat{\mathbf{w}}} = 2\, \mathbf{X}^{\mathrm{T}} (\mathbf{X}\hat{\mathbf{w}} - \mathbf{y})$$
Setting this derivative to zero yields the optimal solution, but the derivation involves computing a matrix inverse (for example, when $\mathbf{X}^{\mathrm{T}}\mathbf{X}$ is full rank, $\hat{\mathbf{w}}^* = (\mathbf{X}^{\mathrm{T}}\mathbf{X})^{-1} \mathbf{X}^{\mathrm{T}} \mathbf{y}$), which is considerably more complicated than the univariate case; we do not pursue it further here.
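The normal-equation solution can nevertheless be sketched numerically. This is a minimal illustration under the assumption that $\mathbf{X}^{\mathrm{T}}\mathbf{X}$ is invertible (the function name `fit_multivariate` and the sample data are made up for the example); in practice, `np.linalg.lstsq` is the more robust choice:

```python
import numpy as np

def fit_multivariate(X_raw, y):
    """Least squares fit via the normal equation.

    X_raw: (m, d) attribute matrix. A constant-1 column is appended
    so that w_hat = (w; b). Assumes X^T X is invertible.
    """
    m = X_raw.shape[0]
    X = np.hstack([X_raw, np.ones((m, 1))])   # append constant-1 column
    w_hat = np.linalg.solve(X.T @ X, X.T @ y)  # solves (X^T X) w_hat = X^T y
    return w_hat[:-1], w_hat[-1]               # split back into (w, b)

# Data generated exactly from y = 1*x1 + 2*x2 + 3
X_raw = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = X_raw @ np.array([1.0, 2.0]) + 3.0
w, b = fit_multivariate(X_raw, y)
print(w, b)   # recovers w = [1, 2], b = 3
```

Using `np.linalg.solve` on $(\mathbf{X}^{\mathrm{T}}\mathbf{X})\hat{\mathbf{w}} = \mathbf{X}^{\mathrm{T}}\mathbf{y}$ avoids forming the inverse explicitly, which is both faster and numerically safer than computing $(\mathbf{X}^{\mathrm{T}}\mathbf{X})^{-1}$ directly.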
References
[1] https://www.cnblogs.com/wangkundentisy/p/7505487.html
[2] Zhou Zhihua. Machine Learning. Beijing: Tsinghua University Press, January 2016.