How to find the coefficients of a fitted straight-line equation

In statistical data analysis, studying the correlation between two variables x and Y is very important. By plotting the data as a scatter diagram in a Cartesian coordinate system, we find that many statistical data sets are distributed approximately along a straight line, showing either a positive or a negative correlation. These data are discrete rather than continuous, so we cannot obtain a definite functional equation that describes the relationship exactly; but since the points cluster near a straight line, we can describe the relationship approximately by drawing a line and writing down its linear equation. Of course, as the description above suggests, the data only lie near a straight line, so many such lines could be drawn, and we want to find the one that best reflects the relationship between the variables. In other words, we want the straight line that is "closest" to the known data points. Let the equation of this line be:

$$\hat{y} = bx + a$$
Here, to distinguish the fitted value from the actual value of Y (the true value in the statistical data, which we call the observed value): when x takes the value $x_i$ ($i = 1, 2, 3, \ldots, n$), the observed value of Y is $y_i$, and its approximation given by the line, that is, the ordinate of the corresponding point on the line, is $\hat{y}_i$.

The equation $\hat{y} = bx + a$ is called the regression line equation of Y on x, and b is called the regression coefficient. To determine the regression line equation, we only need to determine a and the regression coefficient b.

Suppose a set of observed values of x and Y is:

$$(x_i, y_i), \qquad i = 1, 2, 3, \ldots, n$$

and the corresponding linear regression equation is:

$$\hat{y} = bx + a$$
When x takes the value $x_i$ ($i = 1, 2, 3, \ldots, n$), the observed value of Y is $y_i$. The difference $y_i - \hat{y}_i$ describes how far the actual observed value deviates from the ordinate of the corresponding point on the regression line, as shown below:

[Figure: scatter points around the regression line, with vertical segments marking the deviations $y_i - \hat{y}_i$]
In fact, we want the total deviation formed by these n differences to be as small as possible, because only then is the line closest to the known points. In other words, finding the regression line equation is really a process of minimizing the total deviation.

A natural idea is to add up the individual deviations to obtain the total deviation. However, because the deviations can be positive or negative, they cancel each other out when summed directly, so the plain sum does not reflect how close the line is to the data. In other words, the total deviation cannot be represented by the sum of the n deviations

$$\sum_{i=1}^{n}(y_i - \hat{y}_i).$$
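To see the cancellation concretely, consider a hypothetical case of just two points (the numbers here are mine, purely for illustration) whose deviations are $+3$ and $-3$:

$$\sum_{i=1}^{2}(y_i - \hat{y}_i) = 3 + (-3) = 0, \qquad \sum_{i=1}^{2}(y_i - \hat{y}_i)^2 = 3^2 + (-3)^2 = 18.$$

The plain sum reports zero total deviation even though both points sit 3 units away from the line, while the sum of squares correctly reflects the misfit.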
The usual practice is therefore to use the sum of squared deviations,

$$Q = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - bx_i - a)^2,$$

as the total deviation and to make it as small as possible. The regression line is then the one line, among all candidates, for which Q attains its minimum value. Because a square is also called 二乘方 (the second power) in Chinese, this method of making the "sum of squares smallest" is called 最小二乘法, the method of least squares.
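As a quick illustration of this objective, here is a minimal Python sketch (my own addition; the function name and data are made up for illustration) that evaluates Q for a candidate pair (a, b):

```python
# Sum of squared deviations Q for a candidate line y_hat = b*x + a.
def total_squared_deviation(xs, ys, a, b):
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Made-up example data, purely illustrative.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]

print(total_squared_deviation(xs, ys, a=0.0, b=2.0))  # Q for one candidate line
print(total_squared_deviation(xs, ys, a=0.1, b=2.0))  # Q for another candidate
```

The least squares method is exactly the search for the pair (a, b) that makes this quantity as small as possible.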
The coefficients a and b of the regression line equation obtained by the method of least squares are given by the following formulas:

$$\hat{b} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}, \qquad \hat{a} = \bar{y} - \hat{b}\bar{x}$$
where $\bar{x}$ and $\bar{y}$ are the means of the $x_i$ and the $y_i$. The hat "︿" written above a and b indicates that these are estimates obtained from the observed values by the method of least squares; once $\hat{a}$ and $\hat{b}$ are determined, the regression line equation is established.
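These closed-form formulas translate directly into code. The following is a minimal Python sketch of my own (the function name, variable names, and example data are illustrative, not from the article):

```python
def least_squares_fit(xs, ys):
    """Fit y = b*x + a by the closed-form least squares formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # b_hat = sum((x_i - x_bar) * (y_i - y_bar)) / sum((x_i - x_bar) ** 2)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    b_hat = numerator / denominator
    # a_hat = y_bar - b_hat * x_bar
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat

# Made-up example data, purely illustrative.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]
a_hat, b_hat = least_squares_fit(xs, ys)
print(f"y_hat = {b_hat:.4f} * x + {a_hat:.4f}")
```

The names b_hat and a_hat mirror the notation $\hat{b}$ and $\hat{a}$ above.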

Of course, we should not be content with the formulas alone; only by understanding how they come about can we remember them and use them well, so the derivation of the two formulas above is what really matters. Before giving it, we first establish two algebraic identities that the derivation relies on. The first is:

$$\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \bar{x}\sum_{i=1}^{n} y_i - \bar{y}\sum_{i=1}^{n} x_i + n\bar{x}\bar{y} = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y} = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$$
and the second is:

$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 = \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$$
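Both identities are easy to verify numerically. The snippet below is a minimal check of my own, with made-up data and variable names:

```python
xs = [1.0, 2.0, 3.0, 4.0]   # made-up data, purely illustrative
ys = [2.1, 3.9, 6.2, 8.1]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# First identity: sum((x_i - x_bar)(y_i - y_bar)) == sum(x_i * y_i) - n * x_bar * y_bar
lhs1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
rhs1 = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar

# Second identity: sum((x_i - x_bar)^2) == sum(x_i^2) - n * x_bar^2
lhs2 = sum((x - x_bar) ** 2 for x in xs)
rhs2 = sum(x ** 2 for x in xs) - n * x_bar ** 2

print(abs(lhs1 - rhs1) < 1e-9, abs(lhs2 - rhs2) < 1e-9)  # True True
```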
With these two identities in place, we can begin deriving the least squares formulas for the regression line equation. Starting from the sum of squared deviations and regrouping the terms:

$$
\begin{aligned}
Q &= \sum_{i=1}^{n}(y_i - bx_i - a)^2 \\
  &= \sum_{i=1}^{n}\big[(y_i - \bar{y}) - b(x_i - \bar{x}) + (\bar{y} - b\bar{x} - a)\big]^2 \\
  &= \sum_{i=1}^{n}(y_i - \bar{y})^2 + b^2\sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{y} - b\bar{x} - a)^2 - 2b\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) \\
  &= n(\bar{y} - b\bar{x} - a)^2 + \sum_{i=1}^{n}(x_i - \bar{x})^2\left[b - \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right]^2 + \sum_{i=1}^{n}(y_i - \bar{y})^2 - \frac{\Big[\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})\Big]^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}
\end{aligned}
$$

The cross terms involving $(\bar{y} - b\bar{x} - a)$ vanish in the third line because $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$ and $\sum_{i=1}^{n}(y_i - \bar{y}) = 0$, and the last line completes the square in b.
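If you want to double-check this regrouping, the short script below (my own sketch, with made-up data and an arbitrary candidate a and b) confirms numerically that the expanded and regrouped forms of Q agree:

```python
xs = [1.0, 2.0, 3.0, 4.0]   # made-up data, purely illustrative
ys = [2.1, 3.9, 6.2, 8.1]
a, b = 0.3, 1.7             # arbitrary candidate coefficients

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_yy = sum((y - y_bar) ** 2 for y in ys)

# Q expanded directly from its definition.
q_direct = sum((y - b * x - a) ** 2 for x, y in zip(xs, ys))
# Q in the regrouped form obtained in the derivation.
q_regrouped = (n * (y_bar - b * x_bar - a) ** 2
               + s_xx * (b - s_xy / s_xx) ** 2
               + s_yy - s_xy ** 2 / s_xx)

print(abs(q_direct - q_regrouped) < 1e-9)  # True (up to floating-point error)
```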
At this point the rearrangement of the expression is finished. From the final form we can see that the last two terms,

$$\sum_{i=1}^{n}(y_i - \bar{y})^2 - \frac{\Big[\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})\Big]^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2},$$
do not depend on a or b; they form a constant term. Since the first two terms are both squares and therefore non-negative, we need

$$n(\bar{y} - b\bar{x} - a)^2 = 0 \qquad \text{and} \qquad \sum_{i=1}^{n}(x_i - \bar{x})^2\left[b - \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right]^2 = 0$$
for Q to attain its minimum value. Therefore:

$$\hat{b} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}, \qquad \hat{a} = \bar{y} - \hat{b}\bar{x}$$
This completes the derivation of the formulas.
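As a final sanity check (my own addition, not part of the original article), the closed-form result can be compared against numpy.polyfit, which performs a least squares fit of a degree-1 polynomial and should return the same slope and intercept:

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0, 4.0])   # made-up data, purely illustrative
ys = np.array([2.1, 3.9, 6.2, 8.1])

# Closed-form least squares from the derivation above.
x_bar, y_bar = xs.mean(), ys.mean()
b_hat = ((xs - x_bar) * (ys - y_bar)).sum() / ((xs - x_bar) ** 2).sum()
a_hat = y_bar - b_hat * x_bar

# numpy.polyfit(x, y, 1) returns [slope, intercept] for a degree-1 fit.
slope, intercept = np.polyfit(xs, ys, 1)
print(np.isclose(b_hat, slope), np.isclose(a_hat, intercept))  # True True
```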
 
The least squares method can be used to find the approximate line underlying the distribution of statistical data and to analyze problems. It is very easy to implement in a program and is part of the foundation of statistical analysis algorithms, so it is well worth mastering.
----------------
Disclaimer: This article is an original article by the CSDN blogger "Neo-T" and follows the CC 4.0 BY-SA copyright agreement. When reproducing it, please attach the original source link and this statement.
Original link: https://blog.csdn.net/marsjohn/article/details/54911788
