Calculations: least squares regression analysis

  • Least squares linear regression is the most widely used and most common method in statistics. This post is mainly about the process of least squares estimation in linear regression; the estimation process for multiple linear regression is similar to that for simple linear regression.
  1. First, what is regression analysis? Regression analysis is a mathematical-statistics method for analyzing the causal relationship between variables (the dependent variable and the independent variables). Building a regression equation only makes sense when a relationship really does exist between the independent and dependent variables. Therefore, whether the independent variables can predict the value of the dependent variable, how strong the correlation is, and how certain we can be about that degree of correlation are exactly the problems regression analysis must address.
  2. When analyzing the degree of correlation, we generally use the size of the correlation coefficient (Pearson's coefficient r, which lies in the range [-1, 1]) to judge the degree of association between the independent and dependent variables.
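To make step 2 concrete, here is a minimal sketch in Python (the data values are made up purely for illustration, and NumPy is assumed to be available) that computes Pearson's coefficient r:

```python
# Minimal sketch: Pearson's correlation coefficient r between an independent
# variable x and a dependent variable y (illustrative data, not from the post).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]   # r lies in [-1, 1]; values near +/-1 mean a strong linear correlation
print("Pearson r:", r)
```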
  • An important step in regression analysis, as mentioned above, is the regression equation. Since it is an equation, it is determined by an intercept b and a regression coefficient a (for simple linear regression: Y = aX + b). In other words, as long as we can find a and b, the regression equation can be written out. So how do we find a and b? What method do we use? And under what conditions does that method give a regression equation that best describes the relationship between the two variables?

  • Let's first look at the definition of the least squares method: the least squares method (also known as the method of least squares) is a mathematical optimization technique that finds the best match to the data by minimizing the squared-error function. With least squares, the unknown parameters can be computed easily, such that the sum of squared errors between the fitted values and the actual data is as small as possible.
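To make the definition concrete before the derivation below, here is a minimal sketch that treats least squares purely as an optimization problem: it defines the squared-error function for a candidate line Y = aX + b and minimizes it numerically (the data and the use of scipy.optimize.minimize are illustrative assumptions, not part of the original post; the closed-form solution is derived next).

```python
# Minimal sketch: least squares as "minimize the squared-error function".
import numpy as np
from scipy.optimize import minimize

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable (made-up data)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # observed dependent variable

def sum_squared_error(params):
    a, b = params                          # slope and intercept of Y = aX + b
    residuals = y - (a * x + b)            # true value - predicted value
    return np.sum(residuals ** 2)          # the quantity least squares minimizes

result = minimize(sum_squared_error, x0=[0.0, 0.0])
print("a (slope), b (intercept):", result.x)
```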

  • As shown in the figure below, the points represent the true values, the line between them represents the fitted regression equation, and the red dashed segments between the true values and the line represent the errors (residuals), i.e., error = true value - predicted value. [Figure: scatter points, fitted regression line, and residuals]
    The requirement of the least squares method is then: make the squared error as small as possible. We can treat each error as the side length of a square, and then look for the parameters that minimize the total area of these squares, as shown in the figure.
    [Figure: residuals drawn as squares]
    The total area of all the squares is: (Y1 true value − Y1 predicted value)² + (Y2 true value − Y2 predicted value)² + … + (Yn true value − Yn predicted value)², and it is this sum that must be minimized.
    Expressed as a mathematical equation, the quantity to minimize is:
    z = (y1 − ŷ1)² + (y2 − ŷ2)² + … + (yn − ŷn)² = Σ (yi − ŷi)²
    where yi is the true value and ŷi the predicted value.
    We then substitute the regression equation ŷi = a·xi + b
    into the above equation to obtain the simpler form:
    z = Σ (yi − a·xi − b)²  (summed over i = 1, …, n)
    Next, take the partial derivatives of z with respect to a and b, and set each partial derivative equal to 0:
    ∂z/∂a = −2 Σ xi·(yi − a·xi − b) = 0
    ∂z/∂b = −2 Σ (yi − a·xi − b) = 0
    Finally, after dividing both sides by 2n and rearranging, we obtain the formulas for a and b:
    a = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²
    b = ȳ − a·x̄
    where x̄ and ȳ are the means of x and y.
    That is the derivation, by the least squares method, of the regression coefficient a and the intercept b of the regression equation (a worked code sketch is given below). But speaking as a blogger with a statistics background, I think this is just one way to compute the regression equation; the most important thing is the step that comes after, as mentioned earlier: once we have such a regression equation, how good is its fit? Is there a better regression model to be found?
    So which statistic do we use to judge whether the fit is good or bad? We generally use R².
    The conclusion: R² = SSR / SST. R² takes values between 0 and 1, and the closer it is to 1, the better the fit.
    (SSR is the regression sum of squares: SSR = Σ (ŷi − ȳ)².
    SST is the total sum of squared deviations: SST = Σ (yi − ȳ)².)
    There is also another commonly used formula for R², written with SSE (the residual sum of squares, SSE = Σ (yi − ŷi)²):
    R² = 1 − SSE / SST,
    which follows from SST (total sum of squares) = SSR + SSE together with R² = SSR / SST.
    If all the true-value points lie exactly on the regression line, then SSE is 0 and R² equals 1,
    which means that 100% of the variation in Y is caused by variation in X, no other factors affect Y, and the regression line fully explains the variation in Y. If R² is very low, it indicates that there may be no linear relationship between X and Y.
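Putting the pieces above together, here is a minimal sketch (again with made-up data) that computes the regression coefficient a and the intercept b from the closed-form formulas derived above, and then judges the fit with R² = SSR / SST = 1 − SSE / SST:

```python
# Minimal sketch: closed-form least squares fit of Y = aX + b, plus R^2.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data only
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_mean, y_mean = x.mean(), y.mean()

# Regression coefficient (slope) and intercept from the least squares formulas
a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - a * x_mean

y_hat = a * x + b                          # predicted values on the regression line

SSR = np.sum((y_hat - y_mean) ** 2)        # regression sum of squares
SSE = np.sum((y - y_hat) ** 2)             # residual (error) sum of squares
SST = np.sum((y - y_mean) ** 2)            # total sum of squared deviations; SST = SSR + SSE

print("a =", a, ", b =", b)
print("R^2 =", SSR / SST, "=", 1 - SSE / SST)
```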

  • Excluding variables
    In the multiple regression case, i.e., when there are several independent variables, some variables may explain very little of the dependent variable. We can exclude such variables so that the regression model becomes simpler. This step therefore requires a significance test of the variables. The idea of the significance test uses statistics taught in mathematical statistics courses; the underlying theory is too deep to cover here, so we simply note the conclusions:

  • T test
    The t-test looks at the significance of a single independent variable Xi in the linear relationship with Y; if Xi is not significant, the variable can be removed from the model.

  • F test
    The F-test looks at the overall significance of all the independent variables X together in the linear relationship with Y.

For the t-test, look at the P-value in the statistical output; for the F-test, look at the Significance F value. These values are generally compared with the significance level: a value smaller than the significance level indicates a significant relationship, and the smaller the value, the more significant it is (the significance level is set manually; the two most commonly used levels are 0.05 and 0.01).
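As an illustration of where these values appear in practice, here is a hedged sketch that assumes the statsmodels package is installed (the data are made up); it fits an OLS model and prints the per-coefficient t-test P-values and the overall F statistic with its Significance F:

```python
# Minimal sketch: reading t-test P-values and the F-test from an OLS fit.
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])   # illustrative data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1])

X = sm.add_constant(x)                  # adds the intercept column
model = sm.OLS(y, X).fit()

print(model.pvalues)                    # t-test P-value for each coefficient
print(model.fvalue, model.f_pvalue)     # F statistic and its Significance F
# Compare each value with the chosen significance level (e.g. 0.05 or 0.01);
# values below the level indicate a significant relationship.
```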
