Why is least squares the best method when doing linear regression?

The most fundamental reason is a matter of philosophy and logic.

The sum of squared deviations is always greater than or equal to 0. At the same time, it clearly has no maximum (as the deviations grow, the sum only gets larger and larger), so the point where the partial derivatives are 0 (the condition for an extremum) must give the minimum.
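To make this concrete, here is the standard calculation for a simple line y = a + bx (the notation a, b, x_i, y_i is mine, not the original poster's): the sum of squares is bounded below by 0 and grows without bound, so its only stationary point is the minimum.

```latex
% Sum of squared deviations for the line y = a + b x (standard notation, assumed here)
S(a,b) = \sum_{i=1}^{n} \bigl(y_i - a - b x_i\bigr)^2 \;\ge\; 0,
\qquad S(a,b) \to \infty \ \text{as}\ |a|, |b| \to \infty.

% Setting the partial derivatives to zero gives the normal equations:
\frac{\partial S}{\partial a} = -2\sum_{i=1}^{n} (y_i - a - b x_i) = 0,
\qquad
\frac{\partial S}{\partial b} = -2\sum_{i=1}^{n} x_i \,(y_i - a - b x_i) = 0.
```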


When we perform regression analysis, the independent variable is x and the dependent variable is y; we are looking for the connection between y and x, more precisely, how x determines y. So x and y are two essentially different quantities: one is the cause, the other is the effect. Now consider the suggestion that "we should use the straight line that makes the distance from each point to the line smallest". This approach actually mixes up cause and effect, trying to find an optimal hyperplane in the (x, y) vector space. It is not wrong, but the logic is at least unnatural. The logic of least squares is more natural. For example, suppose I have a dependent variable y and two independent variables x1 and x2, each represented by a vector of observed samples.
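A small sketch of the difference (my own illustration; the data and variable names are made up): the "distance from each point to the line" criterion is orthogonal (total) least squares, which treats x and y symmetrically, while ordinary least squares only measures errors in the y direction.

```python
# Hypothetical illustration: ordinary least squares minimizes vertical residuals
# in y, while "shortest distance from each point to the line" is orthogonal
# (total) least squares, which treats x and y symmetrically.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)   # noise only in y

# Ordinary least squares: minimize the sum of squared vertical deviations
b_ols, a_ols = np.polyfit(x, y, deg=1)

# Orthogonal (total) least squares: the first principal component of the
# centered (x, y) cloud is the direction minimizing perpendicular distances
X = np.column_stack([x - x.mean(), y - y.mean()])
_, _, vt = np.linalg.svd(X, full_matrices=False)
b_tls = vt[0, 1] / vt[0, 0]
a_tls = y.mean() - b_tls * x.mean()

print(f"OLS slope: {b_ols:.3f}, TLS slope: {b_tls:.3f}")  # the two lines differ
```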

What does least squares do? It finds the point in the linear space spanned by the observation vectors x1 and x2 that is closest to the observed y vector. From a geometric point of view, this is an orthogonal projection. As many answers have pointed out, least squares is not necessarily the best; we can use other distances as well. That's fine, but the advantage of least squares is exactly that it is "natural". The space we are most used to is Euclidean space with its inner product. If we use other distances, there is no "natural" inner product, and without it the minimum-variance property (BLUE, best linear unbiased estimator) disappears.
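Here is a minimal numerical sketch of the projection picture, assuming a design matrix built from the observation vectors (names and data are illustrative): the least-squares fit is exactly the orthogonal projection of y onto the space they span, and the residual is orthogonal to that space.

```python
# Minimal sketch: the least-squares fit equals the orthogonal projection of y
# onto the column space of the design matrix (illustrative data, not from the post).
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 * x1 - 2.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])        # design matrix (with intercept)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # least-squares coefficients
y_hat = X @ beta                                 # fitted values

# The hat matrix H = X (X'X)^{-1} X' is exactly the orthogonal projector
H = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(y_hat, H @ y))                 # True: fit = projection of y
print(np.allclose(X.T @ (y - y_hat), 0))         # True: residual is orthogonal to the columns
```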

If we do not use this distance, we are implicitly assuming that the noise follows some other distribution. I once answered the question "Why can so many variables be described by a normal distribution?", which explains why people like to use the normal-distribution assumption.
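As a hedged illustration of that connection (my addition, not part of the original answer): if the noise is assumed Gaussian with fixed variance, the negative log-likelihood is, up to constants, just the sum of squared residuals, so the maximum-likelihood estimate coincides with the least-squares one.

```python
# Sketch: with Gaussian noise, minimizing the negative log-likelihood is the
# same as minimizing the sum of squared residuals (illustrative data).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 1.5 * x + 0.5 + rng.normal(scale=0.7, size=x.size)
X = np.column_stack([np.ones_like(x), x])

def neg_log_likelihood(beta):
    # Gaussian noise with fixed variance: NLL = const + SSE / (2 * sigma^2)
    resid = y - X @ beta
    return 0.5 * np.sum(resid ** 2)              # constants dropped

beta_mle = minimize(neg_log_likelihood, x0=np.zeros(2)).x
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_mle, beta_ols, atol=1e-4))  # True: MLE == least squares
```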

At a higher level, the general method of modern science is a combination of "induction" and "deduction". From the inductive point of view, ask what kind of noise and what kind of distribution the practical problem actually calls for. From the deductive point of view, whichever method is the most natural, the most elegant, and the easiest to understand is the one you should try first.

Euclidean distance is the most natural and intuitive distance, the normal distribution is the most common and easiest noise distribution to work with, and so least squares is naturally the best method.
