Use linear regression to find the best-fitting line (with practical Python code)


1. Find a straight line that fits the data; this is linear regression.

        To find the best-fit line (taking data with a single feature as an example), the model is:

                                        ŷ = w₀ + w₁x

        Use least-squares estimation to find the best parameters w.

        (Why use least-squares estimation, i.e. the sum of squared differences, rather than the absolute value or the fourth power as the cost function? See the probabilistic explanation of the least-squares method.)

        (In fact, least-squares estimation is exactly what results from modeling the noise as Gaussian in the generalized linear model.)

Define the cost function:

                                        J(w) = Σᵢ (yᵢ − xᵢᵀw)²

                                        matrix form: J(w) = (y − Xw)ᵀ(y − Xw)

    Easy to understand:

        When the cost function reaches its minimum, the corresponding w is the best estimate of the regression coefficients of the fitted line. It is called an estimate because it is computed from the training set only.
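As a sanity check, the element-wise sum of squares and the matrix form of the cost function can be verified to agree numerically; this is a minimal sketch with made-up toy data, not code from the original article:

```python
import numpy as np

# Toy data: a bias column of ones plus one feature (made-up values).
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.1, 1.9, 3.2])
w = np.array([0.1, 1.0])

# Element-wise form: J(w) = sum_i (y_i - x_i^T w)^2
J_sum = sum((yi - xi @ w) ** 2 for xi, yi in zip(X, y))

# Matrix form: J(w) = (y - Xw)^T (y - Xw)
r = y - X @ w
J_mat = r @ r

assert np.isclose(J_sum, J_mat)
print(J_mat)   # 0.05 (up to floating point)
```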

    Ways to find the extremum of the cost function:

        (1). Differentiation (here, matrix calculus; the method used in this article)

        (2). An optimization algorithm, such as gradient descent
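Method (2) can be sketched as follows. This is my own minimal gradient-descent implementation on synthetic data, not code from the original article; it converges to the same answer as the closed-form solution:

```python
import numpy as np

# Synthetic single-feature data: y ≈ 3 + 2x + noise (made-up parameters).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = 3.0 + 2.0 * x + rng.normal(0, 0.1, 50)
X = np.column_stack([np.ones_like(x), x])   # bias column + feature

w = np.zeros(2)
lr = 0.1
for _ in range(5000):
    grad = -2.0 * X.T @ (y - X @ w)   # gradient of J(w) = ||y - Xw||^2
    w -= lr / len(y) * grad           # step against the gradient

# Compare with the closed-form normal-equation solution.
w_exact = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(w, w_exact, atol=1e-3)
```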


                In method (1), differentiate the cost function with respect to the parameter vector w; since w is a vector, matrix calculus is required.

                    Setting the derivative to zero gives (see the full derivation for details):

                                                Xᵀ(y − Xw) = 0,  i.e.  XᵀXw = Xᵀy

                    That is:                    ŵ = (XᵀX)⁻¹Xᵀy

    Finally, the regression coefficient vector ŵ is obtained, which gives the fitted straight line (this extends to a plane, or even to higher dimensions).

2. Next, use Python to find the best-fitting straight line (code).
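The original code listing did not survive, so here is a minimal sketch of the normal-equation fit on synthetic single-feature data; the function name stand_regres and the data are my own:

```python
import numpy as np

def stand_regres(X, y):
    """Best-fit coefficients via the normal equation: w = (X^T X)^{-1} X^T y."""
    xtx = X.T @ X
    if np.linalg.det(xtx) == 0.0:         # X^T X must be invertible
        raise ValueError("X^T X is singular; cannot use the normal equation")
    return np.linalg.solve(xtx, X.T @ y)  # more stable than an explicit inverse

# Synthetic data: y ≈ 3 + 0.5x + noise (made-up parameters).
rng = np.random.default_rng(42)
x = rng.uniform(0, 1, 200)
y = 3.0 + 0.5 * x + rng.normal(0, 0.05, 200)
X = np.column_stack([np.ones_like(x), x])  # bias column + feature

w = stand_regres(X, y)
print(w)   # roughly [3.0, 0.5]
```

Using `np.linalg.solve` avoids forming the explicit inverse, which is both faster and numerically safer.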

        Fitting result graph: (figure: scatter plot of the data with the fitted regression line)

            From the results, one problem with linear regression is that it can underfit, because it is the unbiased estimator with the minimum mean squared error. Some methods therefore introduce a little bias into the estimate in order to further reduce the mean squared error. One such method is locally weighted linear regression.
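A sketch of locally weighted linear regression: each training point is weighted by a Gaussian kernel centered at the query point, so nearby points dominate the local fit. The function name lwlr and the bandwidth default are my own choices:

```python
import numpy as np

def lwlr(x0, X, y, k=0.1):
    """Locally weighted linear regression prediction at query point x0.

    Each training point gets a Gaussian weight based on its distance
    from x0; the bandwidth k controls how local the fit is.
    Solves the weighted normal equation (X^T W X) theta = X^T W y.
    """
    diffs = X - x0
    w_diag = np.exp(-np.sum(diffs ** 2, axis=1) / (2.0 * k ** 2))
    W = np.diag(w_diag)
    xtwx = X.T @ W @ X
    if np.linalg.det(xtwx) == 0.0:
        raise ValueError("weighted X^T W X is singular")
    theta = np.linalg.solve(xtwx, X.T @ W @ y)
    return x0 @ theta

# On exactly linear data the local fit recovers the global line.
X = np.column_stack([np.ones(20), np.linspace(0, 1, 20)])
y = 1.0 + 2.0 * X[:, 1]
pred = lwlr(np.array([1.0, 0.5]), X, y)
print(pred)   # ≈ 2.0
```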


3. Consider a problem: when the data has more features than the training set has sample points, XᵀX is not invertible, because the rank of XᵀX equals the rank of X, which is at most the number of samples and therefore smaller than the matrix dimension.

        In this case, shrinkage methods (such as ridge regression) are needed to "understand" the data and obtain the regression coefficients.
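One standard way to make the system solvable in this regime is ridge regression, a shrinkage method. A minimal sketch, assuming the standard formulation w = (XᵀX + λI)⁻¹Xᵀy; the function name and data are my own:

```python
import numpy as np

def ridge_regres(X, y, lam=0.2):
    """Ridge regression: w = (X^T X + lam*I)^{-1} X^T y.

    The lam*I term makes the matrix invertible even when there are
    more features than samples, at the cost of introducing some bias.
    """
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# More features (5) than samples (3): X^T X alone is singular,
# but the ridge system still has a unique solution.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
y = rng.normal(size=3)

w = ridge_regres(X, y)
print(w.shape)   # (5,)
```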



Attached:

    In practice: linear regression can be used to predict the age of abalone.

  Sample code: predicting abalone age


