Machine Learning Algorithms --- Linear Regression

1. An introduction to the linear regression algorithm

   Linear regression is a widely used statistical analysis method that applies regression analysis from mathematical statistics to determine the quantitative relationship between two or more interdependent variables. Its model takes the form y = w'x + e, where e is an error term that follows a normal distribution with mean 0.

  If a regression analysis includes only one independent variable and one dependent variable, and the relationship between the two can be approximated by a straight line, it is called univariate (simple) linear regression analysis. If the regression analysis includes two or more independent variables, and the dependent variable is linearly related to the independent variables, it is called multiple linear regression analysis.

  This article mainly presents the derivation of the linear regression algorithm. For a more detailed introduction to linear regression, please refer to the entry on linear regression in Baidu Encyclopedia.

  Linear regression is a fundamental algorithm in machine learning, so readers who want to learn machine learning should understand it thoroughly.

2. Derivation of the linear regression algorithm

  Suppose the credit limit a bank grants to an applicant depends on two features: age and salary. Given a set of past applicants whose age, salary, and granted limit are known, how can we predict the limit for a new person when we only know his age and salary?

  For a linear relationship with a single input we would use y = ax + b: here y is affected by only one variable x, and the relationship between the two can be approximated by a straight line; this is univariate linear regression. In our example, if the limit is h and the salary and age are x1 and x2 respectively, the relationship can be written as

        h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2

In this relationship the result is affected by multiple variables, which is called multiple linear regression analysis.

  If we write the θ's and x's in the above formula as two vectors [θ0, θ1, θ2] and [x0, x1, x2] respectively, and let x0 = 1, the formula can be expressed compactly as

        h_\theta(x) = \sum_{i=0}^{2} \theta_i x_i = \theta^T x
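
  As a quick illustration, here is a minimal NumPy sketch of this vectorized hypothesis. The numbers chosen for θ and for the applicant's salary and age are purely hypothetical.

    import numpy as np

    # Hypothetical parameters: theta0 (bias), theta1 (weight for salary x1, in thousands), theta2 (weight for age x2)
    theta = np.array([150.0, 200.0, 4.0])

    # One applicant: x0 = 1 (for the bias term), x1 = salary in thousands, x2 = age
    x = np.array([1.0, 8.0, 30.0])

    # h_theta(x) = theta^T x
    h = theta @ x
    print(h)   # 150 + 200*8 + 4*30 = 1870.0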

  However, the actual results will not match the model's output exactly, so there must be an error between the two. For the i-th sample we therefore assume the relationship

        y^{(i)} = \theta^T x^{(i)} + \varepsilon^{(i)}

where ε^(i) is the true error.

  The errors ε^(i) are assumed to be independent and identically distributed, usually as a Gaussian distribution with mean 0 and variance σ².

  So the following formula can be obtained:

        p(\varepsilon^{(i)}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(\varepsilon^{(i)})^2}{2\sigma^2} \right)

  Substituting ε^(i) = y^(i) − θ^T x^(i) gives

        p(y^{(i)} \mid x^{(i)}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right)

  Then, if there are a large number of samples (x^(i), y^(i)), i = 1, …, m, we can estimate the parameter θ from the observed x and y by maximum likelihood.
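
  Before writing down the likelihood, here is a small sketch that generates a synthetic sample set following exactly this model: y = θ^T x + ε, with ε drawn i.i.d. from a zero-mean Gaussian. The "true" θ, the feature ranges, and σ are invented purely for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    m = 200                                     # number of samples
    true_theta = np.array([150.0, 200.0, 4.0])  # hypothetical "true" parameters
    sigma = 10.0                                # standard deviation of the error term

    # Design matrix X: first column is x0 = 1, then salary (x1, in thousands) and age (x2)
    salary = rng.uniform(3, 20, size=m)
    age = rng.uniform(20, 60, size=m)
    X = np.column_stack([np.ones(m), salary, age])

    # i.i.d. Gaussian errors with mean 0
    eps = rng.normal(0.0, sigma, size=m)

    # Observed targets
    y = X @ true_theta + eps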

  The likelihood function is as follows:

        L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right)

  Take the logarithm of the above equation:

        \ell(\theta) = \log L(\theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right)^2
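
  As a quick numerical sanity check on this expression (using a small synthetic X, y and an assumed σ, all made up for the example), the sketch below evaluates the log-likelihood both as a sum of log Gaussian densities and via the closed form above, and verifies that the two agree.

    import numpy as np

    def log_likelihood_direct(theta, X, y, sigma):
        # Sum over samples of the log Gaussian density of each residual
        r = y - X @ theta
        return np.sum(-np.log(np.sqrt(2 * np.pi) * sigma) - r**2 / (2 * sigma**2))

    def log_likelihood_closed(theta, X, y, sigma):
        # m*log(1/(sqrt(2*pi)*sigma)) - (1/sigma^2) * (1/2) * sum of squared residuals
        m = len(y)
        r = y - X @ theta
        return m * np.log(1.0 / (np.sqrt(2 * np.pi) * sigma)) - (r @ r) / (2 * sigma**2)

    # Hypothetical data for the check
    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(50), rng.uniform(3, 20, 50), rng.uniform(20, 60, 50)])
    theta = np.array([150.0, 200.0, 4.0])
    y = X @ theta + rng.normal(0.0, 10.0, 50)

    print(np.isclose(log_likelihood_direct(theta, X, y, 10.0),
                     log_likelihood_closed(theta, X, y, 10.0)))   # True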

  Setting the derivative of the log-likelihood ℓ(θ) with respect to θ to zero yields the maximum likelihood estimate of θ.

  In the log-likelihood above, the first term, m log(1/(√(2π)σ)), is a constant that vanishes after differentiation, and the factor 1/σ² is a positive constant that does not change where the maximum is attained; neither affects the final result. So maximizing ℓ(θ) is equivalent to minimizing the remaining sum of squares, and it is enough to differentiate that part and set the derivative to 0. Therefore let:

        J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right)^2 = \frac{1}{2} \left( X\theta - y \right)^T \left( X\theta - y \right)

  where X is the matrix whose i-th row is (x^(i))^T and y is the vector of all targets y^(i).
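
  Here is a small sketch (with made-up X, y and θ) that computes J(θ) both as the explicit sum over samples and in the vector form ½(Xθ − y)^T(Xθ − y), to show that the two expressions are the same.

    import numpy as np

    def cost_sum(theta, X, y):
        # J(theta) = 1/2 * sum_i (y_i - theta^T x_i)^2
        return 0.5 * sum((yi - theta @ xi) ** 2 for xi, yi in zip(X, y))

    def cost_vector(theta, X, y):
        # J(theta) = 1/2 * (X theta - y)^T (X theta - y)
        r = X @ theta - y
        return 0.5 * (r @ r)

    # Hypothetical data for the check
    rng = np.random.default_rng(1)
    X = np.column_stack([np.ones(10), rng.normal(size=(10, 2))])
    y = rng.normal(size=10)
    theta = np.array([1.0, -2.0, 0.5])

    print(np.isclose(cost_sum(theta, X, y), cost_vector(theta, X, y)))   # True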

  Take the partial derivative of J(θ) with respect to θ and simplify:

        \nabla_\theta J(\theta) = \nabla_\theta \left( \frac{1}{2} (X\theta - y)^T (X\theta - y) \right) = X^T X \theta - X^T y
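
  To double-check this gradient expression, the sketch below compares X^T Xθ − X^T y against a numerical finite-difference gradient of J(θ); the data is again synthetic and chosen only for the check.

    import numpy as np

    def grad_analytic(theta, X, y):
        # Gradient of J(theta) = 1/2 * ||X theta - y||^2 with respect to theta
        return X.T @ X @ theta - X.T @ y

    def grad_numeric(theta, X, y, h=1e-6):
        # Central finite differences of J(theta), one coordinate at a time
        J = lambda t: 0.5 * np.sum((X @ t - y) ** 2)
        g = np.zeros_like(theta)
        for j in range(len(theta)):
            e = np.zeros_like(theta)
            e[j] = h
            g[j] = (J(theta + e) - J(theta - e)) / (2 * h)
        return g

    # Hypothetical data for the check
    rng = np.random.default_rng(2)
    X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
    y = rng.normal(size=20)
    theta = rng.normal(size=3)

    print(np.allclose(grad_analytic(theta, X, y), grad_numeric(theta, X, y), atol=1e-4))   # True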

  Setting this gradient to 0 gives the maximum likelihood estimate of θ, which is exactly the least squares solution (the normal equation):

        \theta = \left( X^T X \right)^{-1} X^T y
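
  In code the closed-form solution is essentially a one-liner. The sketch below, using synthetic data generated from a hypothetical "true" θ, solves the normal equation with np.linalg.solve and checks the result against np.linalg.lstsq, which is usually preferred numerically.

    import numpy as np

    rng = np.random.default_rng(3)
    m = 500
    true_theta = np.array([150.0, 200.0, 4.0])   # hypothetical ground truth

    # Design matrix: x0 = 1, x1 = salary (in thousands), x2 = age
    X = np.column_stack([np.ones(m),
                         rng.uniform(3, 20, m),
                         rng.uniform(20, 60, m)])
    y = X @ true_theta + rng.normal(0.0, 10.0, m)

    # theta = (X^T X)^{-1} X^T y, solved as a linear system rather than via an explicit inverse
    theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

    # The same estimate via least squares (more stable when X^T X is ill-conditioned)
    theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

    print(theta_hat)                              # close to [150, 200, 4]
    print(np.allclose(theta_hat, theta_lstsq))    # True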

  Once θ has been obtained, we have trained a linear regression model from the samples, and we can use it to predict the result for new data whose outcome is unknown.
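
  Putting the pieces together, here is a minimal end-to-end sketch: fit θ on (synthetic) historical applicants and then predict the credit limit for a new applicant whose age and salary are known. All numbers are invented for the example.

    import numpy as np

    rng = np.random.default_rng(4)

    # Synthetic "historical" applicants: salary (in thousands), age, and the limit they were granted
    m = 300
    salary = rng.uniform(3, 20, m)
    age = rng.uniform(20, 60, m)
    X = np.column_stack([np.ones(m), salary, age])
    y = X @ np.array([150.0, 200.0, 4.0]) + rng.normal(0.0, 10.0, m)

    # Fit theta with the normal equation
    theta = np.linalg.solve(X.T @ X, X.T @ y)

    # Predict the limit for a new applicant (salary 9 thousand, age 35 -- hypothetical values)
    x_new = np.array([1.0, 9.0, 35.0])
    print(x_new @ theta)   # roughly 150 + 200*9 + 4*35 = 2090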

  PS: The reader only needs to understand the derivation process of the above algorithm. The actual numerical computation can be done by a program rather than by hand (for high-dimensional matrices the amount of calculation is quite large, and it is easy to make mistakes ( ̄▽ ̄)").
