[Machine Learning] Linear Regression 1: Basic Introduction


1. What is linear regression

Linear regression is a basic predictive modeling technique used to establish a linear relationship between independent and dependent variables. The model aims to find a best-fit straight line that minimizes the error between predicted and actual values. The basic assumptions of linear regression are that the relationship between the independent variables and the dependent variable is linear, and that the error term follows a normal distribution.

2. Basic understanding of linear regression


An example: suppose we have a table of loan applicants with their salary, age, and the amount granted.

Features: salary and age
Target: amount, i.e. the loan amount the bank will grant me
Question: both salary and age affect the amount, so what are their respective weights?

  • Understanding
  • x1 and x2 represent the two features, salary and age, and y represents the predicted amount
  • All linear regression has to do is find a suitable line (or plane) that fits our data points

Assuming that the coefficient of salary is β1 and the coefficient of age is β2,
the fitted plane equation is y = β1x1 + β2x2 + β0, where β0 is the intercept (bias) term.
The goal of the linear regression algorithm is to find the best-fitting line, i.e. the one that minimizes the sum of squared errors between the data points and the regression line. This minimization process is known as the method of least squares.
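As a quick illustration of least squares on the salary/age example, here is a minimal numpy sketch; all of the data values below are invented purely for demonstration:

```python
import numpy as np

# Made-up numbers for illustration only: each row is one loan applicant.
salary = np.array([4000, 5000, 7500, 10000, 12000], dtype=float)     # feature x1
age = np.array([25, 28, 33, 40, 45], dtype=float)                    # feature x2
amount = np.array([20000, 30000, 48000, 70000, 86000], dtype=float)  # target y

# Design matrix with a leading column of ones for the intercept beta0.
X = np.column_stack([np.ones_like(salary), salary, age])

# Least squares: minimize ||X @ beta - amount||^2.
beta, *_ = np.linalg.lstsq(X, amount, rcond=None)
beta0, beta1, beta2 = beta
print(f"amount ~= {beta1:.2f}*salary + {beta2:.2f}*age + {beta0:.2f}")
```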

3. Mathematical formulation

1. Model

Suppose there are m training samples, each with n independent variables (features) and one dependent variable. In vector form:

  • Feature vector of a sample: x = [x1, x2, x3, ..., xn], stacked row by row into the matrix X
  • Dependent variable vector: y = [y1, y2, y3, ..., ym]
    The regression model's hypothesis function is:
  • h(x) = θ0 + θ1x1 + θ2x2 + ... + θnxn
    where θ0, θ1, θ2, ..., θn are the parameters to be learned. Prepending a constant feature x0 = 1 to each sample lets us collect all parameters into a single (n+1)-dimensional vector θ:
  • θ = [θ0, θ1, θ2, ..., θn], so that h(x) = θᵀx
    The error of each sample is
  • ε = h(x) - y
    and the objective is to minimize the sum of squared errors:
  • J(θ) = 1/(2m) * ∑(h(xi) - yi)^2
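To make the notation concrete, here is a minimal numpy sketch of the hypothesis h(x) = θᵀx and the cost J(θ); the toy data is made up:

```python
import numpy as np

def h(theta, X):
    """Hypothesis h(x) = theta0*x0 + theta1*x1 + ... + thetan*xn,
    where the constant feature x0 = 1 is the first column of X."""
    return X @ theta

def J(theta, X, y):
    """Cost J(theta) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(y)
    err = h(theta, X) - y
    return err @ err / (2 * m)

# Toy check: 3 samples, 2 features plus the x0 = 1 column.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0],
              [1.0, 6.0, 7.0]])
y = np.array([1.0, 2.0, 3.0])
print(J(np.zeros(3), X, y))  # 1/(2*3) * (1 + 4 + 9) ~= 2.33
```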

2. Error analysis

Error analysis is used to analyze the error between the model prediction results and the actual results. Error analysis can help us understand the strengths and weaknesses of a model and how to improve it.

The errors ε are assumed to be independent and identically distributed, following a Gaussian distribution with mean 0 and variance σ^2. A popular way to understand this:

  • Independence: the loan amounts of two different applicants have no influence on each other.
  • Identical distribution: the amounts the bank grants are typical in most cases, with very few unusually high or low ones, which matches a Gaussian distribution.

  • Goal: make the likelihood function as large as possible. Taking the log of the Gaussian likelihood and setting its partial derivatives to 0 shows that maximizing it is equivalent to minimizing
    J(θ) = 1/(2m) * ∑(h(xi) - yi)^2
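Setting the partial derivatives of J(θ) to zero has a closed-form solution, the normal equation θ = (XᵀX)⁻¹Xᵀy. A minimal sketch on synthetic data (the true parameters below are chosen arbitrarily):

```python
import numpy as np

def normal_equation(X, y):
    """Solve grad J(theta) = 0 in closed form: theta = (X^T X)^(-1) X^T y.
    np.linalg.solve avoids forming the inverse explicitly."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta + rng.normal(scale=0.1, size=100)  # Gaussian noise, as assumed above
print(normal_equation(X, y))  # close to [1, 2, -3]
```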

3. Evaluation method

The evaluation methods for linear regression models mainly include cross-validation and the hold-out method. Cross-validation divides the data set into k subsets; in each of k rounds, one subset serves as the validation set and the remaining subsets form the training set, and the k validation results are averaged. The hold-out method divides the data set into a training set and a test set, and uses the test set to evaluate the model after training.
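A minimal sketch of k-fold cross-validation as just described, using the closed-form solution and scoring each fold by mean squared error (the data is synthetic):

```python
import numpy as np

def kfold_mse(X, y, k=5, seed=0):
    """k-fold cross-validation: train on k-1 folds, validate on the
    held-out fold, and average the k validation mean-squared errors."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        val = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        theta = np.linalg.solve(X[tr].T @ X[tr], X[tr].T @ y[tr])
        scores.append(np.mean((X[val] @ theta - y[val]) ** 2))
    return float(np.mean(scores))

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.5, size=200)
print(kfold_mse(X, y, k=5))
```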

The coefficient of determination R^2 = 1 - ∑(yi - ŷi)^2 / ∑(yi - ȳ)^2 is a common evaluation metric; the closer its value is to 1, the better we consider the model.
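And a sketch of the hold-out method scored with R² (synthetic data again; the 80/20 split is an arbitrary but common choice):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot; values closer to 1 indicate a better fit."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.5, size=200)

idx = rng.permutation(200)            # shuffle, then split 80/20
train, test = idx[:160], idx[160:]
theta = np.linalg.solve(X[train].T @ X[train], X[train].T @ y[train])
print(r2_score(y[test], X[test] @ theta))
```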

4. Gradient Descent

Gradient descent is an iterative algorithm for solving optimization problems. In linear regression, it is used to find the parameters that minimize the cost function. The idea is to adjust the parameters repeatedly so that the cost function keeps decreasing. Specifically, for the objective function J(θ), each iteration computes the partial derivatives of the cost function with respect to the parameters; the direction in which the cost decreases fastest is the opposite of the gradient. The parameters are then updated in that direction, scaled by the learning rate α, and the process repeats until the cost function is minimized.

  • Update rule:
    θj = θj - α/m * ∑(h(xi) - yi) * xi(j), where j indexes the jth parameter, m is the number of samples, α is the learning rate, and xi(j) is the value of the jth feature of the ith sample (a runnable sketch follows this list).
  • Procedure
    Iterate the update rule until the objective function stops decreasing, i.e. a minimum is reached
  • Batch gradient descent:
    uses all samples for every update; it converges reliably toward the optimum, but each step is slow because every sample must be processed
  • Stochastic gradient descent:
    uses one randomly chosen sample per update; iterations are fast, but an individual step does not always move toward convergence
  • Mini-batch gradient descent
    updates on a small batch of samples each time, balancing the two approaches; this is the recommended choice
    common batch sizes: 32, 64, 128, chosen with memory and efficiency in mind
  • Learning rate
    The learning rate (step size) controls how far each iteration moves. It has a huge impact on the result, so it is generally kept small, and it is safest to start small.
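Here is a minimal sketch of the update rule above, vectorized with numpy. batch_size=None gives batch gradient descent, batch_size=1 gives stochastic gradient descent, and anything in between gives mini-batch; the data and hyperparameters are illustrative:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.05, epochs=200, batch_size=None, seed=0):
    """theta_j <- theta_j - alpha/m * sum((h(x_i) - y_i) * x_i(j)),
    applied per (mini-)batch. batch_size=None -> batch GD, 1 -> SGD."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    bs = m if batch_size is None else batch_size
    for _ in range(epochs):
        order = rng.permutation(m)                  # reshuffle each epoch
        for start in range(0, m, bs):
            b = order[start:start + bs]
            grad = X[b].T @ (X[b] @ theta - y[b]) / len(b)
            theta -= alpha * grad                   # step opposite the gradient
    return theta

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(300), rng.normal(size=(300, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.1, size=300)
print(gradient_descent(X, y, batch_size=32))  # close to [1, 2, -3]
```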

The gradient descent algorithm may fall into a local optimum and fail to reach the global optimum. (For linear regression the squared-error cost is convex, so any local minimum is also global; the concern matters for more complex models.) In practice, the learning rate and the number of iterations need to be tuned appropriately to keep the algorithm from getting stuck or diverging.
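To see the learning rate's impact, a small experiment on synthetic data (the α values are arbitrary, and 2.5 is deliberately too large for this data, so the cost blows up instead of decreasing):

```python
import numpy as np

def cost_after(alpha, X, y, steps=50):
    """Run batch gradient descent for a fixed number of steps
    and return the final cost J(theta)."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= alpha * X.T @ (X @ theta - y) / m
    err = X @ theta - y
    return err @ err / (2 * m)

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.1, size=200)

for alpha in (0.01, 0.1, 2.5):
    print(alpha, cost_after(alpha, X, y))  # 2.5 diverges; the small rates converge
```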

4. Summary

Linear regression is a basic machine learning model used to predict continuous outcomes. In practical applications, we can solve for the model parameters with the least squares method or the gradient descent algorithm to obtain the best-fitting predictions. At the same time, we need to choose appropriate models and algorithms for the specific data set and problem to get better results.

In the next section, we will write a linear regression model ourselves and observe how single-feature and multi-feature settings affect the loss during gradient descent.

I hope you will support us, keep studying hard together, and look forward to more novel and interesting content in the future.

Origin: blog.csdn.net/qq_61260911/article/details/129909799