Machine Learning Primer 04 - Linear Regression

In the previous three articles, we used the kNN algorithm to introduce classification in machine learning.

In general, classification and regression are the two major categories of problems in machine learning. In fact, they do the same thing: transform inputs into outputs. The difference is that the output of classification is a discrete value, such as benign (0) versus malignant (1) in the earlier cancer example, while the output of regression is a continuous value.

This article begins our discussion of regression problems, starting from the simplest case: linear regression.

1. The concept of linear regression

So-called linear regression fits a linear function of the input variable, which can then be used to predict an output value for any given input. It is very simple, yet it can work very well on the right kind of regression problem.

Let's first consider univariate linear regression (also known as simple linear regression).

We know that the general expression of a first-order (linear) function is:

y = ax + b
For machine learning, the data will not naturally fall exactly on a straight line. In this setting, if the data is approximately linearly distributed, can we still fit it with such a straight line and use that line for prediction? The answer is of course yes; what we need to work out is how to determine the line used for the fit.

In fact, it is easy to see that, given a set of approximately linear data points, there are infinitely many lines that come close to these points. So how do we evaluate them and select the one we consider most appropriate?

Intuitively, we want the data points to be distributed as evenly as possible on both sides of the line. Looking a little deeper, the purpose is to make the distances between the data points and the line as small as possible, because the larger those distances, the larger the error is likely to be. Therefore, we use the distances between all data points and the line to measure how good a candidate regression line is. Summing the squared vertical distances gives:

L(a, b) = Σᵢ (yᵢ − (a·xᵢ + b))²
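To make this concrete, here is a minimal sketch (my own illustration, using NumPy and made-up example data) of how this sum of squared vertical distances can be computed for a candidate line y = a·x + b:

import numpy as np

def squared_loss(a, b, x, y):
    # Sum of squared vertical distances between points (x, y) and the line y = a*x + b
    residuals = y - (a * x + b)
    return np.sum(residuals ** 2)

# Toy data that is roughly linear (for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

print(squared_loss(1.0, 0.0, x, y))   # loss of the line y = x
print(squared_loss(0.5, 1.0, x, y))   # loss of a worse line; should be larger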

2. The method of least squares

2.1 Loss and risk

In Part 1, we constructed the function L(a, b) = Σᵢ (yᵢ − (a·xᵢ + b))². This function is called the quadratic loss function of the linear regression above.

So what is a loss function?

A loss function describes the degree of error between the predicted value and the true value for an individual sample. It is used to measure the quality of a model's predictions.

Four commonly used loss functions (a small code sketch of each is given after this list):

  1. 0-1 loss function (used for classification; it takes 0 when the classification is correct and 1 when it is wrong)
  2. Quadratic loss function (the square of the difference between the predicted value and the true value)
  3. Absolute loss function (the absolute value of the difference between the predicted value and the true value)
  4. Logarithmic loss function (based on the idea of maximum likelihood estimation; see https://www.cnblogs.com/klchang/p/9217551.html for details)
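As a rough illustration (my own sketch, not from the original article), the four losses for a single sample could be written as:

import numpy as np

def zero_one_loss(y_true, y_pred):
    # 0-1 loss: 0 if the predicted class matches the true class, 1 otherwise
    return 0 if y_true == y_pred else 1

def quadratic_loss(y_true, y_pred):
    # Quadratic (squared) loss: square of the prediction error
    return (y_true - y_pred) ** 2

def absolute_loss(y_true, y_pred):
    # Absolute loss: absolute value of the prediction error
    return abs(y_true - y_pred)

def log_loss(y_true, p_pred):
    # Logarithmic loss for a binary label y_true in {0, 1} and predicted probability p_pred
    return -(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))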

The loss function as written above applies to a single data point. Moving from this local view to the whole data set leads to the concept of the risk function.

The risk function, also known as the expected risk, is the expectation of the loss function.

For a specific training set, we do not know the distribution of the data, so there is no way to compute this expectation; we can only compute the average loss over the samples. We call this average the empirical risk.

The model that minimizes the empirical risk (ERM, empirical risk minimization) is taken as the best model.

Probability theory and mathematical statistics tell us that the sample mean can be used to approximate the expectation of a distribution. However, when the sample size is small, overfitting may occur. Therefore, we add a regularization term (a penalty) to the empirical risk, and the result is called the structural risk. Replacing the empirical risk with the structural risk is a good way to avoid overfitting.
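For concreteness, here is a minimal sketch (my own illustration; the particular penalty, an L2 term on the slope, is an assumption and not from the original article) of the difference between empirical risk and structural risk for the line y = a·x + b:

import numpy as np

def empirical_risk(a, b, x, y):
    # Average quadratic loss of the line y = a*x + b over the training samples
    return np.mean((y - (a * x + b)) ** 2)

def structural_risk(a, b, x, y, lam=0.1):
    # Empirical risk plus a regularization (penalty) term; here an L2 penalty on the slope a
    return empirical_risk(a, b, x, y) + lam * a ** 2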

With this background, we can now formally introduce the least squares method.

2.2 The method of least squares

The so-called least squares method is the method of finding the values of a and b that make L(a, b) smallest (the "squares" in the name refers to the squared errors):

(a, b) = argmin over (a, b) of Σᵢ (yᵢ − (a·xᵢ + b))²

In linear regression, the values of a and b that make the expression above smallest are exactly the ones computed by the least squares method.

Taking the partial derivatives of L with respect to a and b, and finding where those partial derivatives are zero (the extremum), gives the optimal values of a and b that we need. The final results are as follows:

a = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)²

b = ȳ − a·x̄

where x̄ and ȳ are the means of the xᵢ and the yᵢ, respectively.

The detailed derivation is somewhat involved and is not expanded here; refer to https://www.cnblogs.com/code-juggler/p/8406449.html

 

3. Code implementation

In Part 2, we derived the optimal values of the simple linear regression coefficients using the least squares method. As a result, our code implementation is very simple.

import numpy as np

x_mean = np.mean(x)
y_mean = np.mean(y)

num = 0.0  # numerator:   sum of (x_i - x_mean) * (y_i - y_mean)
d = 0.0    # denominator: sum of (x_i - x_mean) ** 2
for x_i, y_i in zip(x, y):
    num += (x_i - x_mean) * (y_i - y_mean)
    d += (x_i - x_mean) ** 2
a = num / d
b = y_mean - a * x_mean

Using this loop, we compute the values of a and b. Then, for any input x of interest, we can use y = a·x + b to predict the regression value of y.

However, the computation above can in fact be expressed as the dot product of two vectors. Computing it as a dot product can greatly reduce the computation time. So we can build the two vectors and then use the dot function from the NumPy library to optimize the calculation.

Although the dot product may look like exactly the same process as our loop of additions, the dot product is not implemented as the naive loop-and-sum we might imagine; it uses optimized vector and matrix operations under the hood. As a result, it is much more efficient than a hand-written for loop (especially on very large data sets).
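A minimal sketch of this vectorized version (assuming x and y are NumPy arrays holding the training data, as above; x_new is an illustrative value) might look like:

import numpy as np

x_mean = np.mean(x)
y_mean = np.mean(y)

# The same numerator and denominator as before, expressed as dot products
num = np.dot(x - x_mean, y - y_mean)
d = np.dot(x - x_mean, x - x_mean)

a = num / d
b = y_mean - a * x_mean

# Regression prediction for a new input
x_new = 6.0
y_predict = a * x_new + b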

 

4. Multiple Linear Regression

In the linear regression discussed above, each data point has only one input variable; this is simple (univariate) linear regression.

In fact, when there are multiple input variables (for example, the time at which the cancer was discovered and the size of the tumor), we can also perform linear regression. This is multiple linear regression.

To keep the form simple, we represent the data set directly in matrix notation. The matrix X_b denotes the input data set: each row vector is one data point with all of its feature values, and each column vector holds the values of one feature across all data points. The vector θ = (θ₁, θ₂, θ₃, ..., θₙ) holds the coefficients of the linear regression function we need. Constructing the linear regression function y = X_b·θ is then in fact the process of finding the vector θ that minimizes the risk function (y − X_b·θ)ᵀ·(y − X_b·θ).

The result is as follows:

θ = (X_bᵀ · X_b)⁻¹ · X_bᵀ · y
We do not need to know its derivation; using this result directly, we can obtain the regression equation we need.
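A minimal NumPy sketch of applying this result (my own illustration with made-up data; a column of ones is prepended to X to form X_b so that an intercept term is included, which is an assumption beyond the notation above) might look like:

import numpy as np

# Toy data: 4 samples, 2 features each (illustrative values only)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = np.array([3.1, 2.9, 7.2, 6.8])

# Prepend a column of ones so the first coefficient acts as the intercept
X_b = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equation: theta = (X_b^T X_b)^(-1) X_b^T y
theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

# Regression prediction for a new data point (leading 1.0 for the intercept)
x_new = np.array([1.0, 2.5, 3.5])
y_predict = x_new.dot(theta)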


Origin www.cnblogs.com/DrChuan/p/11926904.html