ML learning 4: Multivariate linear regression

4-1 Multiple features

We have discussed the regression model with a single variable/feature. Now we add more features to the housing price model, such as the number of rooms and the number of floors, forming a model with multiple variables. The features in the model are $(x_1, x_2, x_3, x_4, \ldots)$.

$n$ represents the number of features.

$x^{(i)}$ represents the input of the $i$-th training example, i.e., the $i$-th row of the feature matrix, which is a vector.

For example, $x^{(2)}$ is the feature vector of the second training example, the second row of the feature matrix.

$x^{(i)}_j$ represents the $j$-th feature of the $i$-th row of the feature matrix, that is, the $j$-th feature of the $i$-th training example.

The multivariable hypothesis $h$ is expressed as $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$; there are $n + 1$ parameters and $n$ variables in this formula.

If we introduce $x_0 = 1$ into $h$ above, then the model's parameter vector $\theta$ is an $(n+1)$-dimensional vector, every training example is also an $(n+1)$-dimensional vector, and the feature matrix $X$ has dimension $m \times (n + 1)$.

Therefore, the formula can be simplified to $h_\theta(x) = \theta^T x$, where the superscript $T$ denotes the matrix transpose.
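As a minimal NumPy sketch of this vectorized hypothesis (illustrative, not from the original post; the data values and variable names are made up):

import numpy as np

# Illustrative housing data: m = 3 examples, n = 2 features
# (size, number of rooms); the numbers are invented.
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0]])

# Prepend the constant feature x0 = 1 to every example, making
# X an m x (n + 1) matrix as described above.
X = np.hstack([np.ones((X.shape[0], 1)), X])

theta = np.zeros(X.shape[1])  # the (n + 1)-dimensional parameter vector

# Vectorized hypothesis: h(x^(i)) = theta^T x^(i) for every row at once.
h = X @ theta
print(h)  # -> [0. 0. 0.] for the all-zero theta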

4-2 Gradient descent for multiple variables

In multivariate linear regression we again construct a cost function, which is the sum of squares of all the modeling errors:

$J(\theta_0, \theta_1, \ldots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

where $h_\theta(x) = \theta^T x$.

The batch gradient descent algorithm for multivariate linear regression is:

Repeat until convergence: $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1, \ldots, \theta_n)$, updating every $\theta_j$ (for $j = 0, 1, \ldots, n$) simultaneously.

After taking the derivatives, this becomes:

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$

(with $x_0^{(i)} = 1$, again updating all $\theta_j$ simultaneously).

We start by picking an initial set of parameter values, compute the predictions for all training examples, then give all the parameters their new values simultaneously, and repeat until convergence.
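A minimal sketch of this update loop in NumPy (illustrative; the function name and default values are my own):

import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for multivariate linear regression.

    X: m x (n + 1) feature matrix whose first column is all ones.
    y: m-vector of target values.  alpha: learning rate.
    """
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        errors = X @ theta - y            # h(x^(i)) - y^(i) for all i
        gradient = (X.T @ errors) / m     # all partial derivatives at once
        theta = theta - alpha * gradient  # simultaneous update of every theta_j
    return theta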

4-3 Feature scaling

When we face problems with multiple features, we should make sure the features are on similar scales; this helps the gradient descent algorithm converge faster.

The simplest method is mean normalization: set $x_n := \frac{x_n - \mu_n}{s_n}$, where $\mu_n$ is the mean of feature $n$ and $s_n$ is its standard deviation.
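A short sketch of this scaling step in NumPy (illustrative; the function name is my own):

import numpy as np

def feature_normalize(X):
    """Mean normalization: scale each feature column to (x - mu) / s.

    X holds the raw features only (no column of ones).  Returns the
    scaled features plus mu and s, which are needed later to scale
    new inputs the same way before predicting.
    """
    mu = X.mean(axis=0)
    s = X.std(axis=0)
    return (X - mu) / s, mu, s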

4-4 Learning rate

The number of iterations gradient descent needs to converge varies from model to model. We can plot the cost function against the number of iterations to observe when the algorithm is converging.

There are also methods to test for convergence automatically, such as checking whether the change in the cost function between iterations falls below some threshold (for example, 0.001), but it is usually better to look at a plot like the one just described.

Each iteration of gradient descent is affected by the learning rate. If the learning rate is too small, the number of iterations required to reach convergence will be very large; if the learning rate is too large, an iteration may fail to decrease the cost function and may overshoot the minimum, so the algorithm fails to converge. A sketch combining both ideas, the cost-versus-iterations record and the threshold test, follows.
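This is an illustrative sketch only (the names cost, tol, and max_iters are my own, not from the original post):

import numpy as np

def cost(X, y, theta):
    """J(theta) = (1 / 2m) * sum of squared errors."""
    errors = X @ theta - y
    return (errors @ errors) / (2 * len(y))

def gradient_descent_monitored(X, y, alpha=0.01, tol=1e-3, max_iters=10000):
    """Gradient descent that records J every iteration and stops once
    J decreases by less than `tol` between iterations."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    history = [cost(X, y, theta)]
    for _ in range(max_iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / m
        history.append(cost(X, y, theta))
        if history[-2] - history[-1] < tol:
            break  # converged (or J increased: alpha may be too large)
    return theta, history  # plot `history` against iteration number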

4-5 Features and polynomial regression

Linear regression is not suitable for all data. Sometimes we need a curve to fit our data, such as a quadratic model or a cubic model.

We can transform the cubic model into a linear regression model: for $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$, define new features $x_1 = x$, $x_2 = x^2$, $x_3 = x^3$, which turns it into an ordinary linear model in $(x_1, x_2, x_3)$.

 Note: If we use a polynomial regression model, feature scaling is necessary before running the gradient descent algorithm.
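A short sketch of this feature mapping in NumPy (illustrative names; note the scaling step, as the note above requires):

import numpy as np

def polynomial_features(x, degree=3):
    """Map one feature x to columns [1, x, x^2, ..., x^degree] so a
    polynomial model can be fit as ordinary linear regression."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x ** d for d in range(degree + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0])
X = polynomial_features(x)

# Powers of x have very different scales, so mean-normalize the
# non-constant columns before running gradient descent.
X[:, 1:] = (X[:, 1:] - X[:, 1:].mean(axis=0)) / X[:, 1:].std(axis=0)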

4-6 Normal equations

So far we have been using the gradient descent algorithm, but for some linear regression problems the normal equation method is a better solution.

The normal equation finds the parameters analytically rather than iteratively: it sets each partial derivative of the cost function to zero, $\frac{\partial}{\partial \theta_j} J(\theta) = 0$, and solves for $\theta$.

Using the normal equation, the parameters are:

$\theta = (X^T X)^{-1} X^T y$

where $X$ is the $m \times (n + 1)$ feature matrix and $y$ is the vector of target values. This method needs no learning rate and no iteration, but computing $(X^T X)^{-1}$ becomes slow when the number of features $n$ is very large.
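A minimal NumPy sketch of this computation (illustrative, not from the original post):

import numpy as np

def normal_equation(X, y):
    """Solve theta = (X^T X)^{-1} X^T y analytically.

    np.linalg.solve factorizes X^T X instead of forming its inverse
    explicitly, which is faster and more numerically stable.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)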

4-7 Normal equations and non-invertibility (optional)

What should we do if $X^T X$ is non-invertible (singular)?

If you are working in Octave, you can use the pseudo-inverse function pinv() to handle this.

This technique, available in many linear algebra libraries, is called the pseudo-inverse.

The $\theta$ it computes is correct even when $X^T X$ is non-invertible.
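For reference, a NumPy analogue of the Octave approach (np.linalg.pinv is NumPy's counterpart of Octave's pinv; the function name here is my own):

import numpy as np

def normal_equation_pinv(X, y):
    """Normal equation via the Moore-Penrose pseudo-inverse;
    still gives a usable theta when X^T X is singular
    (e.g., because of linearly dependent features)."""
    return np.linalg.pinv(X.T @ X) @ (X.T @ y)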

 
