Machine learning notes from Andrew Ng's course (2)

WEEK 2:

This week's lectures are about using linear regression to build models with multiple features. So first, what does a linear regression model with multiple features look like?

Multiple Features:

Multi-feature linear regression:
Expression: hθ(x)=θ0+θ1x1+θ2x2+θ3x3+⋯+θnxn
The expression has the same basic form as single-feature linear regression: θ0 represents the bias b, each x represents one feature, and θ1 through θn are the weights of the corresponding features. Next, transform it into the form of a matrix multiplication:
hθ(x) = θᵀx, where θ = [θ0, θ1, …, θn]ᵀ, x = [x0, x1, …, xn]ᵀ, and x0 = 1
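As a quick illustration, here is a minimal NumPy sketch of the hypothesis as a vector product; the values of θ and x are made up for the example:

```python
import numpy as np

# h_theta(x) = theta^T x, with x0 = 1 as the bias feature.
theta = np.array([0.5, 1.0, 2.0])   # [theta0, theta1, theta2]
x = np.array([1.0, 3.0, 4.0])       # [x0 = 1, x1, x2]

h = theta @ x                       # 0.5 + 1.0*3.0 + 2.0*4.0 = 11.5
print(h)
```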

Gradient descent of multiple features:
Just as with a single feature, we apply gradient descent to each θ value so that it converges to an optimum of the cost J(θ):
J(θ) = (1/2m) · Σ (hθ(x(i)) − y(i))², summing i from 1 to m
θj := θj − α · ∂J(θ)/∂θj (simultaneously for every j = 0, 1, …, n)
Expanding the partial derivative gives the concrete formula:
θj := θj − α · (1/m) · Σ (hθ(x(i)) − y(i)) · xj(i)
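A minimal sketch of this update rule in NumPy, assuming X is an m×(n+1) design matrix whose first column is all ones and y is the m-vector of targets (the names and defaults here are illustrative, not from the course):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iters=1000):
    """Batch gradient descent for multi-feature linear regression."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        error = X @ theta - y                 # h_theta(x^(i)) - y^(i) for every example
        theta -= alpha * (X.T @ error) / m    # simultaneous update of all theta_j
    return theta
```

Note that every θj is updated from the same error vector, which matches the "simultaneous update" requirement.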

Exercise: Feature Scaling and Learning Rate Selection
The next two exercises teach us to process the data reasonably and to set the parameters. First, feature scaling:
Suppose we now have two features, x1 and x2, and one cost J(θ). Before the data is processed, one feature may be very large and the other very small. In that case the contours of J(θ) become long, narrow ellipses: during gradient descent, progress along θ1 can be very fast while progress along θ2 is very slow, so the descent path oscillates and is bumpy and uneven.
So how do we solve this problem? We scale the features x1 and x2 into a similar range. Generally speaking, the target range is −1 ≤ x(i) ≤ 1 or −0.5 ≤ x(i) ≤ 0.5, and each feature x needs to be transformed accordingly. The formula is as follows:
xi := (xi − μi) / si
Here, μi is the average of the feature xi, and si is its range (the maximum value minus the minimum value). Data processed this way falls roughly into the interval given above, which keeps the contours of J(θ) even and lets gradient descent converge much more smoothly.
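A minimal sketch of this mean normalization, assuming each column of X is one feature (and that no feature is constant, so the range si is nonzero):

```python
import numpy as np

def mean_normalize(X):
    """x_i := (x_i - mu_i) / s_i, applied column by column."""
    mu = X.mean(axis=0)                 # mu_i: per-feature average
    s = X.max(axis=0) - X.min(axis=0)   # s_i: per-feature range (max - min)
    return (X - mu) / s, mu, s          # return mu and s to reuse on new data
```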

Regarding the choice of learning rate, earlier lectures already covered the bad effects of a learning rate that is too large or too small. The appropriate learning rate also differs from model to model, so we need to experiment to find a reasonably good α. A practical approach is to multiply the candidate rate by roughly 3 each time, e.g. 0.01, 0.03, 0.1, 0.3, 1, 3, 10, and so on. Through this process we can quickly home in on an appropriate learning rate.
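A rough, self-contained sketch of such a sweep on synthetic data (the data, the 100-iteration budget, and the truncated rate list are assumptions made for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(-1, 1, 50)])  # x0 = 1 plus one scaled feature
y = 2 + 3 * X[:, 1] + rng.normal(0, 0.1, 50)                # synthetic targets

def cost(X, y, theta):
    e = X @ theta - y
    return (e @ e) / (2 * len(y))       # J(theta) = (1/2m) * sum of squared errors

for alpha in [0.01, 0.03, 0.1, 0.3, 1.0]:   # rates about 3x apart; 3 and 10 diverge on this data
    theta = np.zeros(X.shape[1])
    for _ in range(100):                # a short gradient descent run per candidate rate
        theta -= alpha * (X.T @ (X @ theta - y)) / len(y)
    print(f"alpha={alpha}: J={cost(X, y, theta):.5f}")
```

A good choice is roughly the largest α for which J(θ) still decreases steadily.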

Features and Polynomial Regression:

Although polynomial regression also has multiple feature weights θ, the obvious difference from multi-feature linear regression is that polynomial regression has only one feature variable x (which can itself be obtained by combining several raw features). Since polynomial regression is nonlinear in x, it can fit some problems that linear regression cannot fit well. The expression is as follows:
hθ(x) = θ0 + θ1x + θ2x² + θ3x³ (a cubic model in the single feature x, for example)

Tips:
When using polynomial regression it is usually necessary to apply feature scaling, because the powers of x (x, x², x³, …) have wildly different ranges and must be brought onto a consistent scale to obtain correct results.
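A minimal sketch of turning a single feature x into polynomial features and scaling them, so the same linear-regression machinery applies (the function name and the cubic degree are just illustrative choices):

```python
import numpy as np

def polynomial_features(x, degree=3):
    """Build the columns [x, x^2, ..., x^degree] from a single feature vector x."""
    X = np.column_stack([x ** p for p in range(1, degree + 1)])
    # The powers have wildly different ranges, so mean-normalize each column:
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / s
```

Gradient descent from the sketch earlier can then be run on these columns (plus a bias column of ones) to fit hθ(x) = θ0 + θ1x + θ2x² + θ3x³.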


Source: blog.csdn.net/jxsdq/article/details/78147753