Wu Enda Study Notes - Four: Multivariate Linear Regression

Four: Multivariate Linear Regression

4.1 Multi-dimensional features and multi-variable gradient descent

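With $n$ features, the hypothesis and the corresponding gradient descent update take the standard forms from the course (restated here for reference):

$h_{\theta}(x)=\theta_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}+\cdots+\theta_{n}x_{n}=\theta^{T}x$

$\theta_{j}:=\theta_{j}-\alpha\frac{1}{m}\sum\limits_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{j}^{(i)}$ (updated simultaneously for every $j=0,\ldots,n$)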

4.2 Gradient Descent Practice 1: Feature Scaling

Making sure that the features are on similar scales helps the gradient descent algorithm converge faster.

I think this is essentially normalization: the data are scaled to roughly [-1, 1], so the differences in magnitude between features are greatly reduced.

Of course, this method is used when the ranges of the different feature values differ greatly.

e.g.: size ∈ (0, 2000) ==> size = (size - 1000) / 2000 ==> size ∈ (-0.5, 0.5)

x = (x - mean) / standard deviation

# Feature scaling (mean normalization): scale each column to zero mean and unit standard deviation
import pandas as pd

def normalize_feature(df):
    """Apply (column - column.mean()) / column.std() to every column of the DataFrame."""
    return df.apply(lambda column: (column - column.mean()) / column.std())
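A quick usage sketch (the column names size and bedrooms and the sample values are just illustrative):

raw = pd.DataFrame({'size': [2104, 1600, 2400], 'bedrooms': [3, 2, 4]})
scaled = normalize_feature(raw)   # each column now has mean 0 and standard deviation 1
print(scaled)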

4.3 Gradient Descent Practice 2: Learning Rate

How to choose the learning rate

Look at the plot of the cost function $J(\theta)$ against the number of iterations of gradient descent.

[Figure: $J(\theta)$ versus number of iterations for several learning rates]

In this plot, the green line, which decreases steadily on every iteration, clearly corresponds to the better learning rate: if $\alpha$ is too small, convergence is slow; if $\alpha$ is too large, $J(\theta)$ may not decrease on every iteration and can even diverge.
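A minimal NumPy sketch of this diagnostic, assuming the design matrix X already contains a leading column of ones; the names compute_cost and gradient_descent are illustrative, not taken from the course code:

import numpy as np

def compute_cost(X, y, theta):
    # Mean squared error cost J(theta) for linear regression
    m = len(y)
    errors = X @ theta - y
    return (errors @ errors) / (2 * m)

def gradient_descent(X, y, theta, alpha, iterations):
    # Batch gradient descent; records J(theta) after each step so it can be plotted
    m = len(y)
    costs = []
    for _ in range(iterations):
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
        costs.append(compute_cost(X, y, theta))
    return theta, costs

Running this for several values of $\alpha$ (for example 0.001, 0.01, 0.1) and plotting each costs list against the iteration number reproduces the comparison in the figure: pick the largest $\alpha$ whose curve still decreases steadily.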

4.4 Features and polynomial regression

In essence, this is the process of obtaining different regression equations by combining, replacing, or transforming the features (for example, using products or powers of existing features).

For example, in the house price prediction problem, the frontage and the depth of the house can be combined into a single area feature, which simplifies the fit.

$h_{\theta}(x)=\theta_{0}+\theta_{1}\times frontage+\theta_{2}\times depth$

With $x_{1}=frontage$ (frontage width), $x_{2}=depth$ (depth), and $x=frontage\times depth=area$ (area), the model becomes: $h_{\theta}(x)=\theta_{0}+\theta_{1}x$
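A small pandas sketch of this kind of feature construction (the column names frontage and depth and the sample values are made up for illustration):

import pandas as pd

houses = pd.DataFrame({'frontage': [30.0, 40.0, 25.0],
                       'depth':    [50.0, 60.0, 40.0]})
houses['area'] = houses['frontage'] * houses['depth']   # combine two features into one
houses['area_squared'] = houses['area'] ** 2            # polynomial term for a quadratic model

When polynomial terms like area_squared are added, the feature scaling from section 4.2 becomes even more important, because the powers have very different ranges.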

4.5 Normal equation

$\theta_{j}:=\theta_{j}-\alpha\left[\frac{1}{m}\sum\limits_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_{j}^{(i)}+\frac{\lambda}{m}\theta_{j}\right]$
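The gradient descent update above has to be iterated; the normal equation instead solves for $\theta$ in one step, using the standard closed-form solution:

$\theta=(X^{T}X)^{-1}X^{T}y$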

| Gradient descent | Normal equation |
| --- | --- |
| Needs to choose the learning rate $\alpha$ | No learning rate needed |
| Needs many iterations | One direct calculation |
| Still works well when the number of features is large | Needs to compute $(X^{T}X)^{-1}$; if the number of features $n$ is large, the cost is high, because inverting the matrix takes roughly $O(n^{3})$ time; generally acceptable when $n$ is less than 10000 |
| Suitable for all types of models | Only suitable for linear models, not for other models such as logistic regression |
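A minimal NumPy sketch of the normal equation (the variable names and sample data are made up; np.linalg.pinv is used so the computation still works when $X^{T}X$ is not invertible):

import numpy as np

# Design matrix with a leading column of ones (bias term) and one feature column
X = np.array([[1.0, 2104.0],
              [1.0, 1600.0],
              [1.0, 2400.0],
              [1.0, 1416.0]])
y = np.array([400.0, 330.0, 369.0, 232.0])

# theta = (X^T X)^{-1} X^T y, computed with the pseudo-inverse for numerical safety
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)   # [intercept, slope]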

Origin blog.csdn.net/qq_44082148/article/details/104347444