Q1 Multiple features
In the figure above, the number of columns is the number of features and the number of rows is the number of samples. The hypothesis function is:

h_θ(x) = θ_0 x_0 + θ_1 x_1 + … + θ_n x_n = θ^T x

where x_0 = 1.
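As a minimal sketch (NumPy, with made-up numbers), the hypothesis is just a dot product of θ with each sample's feature vector:

```python
import numpy as np

# Hypothetical example: 3 samples, 2 features, plus the constant x0 = 1 column.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0],
              [1.0, 6.0, 7.0]])   # first column is x0 = 1
theta = np.array([0.5, 1.0, 2.0])

h = X @ theta   # h_theta(x) = theta^T x, computed for every sample at once
print(h)        # one prediction per row
```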
Q2 Multivariate gradient descent
The loss function is the same as in the univariate case:

J(θ) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2

where h_θ(x) = θ^T x.

Taking the derivative gives the iterative update:

θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i)   (update all θ_j, j = 0, …, n, simultaneously)
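The update rule can be sketched in vectorized form (NumPy, with hypothetical data generated from y = 1 + 2x):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for linear regression (X includes the x0 = 1 column)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = (X.T @ (X @ theta - y)) / m   # (1/m) * sum over i of error * x_j
        theta -= alpha * grad                # simultaneous update of every theta_j
    return theta

# Hypothetical data from y = 1 + 2x
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
print(gradient_descent(X, y))   # approaches [1, 2]
```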
Q3 Gradient descent in practice 1: feature scaling
When the ranges of the features differ greatly (e.g., x_1 ranges over 0–1000 while x_2 ranges over 0–5), gradient descent needs very many iterations to converge.
Method: scale all features to roughly the same range. The simplest way is mean normalization: subtract the mean and divide by the standard deviation (or the range), i.e. x_n := (x_n − μ_n) / s_n.
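A minimal mean-normalization sketch (NumPy, dividing by the standard deviation; the feature values are made up):

```python
import numpy as np

def scale_features(X):
    """Mean normalization: subtract each column's mean, divide by its std."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

# Hypothetical features with very different ranges (hundreds vs. single digits)
X = np.array([[100.0, 1.0],
              [500.0, 3.0],
              [900.0, 5.0]])
X_scaled = scale_features(X)
print(X_scaled)   # every column now has mean 0 and std 1
```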
Q4 Gradient descent in practice 2: learning rate
If the learning rate is too small, convergence is slow; if it is too large, the algorithm may fail to converge.
A common practice is to try candidate rates roughly tripling each time, e.g.: 0.01, 0.03, 0.1, 0.3, 1, 3, 10, …
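The tripling search can be sketched as a loop that runs a few gradient-descent steps per candidate rate and compares the resulting cost (the data and step counts are hypothetical):

```python
import numpy as np

# Hypothetical data from y = 1 + 2x
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

def cost(theta):
    m = len(y)
    return ((X @ theta - y) ** 2).sum() / (2 * m)

def final_cost(alpha, iters=50):
    """Run a fixed number of gradient-descent steps and report the cost."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= alpha * (X.T @ (X @ theta - y)) / len(y)
    return cost(theta)

# Candidate rates roughly tripling each time
for alpha in [0.01, 0.03, 0.1, 0.3]:
    print(alpha, final_cost(alpha))
```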
Q5 Features and polynomial regression
For example, a quadratic model:

h_θ(x) = θ_0 + θ_1 x + θ_2 x^2

or a cubic model:

h_θ(x) = θ_0 + θ_1 x + θ_2 x^2 + θ_3 x^3

By creating new features (i.e., letting x_2 = x^2, x_3 = x^3), the model is transformed into a linear model.
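A small sketch of the trick (NumPy least squares; the data are generated from a known quadratic so the recovered coefficients can be checked):

```python
import numpy as np

# Hypothetical data from the quadratic y = 2 + 3x + 0.5x^2
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 + 3 * x + 0.5 * x ** 2

# New features x2 = x^2 make the model linear in its parameters.
X = np.column_stack([np.ones_like(x), x, x ** 2])   # columns: [1, x, x^2]
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)   # recovers approximately [2, 3, 0.5]
```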
Q6 Normal equation
Premise: for some linear regression problems, the normal equation solves for θ in one step, by setting the derivative of the cost to zero and solving the resulting equation. This yields the closed-form solution for the parameters:

θ = (X^T X)^{-1} X^T y

(X contains x_0 = 1, i.e., its first column is all ones; each row of X holds the feature values of one sample.)
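The closed-form solution can be sketched directly (NumPy, with hypothetical data from y = 1 + 2x; `pinv` is used instead of `inv` so the line also works when X^T X is singular):

```python
import numpy as np

# Hypothetical data from y = 1 + 2x; first column of X is the ones column x0 = 1.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Normal equation: theta = (X^T X)^{-1} X^T y
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)   # [1, 2]
```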
Gradient descent compared with the normal equation:

- Gradient descent: needs a learning rate α and many iterations, but works well even when the number of features n is large.
- Normal equation: needs no α and no iterations, but computing (X^T X)^{-1} costs roughly O(n^3), so it becomes slow when n is large.
Q7 Normal equation and non-invertibility
X^T X can be non-invertible when:
(1) the features are not mutually independent (redundant, linearly dependent features);
(2) the number of samples is less than the number of features.
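Case (1) can be sketched with a deliberately redundant feature (NumPy; the numbers are made up): the pseudoinverse still yields a usable solution where the ordinary inverse does not exist.

```python
import numpy as np

# Hypothetical degenerate case: the third column duplicates the second,
# so X^T X is singular and has no ordinary inverse.
X = np.array([[1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0],
              [1.0, 4.0, 4.0]])
y = np.array([5.0, 7.0, 9.0])

# np.linalg.inv(X.T @ X) would raise LinAlgError here; pinv still works.
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(X @ theta)   # the predictions still match y
```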
Vocabulary
- multivariate linear regression
- feature scaling
- nonlinear function
- normal equation