The difference between the normal equation and gradient descent
- Gradient descent is an iterative method: the computer repeatedly updates the parameter vector θ until the cost function J(θ) converges, at which point the optimal hypothesis parameters θ are obtained.
- The normal equation is an analytical method derived mathematically. It requires no iteration at all: the optimal θ is solved in a single step.
How to use normal equations
Take as an example a data set with n = 3 and m = 4, that is, three features and four training samples.
The features are stored in a matrix X. Because an extra feature x0 is added for the intercept term (which is why n needs +1; x0 defaults to 1), X has m rows and n + 1 columns, here a 4×4 matrix. And y holds the true values of the training samples, stored as an m-dimensional column vector, here a four-dimensional column vector.
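A minimal sketch of assembling X and y for the example above (the feature and target values are made up for illustration):

```python
import numpy as np

# m = 4 training samples, n = 3 features (hypothetical values)
features = np.array([
    [2104.0, 5.0, 1.0],
    [1416.0, 3.0, 2.0],
    [1534.0, 3.0, 2.0],
    [852.0,  2.0, 1.0],
])

# Prepend the x0 = 1 column, giving an m x (n+1) = 4 x 4 design matrix X
X = np.hstack([np.ones((features.shape[0], 1)), features])

# y: the m true target values, stored as a column vector
y = np.array([[460.0], [232.0], [315.0], [178.0]])

print(X.shape)  # (4, 4)
print(y.shape)  # (4, 1)
```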
After obtaining the matrix X and the column vector y, the normal equation operation can be performed. The calculation formula is:
θ = (XᵀX)⁻¹Xᵀy, where ᵀ denotes the matrix transpose and ⁻¹ the matrix inverse, implemented in Octave with inv() or pinv().
Comparison of gradient descent and normal equation
Disadvantages of the gradient descent method:
- The gradient descent method needs to select the learning rate α.
- Gradient descent requires multiple iterations.
Advantages of the gradient descent method:
- It still works well when n (the number of features xi) is large.
Disadvantages of normal equations:
- When n is large, it runs very slowly: for most algorithms, computing the inverse of the n×n matrix XᵀX takes O(n³) time. Therefore, when n is large (around n = 10000, you can start to consider it), the gradient descent method tends to be used instead.
- The normal equation is not suitable for more complex algorithms; the gradient descent method is still needed in those cases.
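The trade-off above can be sketched on a tiny data set: the normal equation finds θ in one step, while batch gradient descent needs a learning rate α and many iterations to reach the same answer (the data and hyperparameters here are made up for illustration):

```python
import numpy as np

# Toy data: y = 1 + 2*x, exactly linear (hypothetical example)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Normal equation: one step, no learning rate to choose
theta_ne = np.linalg.pinv(X.T @ X) @ X.T @ y

# Batch gradient descent: needs alpha and many iterations
alpha, iters = 0.1, 5000
m = len(y)
theta_gd = np.zeros(2)
for _ in range(iters):
    grad = X.T @ (X @ theta_gd - y) / m  # gradient of J(theta)
    theta_gd -= alpha * grad

print(theta_ne)  # close to [1., 2.]
print(theta_gd)  # close to [1., 2.]
```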
Python implementation
import numpy as np

def normalEqn(X, y):
    # theta = (X^T X)^(-1) X^T y; pinv also handles a singular X^T X
    theta = np.linalg.pinv(X.T @ X) @ X.T @ y
    return theta
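A quick usage sketch of this function (the data values are made up; the line fitted is y = 3 + 2·x1):

```python
import numpy as np

def normalEqn(X, y):
    # theta = (X^T X)^(-1) X^T y
    theta = np.linalg.pinv(X.T @ X) @ X.T @ y
    return theta

# Design matrix with the x0 = 1 intercept column already included
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([5.0, 7.0, 9.0])

theta = normalEqn(X, y)
print(theta)  # close to [3., 2.]
```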