Introduction to machine learning ~ review of the normal equation

The difference between normal equation and gradient descent

  • Gradient descent is an iterative method: the computer performs repeated update calculations to find the θ vector that makes the cost function J(θ) converge, at which point the optimal hypothesis parameters θ have been found (see the sketch after this list).
  • The normal equation is an analytical method obtained through mathematical derivation. It requires no iteration at all: the optimal θ is solved for in a single step.
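For concreteness, here is a minimal sketch of the gradient descent update loop for linear regression. The function name, the learning rate α = 0.01, and the iteration count are illustrative choices, not values from the original article:

    import numpy as np

    def gradient_descent(X, y, alpha=0.01, num_iters=1000):
        # X is the m x (n+1) design matrix (first column all ones), y holds the m targets.
        m = len(y)
        theta = np.zeros(X.shape[1])
        for _ in range(num_iters):
            # Simultaneous update: theta := theta - alpha * (1/m) * X^T (X theta - y)
            theta -= (alpha / m) * (X.T @ (X @ theta - y))
        return theta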

How to use normal equations

Take a dataset with n = 3 and m = 4, that is, three features and four training samples, as an example.
x corresponds to the features. Since n = 3, every training sample gets an extra feature x₀ = 1 prepended (which is why n needs the +1), and each sample's features are stored as one row of the matrix X, so X is an m × (n+1) = 4 × 4 matrix. y holds the true target value of each training sample; stacking these values gives a four-dimensional column vector y.
After obtaining the matrix X and the column vector y, the normal equation can be evaluated. The formula is:

θ = (XᵀX)⁻¹Xᵀy

where ᵀ denotes the matrix transpose and ⁻¹ the matrix inverse; in Octave the inverse is computed with inv() or pinv().
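As a concrete sketch in NumPy (the four samples below are made-up stand-ins for the figure's data, not the original values):

    import numpy as np

    # Hypothetical training set: m = 4 samples, n = 3 features, first column is x0 = 1,
    # so X is m x (n+1) = 4 x 4.
    X = np.array([[1., 2104., 5., 1.],
                  [1., 1416., 3., 2.],
                  [1., 1534., 3., 2.],
                  [1.,  852., 2., 1.]])
    y = np.array([460., 232., 315., 178.])  # one true value per training sample

    # theta = (X^T X)^{-1} X^T y; pinv also works when X^T X happens to be singular.
    theta = np.linalg.pinv(X.T @ X) @ X.T @ y
    print(theta)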

Comparison of gradient descent and normal equation

Disadvantages of the gradient descent method:

  • A learning rate α must be chosen.
  • Multiple iterations are required.

Advantages of the gradient descent method:

  • It still works well when n (the number of features xᵢ) is large.

Disadvantages of the normal equation:

  • When n is large, it runs very slowly: for most algorithms, computing the inverse of the matrix has time complexity O(n³). Therefore, when n is large, the gradient descent method tends to be used (around n = 10000 you can start to consider switching to gradient descent).
  • The normal equation does not apply to more complex learning algorithms; gradient descent is still needed there.
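To make the O(n³) point tangible, here is a rough timing sketch; the sample counts and feature sizes are arbitrary choices for illustration, and the (XᵀX)⁻¹ step visibly dominates as n grows:

    import time
    import numpy as np

    # Time the closed-form solve for growing feature counts n, with m = 5n random samples.
    for n in (100, 500, 1000, 2000):
        X = np.random.randn(5 * n, n)
        y = np.random.randn(5 * n)
        start = time.perf_counter()
        theta = np.linalg.pinv(X.T @ X) @ X.T @ y
        print(f"n = {n}: {time.perf_counter() - start:.3f} s")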

Python implementation

    import numpy as np

    def normalEqn(X, y):
        # Closed-form normal equation: theta = (X^T X)^{-1} X^T y (pinv handles a singular X^T X).
        theta = np.linalg.pinv(X.T @ X) @ X.T @ y
        return theta
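A minimal usage example, reusing the hypothetical X and y built in the example above:

    theta = normalEqn(X, y)
    predictions = X @ theta  # hypothesis values h(x) = X theta for the training samples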

Source: blog.csdn.net/fatfairyyy/article/details/113821848