Normal Equation (Normal Equations)
- The normal equation is obtained by solving the following equation for the parameters that minimize the cost function:
\[ \frac{\partial}{\partial\theta_j}J\left(\theta\right)=0 \]
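- Written in matrix form with the usual squared-error cost \( J(\theta)=\frac{1}{2m}{\left\| X\theta -y \right\|}^2 \), setting all of these partial derivatives to zero leads directly to the closed-form solution (assuming \( {X^T}X \) is invertible):

\[ \nabla_\theta J\left(\theta\right)=\frac{1}{m}{X^T}\left( X\theta -y \right)=0 \quad\Rightarrow\quad {X^T}X\theta ={X^T}y \quad\Rightarrow\quad \theta ={{\left( {X^T}X \right)}^{-1}}{X^T}y \]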
- Assume the training set feature matrix is \( X \) (which includes the column \( x_0=1 \)) and the training set targets form the vector \( y \). Then the normal equation gives the solution vector:
\[ \theta ={{\left( {X^T}X \right)}^{-1}}{X^T}y \]
- Comparison of gradient descent and the normal equation:
- Gradient descent: requires choosing a learning rate \( \alpha \); needs many iterations; still works well when the number of features \( n \) is large; applies to all types of models.
- Normal equation: no learning rate \( \alpha \) to choose; no iterations, a single computation yields the optimal \( \theta \); requires computing \( {{\left( {X^T}X \right)}^{-1}} \). When the number of features \( n \) is large this is expensive, because the time complexity of matrix inversion is \( O(n^3) \); it is usually acceptable when \( n \) is less than 10000. It applies only to linear models and is not suitable for logistic regression or other models.
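The trade-off above can be seen on a tiny example. The sketch below (the dataset and the helper names `gradient_descent` and `normal_eqn` are illustrative, not from the exercise code) fits the same linear model both ways:

```python
import numpy as np

# Tiny synthetic dataset: a column of ones (x0 = 1) plus one feature.
# The targets satisfy y = 1 + 2x exactly.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

def gradient_descent(X, y, alpha=0.1, iters=5000):
    """Batch gradient descent: needs a learning rate and many iterations."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        theta -= alpha / m * X.T @ (X @ theta - y)
    return theta

def normal_eqn(X, y):
    """Normal equation: one shot, but inverts an n-by-n matrix (O(n^3))."""
    return np.linalg.inv(X.T @ X) @ X.T @ y

theta_gd = gradient_descent(X, y)
theta_ne = normal_eqn(X, y)
print(theta_gd)  # both should be close to [1, 2]
print(theta_ne)
```

On a problem this small the normal equation is the obvious choice; gradient descent only pays off when \( n \) is large enough that inverting \( {X^T}X \) becomes the bottleneck.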
Programming
In programming exercise 1.1, univariate linear regression is implemented with the normal equation:
# Normal equation
def normalEqn(X, y):
    # X.T@X is equivalent to X.T.dot(X); np.linalg.inv() computes the matrix inverse
    theta = np.linalg.inv(X.T @ X) @ X.T @ y
    return theta

final_theta2 = normalEqn(X, y)  # note: slightly different from the theta found by batch gradient descent
final_theta2
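A more numerically robust variant (a common alternative, not part of the exercise) uses the SVD-based pseudoinverse, which still returns a sensible answer when \( {X^T}X \) is singular or ill-conditioned, e.g. with redundant features:

```python
import numpy as np

# Small stand-in dataset: points (0, 1), (1, 3), (2, 5) lie exactly on y = 1 + 2x.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

# np.linalg.pinv computes the Moore-Penrose pseudoinverse via the SVD,
# so it avoids explicitly inverting X.T @ X.
theta = np.linalg.pinv(X) @ y
print(theta)  # close to [1, 2]
```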
After running the gradient descent algorithm from the earlier section, we output the \( \theta \) values as follows:
As can be seen, the \( \theta \) values obtained by the two methods are substantially similar.