Normal Equation formula:
θ = (XᵀX)⁻¹Xᵀy
The derivation:
Start with:
X·θ = y
Step 1: left-multiply both sides by Xᵀ:
XᵀX·θ = Xᵀy
Step 2: left-multiply both sides by (XᵀX)⁻¹:
(XᵀX)⁻¹XᵀX·θ = (XᵀX)⁻¹Xᵀy
Since:
(XᵀX)⁻¹(XᵀX) = I
it follows that:
θ = (XᵀX)⁻¹Xᵀy
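The closed-form solution above can be evaluated directly with NumPy. This is a minimal sketch on made-up toy data (the points lie exactly on y = 2 + 3x, so the fit recovers those coefficients):

```python
import numpy as np

# Toy data for y = 2 + 3x (hypothetical example);
# the first column of ones models the intercept term.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 5.0, 8.0, 11.0])

# theta = (X^T X)^{-1} X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)  # ≈ [2. 3.]
```

No learning rate and no iteration loop appear anywhere: one matrix expression yields θ.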
Gradient Descent method:
- Requires choosing a learning rate α
- Requires many iterations
- Remains efficient even when the number of features is large
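For contrast, a minimal gradient-descent sketch on the same toy data (the learning rate α = 0.1 and the iteration count are hand-picked assumptions, illustrating the two drawbacks listed above):

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 5.0, 8.0, 11.0])

alpha = 0.1          # learning rate, chosen by hand
theta = np.zeros(2)  # initial guess
m = len(y)

for _ in range(5000):                   # many iterations needed
    grad = (X.T @ (X @ theta - y)) / m  # gradient of the MSE cost
    theta -= alpha * grad

print(theta)  # converges toward [2, 3]
```

Each iteration only costs a matrix-vector product, which is why the method stays practical when the feature count is large.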
Normal Equation method:
- No need to choose a learning rate α
- No iterations needed
- Unsuitable when the number of features n is large, because solving with the n×n matrix XᵀX costs roughly O(n³)
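In practice the explicit inverse is usually avoided: solving the linear system (XᵀX)θ = Xᵀy is more numerically stable, though it has the same cubic scaling in the number of features. A sketch on the same toy data:

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 5.0, 8.0, 11.0])

# Solve (X^T X) theta = X^T y directly rather than forming the inverse.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # ≈ [2. 3.]
```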