Machine Learning Introduction (9): Logistic Regression (Newton's Method)

So many gorgeous flowers, and so many wonderful seasons;

no flower can hold on to its season.

I, too, am constantly pursuing,

and in the end I must lose.

Returning to the maximum-likelihood formulation of logistic regression: we now use Newton's method to maximize the log-likelihood function.

Newton's method for finding zeros

Newton's method is a technique for finding a zero of a function, that is, a value of the argument at which the function evaluates to zero.
The update formula of Newton's method is
\[\begin{equation}\theta := \theta - \frac{f(\theta)}{f^{\prime}(\theta)}\end{equation}\]
This update has a very natural interpretation: take the zero of the tangent line at the current approximation as the next, better approximation of the true zero. Repeating this process brings us ever closer to the real zero, as shown in the figure below.
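The scalar update above can be sketched in a few lines of Python. This is a minimal illustration, not from the original post; the function names `newton_root`, `f`, and `f_prime` are my own.

```python
def newton_root(f, f_prime, theta0, tol=1e-10, max_iter=100):
    """Find a zero of f via Newton's method: theta := theta - f(theta) / f'(theta)."""
    theta = theta0
    for _ in range(max_iter):
        step = f(theta) / f_prime(theta)
        theta -= step
        if abs(step) < tol:  # stop once updates become negligible
            break
    return theta

# Example: the positive zero of f(theta) = theta^2 - 2 is sqrt(2)
root = newton_root(lambda t: t * t - 2, lambda t: 2 * t, theta0=1.0)
```

Starting from a reasonable initial guess, the iteration typically converges in a handful of steps.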

Newton's method for finding extrema

Newton's method finds a zero of a function, and an extremum of a function is precisely a zero of its derivative. The Newton update for finding an extremum is therefore
\[\begin{equation}\theta := \theta - \frac{\ell^{\prime}(\theta)}{\ell^{\prime\prime}(\theta)}\end{equation}\]
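Applying the same iteration to the derivative gives an extremum finder. A minimal sketch (my own illustration; `newton_extremum` and the concave test function are assumptions, not from the post):

```python
def newton_extremum(ell_prime, ell_double_prime, theta0, tol=1e-10, max_iter=100):
    """Find an extremum of ell by running Newton's method on its derivative:
    theta := theta - ell'(theta) / ell''(theta)."""
    theta = theta0
    for _ in range(max_iter):
        step = ell_prime(theta) / ell_double_prime(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Maximize the concave function ell(theta) = -(theta - 3)^2, whose maximum is at theta = 3.
# Since ell is quadratic, Newton's method lands on the answer in a single step.
theta_star = newton_extremum(lambda t: -2 * (t - 3), lambda t: -2.0, theta0=0.0)
```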

In logistic regression, however, the parameter we want to maximize over is a vector. Generalizing Newton's method to higher dimensions (this generalization is also known as the Newton-Raphson method) gives
\[\begin{equation}\theta := \theta - H^{-1}\nabla_{\theta}\ell(\theta)\end{equation}\]
where $H$ is the Hessian matrix of $\ell(\theta)$, whose $(i, j)$ element is defined as
\[\begin{equation}H_{ij} = \frac{\partial^{2}\ell(\theta)}{\partial\theta_{i}\partial\theta_{j}}\end{equation}\]
Newton's method typically converges in very few iterations, but each iteration is generally much more expensive than one iteration of gradient descent, because it requires inverting a matrix whose order is the number of features. As long as the number of features is not too large, Newton's method is still fast overall; one cannot have the best of both worlds.
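The vector update can be made concrete for logistic regression. The sketch below is my own illustration under standard assumptions: with $p = \sigma(X\theta)$, the log-likelihood gradient is $X^{\top}(y - p)$ and the Hessian is $-X^{\top}\,\mathrm{diag}(p(1-p))\,X$; solving the linear system avoids forming $H^{-1}$ explicitly. The toy data and function names are not from the original post.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, n_iter=10):
    """Maximize the logistic log-likelihood with the Newton-Raphson update
    theta := theta - H^{-1} grad, where
    grad = X^T (y - p) and H = -X^T diag(p(1-p)) X."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ theta)
        grad = X.T @ (y - p)
        H = -(X.T * (p * (1 - p))) @ X
        # Solve H s = grad instead of inverting H (cheaper and more stable)
        theta -= np.linalg.solve(H, grad)
    return theta

# Toy 1-D problem with an intercept column; labels are a noisy threshold of x
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = (x + 0.5 * rng.normal(size=100) > 0).astype(float)
X = np.column_stack([np.ones_like(x), x])
theta = newton_logistic(X, y)
```

A handful of iterations usually suffices here, whereas gradient ascent on the same problem would need many more steps at a lower cost per step.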

Using Newton's method to find the maximum of the logistic log-likelihood in this way corresponds to a method known as Fisher scoring.



Origin www.cnblogs.com/qizhien/p/11590339.html