CS231n Course Notes: Lecture 7 Training Neural Networks II

Contents

Fancier optimization

Regularization

Transfer Learning


Fancier optimization

# Vanilla Gradient Descent (slide pseudocode; evaluate_gradient is a placeholder)

while True:
    weights_grad = evaluate_gradient(loss_fun, data, weights)  # gradient of the loss w.r.t. the weights
    weights -= step_size * weights_grad                        # step opposite the gradient

For this type of objective function, where the loss changes quickly in one direction and slowly in another (a high condition number), plain SGD zig-zags: it makes slow progress along the shallow dimension while jittering along the steep one.

What does a saddle point mean?

It means that at the current point the loss goes up in some directions and goes down in others, while the gradient is (nearly) zero, so plain SGD slows to a crawl around it.

Nesterov momentum
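SGD with momentum builds up a velocity that is a running average of gradients, and Nesterov momentum evaluates the gradient after a look-ahead step along that velocity. A minimal pseudocode sketch in the same style as the slide code above; compute_gradient, rho, and learning_rate are placeholder names, with rho (the friction/decay) typically around 0.9 or 0.99:

# SGD + Momentum
vx = 0
while True:
    dx = compute_gradient(x)              # placeholder: gradient of the loss at x
    vx = rho * vx + dx                    # accumulate velocity; rho acts like friction
    x -= learning_rate * vx               # step along the velocity, not the raw gradient

# Nesterov momentum (rewritten to use the gradient at the current point)
vx = 0
while True:
    dx = compute_gradient(x)
    old_vx = vx
    vx = rho * vx - learning_rate * dx
    x += -rho * old_vx + (1 + rho) * vx   # incorporates the look-ahead correction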

 

 

 AdaGrad
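AdaGrad keeps a per-parameter running sum of squared gradients and divides the step by its square root, so dimensions with consistently large gradients get smaller effective learning rates. A minimal pseudocode sketch with placeholder names (compute_gradient, learning_rate; np is numpy):

# AdaGrad
grad_squared = 0
while True:
    dx = compute_gradient(x)
    grad_squared += dx * dx                                   # squared gradients accumulate forever
    x -= learning_rate * dx / (np.sqrt(grad_squared) + 1e-7)  # per-parameter scaling; 1e-7 avoids division by zero

Because the accumulator only grows, the effective step size keeps shrinking over a long run, which is what RMSProp addresses next.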

RMSProp 
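RMSProp keeps the same per-parameter scaling but lets the squared-gradient estimate decay instead of growing without bound. A sketch with placeholder names; decay_rate is typically 0.9 or 0.99:

# RMSProp
grad_squared = 0
while True:
    dx = compute_gradient(x)
    grad_squared = decay_rate * grad_squared + (1 - decay_rate) * dx * dx  # leaky running average
    x -= learning_rate * dx / (np.sqrt(grad_squared) + 1e-7)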

 

At the very first time step, the second moment is initialized to zero. The second-moment decay rate beta2 is typically something like 0.9 or 0.99, very close to one, so after one update the second moment is still very close to zero.

When we then make the update step and divide by the square root of this tiny second moment, we are dividing by a very small number, which can produce a very large step at the start of training.

Adam adds a bias correction term to avoid this problem of taking very large steps early on.
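Putting it together, Adam keeps both a first moment (momentum-like) and a second moment (RMSProp-like) and applies the bias correction described above. A minimal pseudocode sketch; beta1, beta2, learning_rate, num_iterations, and compute_gradient are placeholders, with typical values beta1 = 0.9, beta2 = 0.999, learning_rate around 1e-3 to 5e-4:

# Adam (full form, with bias correction)
first_moment = 0
second_moment = 0
for t in range(1, num_iterations + 1):
    dx = compute_gradient(x)
    first_moment = beta1 * first_moment + (1 - beta1) * dx          # momentum-like term
    second_moment = beta2 * second_moment + (1 - beta2) * dx * dx   # RMSProp-like term
    first_unbias = first_moment / (1 - beta1 ** t)                  # bias correction for small t
    second_unbias = second_moment / (1 - beta2 ** t)
    x -= learning_rate * first_unbias / (np.sqrt(second_unbias) + 1e-7)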

 

If you can afford to do full-batch updates, then try out L-BFGS (and don't forget to disable all sources of noise); it does not work well in the stochastic mini-batch setting.
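As a concrete illustration, scipy exposes L-BFGS for deterministic, full-batch objectives. The toy least-squares problem below is just an illustrative assumption, not something from the lecture:

import numpy as np
from scipy.optimize import minimize

# Toy full-batch objective: least-squares loss on a small synthetic dataset
X = np.random.randn(100, 5)
y = np.random.randn(100)

def loss_and_grad(w):
    resid = X.dot(w) - y
    return 0.5 * np.sum(resid ** 2), X.T.dot(resid)   # loss and its exact gradient

result = minimize(loss_and_grad, np.zeros(5), jac=True, method='L-BFGS-B')
print(result.x)   # optimized weights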

Regularization

Dropout: in each forward pass, randomly set some neurons to zero; each unit is kept with probability p (0.5 is a common choice).

More common: inverted dropout, which does the 1/p scaling at training time so the test-time forward pass stays unchanged.
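A minimal sketch of inverted dropout for a 2-layer network, in the style of the course notes; W1, W2, b1, b2 are assumed to be pre-initialized weights and p is the keep probability:

p = 0.5  # probability of keeping a unit active

def train_step(X):
    H1 = np.maximum(0, np.dot(W1, X) + b1)
    U1 = (np.random.rand(*H1.shape) < p) / p   # dropout mask, scaled by 1/p at train time
    H1 *= U1                                   # drop (and rescale) activations
    out = np.dot(W2, H1) + b2
    return out

def predict(X):
    H1 = np.maximum(0, np.dot(W1, X) + b1)     # no mask and no scaling needed at test time
    out = np.dot(W2, H1) + b2
    return out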

Data augmentation: transform the training images with label-preserving operations such as random horizontal flips, random crops and scales, and color jitter, and train on the transformed copies.
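A minimal numpy sketch of train-time augmentation (random horizontal flip, crude color jitter, and random crop); the function name, crop size, and jitter range are illustrative assumptions:

import numpy as np

def augment(img, crop_size=224):
    # img: H x W x 3 float array
    if np.random.rand() < 0.5:
        img = img[:, ::-1, :]                                 # random horizontal flip
    img = img * np.random.uniform(0.8, 1.2, size=(1, 1, 3))   # crude per-channel color jitter
    H, W, _ = img.shape
    top = np.random.randint(0, H - crop_size + 1)
    left = np.random.randint(0, W - crop_size + 1)
    return img[top:top + crop_size, left:left + crop_size, :] # random crop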

 

 

Transfer Learning
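The usual recipe: take a CNN pre-trained on a large dataset such as ImageNet, reinitialize the final layer for the new classes, freeze (or fine-tune with a small learning rate) the earlier layers, and train on the small target dataset. A minimal sketch using torchvision as an assumed example library; the model choice and the 10-class target task are illustrative, not from the original notes:

import torch.nn as nn
import torchvision.models as models

model = models.resnet18(pretrained=True)       # weights learned on ImageNet
for param in model.parameters():
    param.requires_grad = False                # freeze all pre-trained layers

num_classes = 10                               # assumption: small target dataset with 10 classes
model.fc = nn.Linear(model.fc.in_features, num_classes)  # re-initialize only the last layer
# now train only model.fc's parameters on the new dataset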


Reposted from blog.csdn.net/m0_53292725/article/details/127022889