Learn Note 03: Error and Gradient Descent

1. Bias and Variance

If the bias is large, the model is underfitting: design a more complex model or add more input features.
If the variance is large, the model is overfitting: collect more data or apply regularization.
Two ways to estimate the error reliably:
Cross validation
N-fold cross validation (see the sketch below)
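
A minimal N-fold cross validation sketch, assuming scikit-learn's KFold, a toy linear regression model, and made-up data; the point is only that the averaged validation error over the folds is what we compare models with.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Toy data: 100 examples, 3 features (placeholders, not from the note)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=0)  # N = 5 folds
fold_errors = []
for train_idx, val_idx in kf.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[val_idx])
    fold_errors.append(mean_squared_error(y[val_idx], pred))

# The average validation error across folds is used to pick the model
print("mean CV error:", np.mean(fold_errors))
```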

2. Gradient Descent

2.1 Tuning your learning rates

  • Adaptive Learning Rates
    Popular and simple idea: reduce the learning rate by some factor every few epochs, for example η^t = η / √(t + 1).
  • Adagrad
    Divide the learning rate of each parameter by the root mean square of its previous derivatives (see the sketch after this list).
    Vanilla gradient descent: the larger the gradient, the larger the step.
    Adagrad: the step is also divided by the root mean square of the parameter's past derivatives, so a parameter with consistently large gradients gets a smaller effective learning rate.
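
A minimal sketch of an Adagrad-style update under assumptions not in the note: a single parameter w, a hypothetical quadratic loss, a fixed base learning rate eta, and the root-mean-square form of the denominator described above.

```python
import numpy as np

def adagrad_update(w, grad, grad_history, eta=0.1, eps=1e-8):
    """One Adagrad-style step: divide eta by the root mean square of all past derivatives."""
    grad_history.append(grad)
    sigma = np.sqrt(np.mean(np.square(grad_history)))  # root mean square of past derivatives
    return w - eta / (sigma + eps) * grad, grad_history

# Hypothetical loss L(w) = (w - 3)^2, so dL/dw = 2 * (w - 3)
w, history = 0.0, []
for t in range(20):
    g = 2.0 * (w - 3.0)
    w, history = adagrad_update(w, g, history)
print("w after 20 steps:", w)
```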

2.2 Stochastic Gradient Descent: Make the Training Faster

Compute the loss on a single example and update the parameters after every example, instead of after the whole dataset, as sketched below.
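
A minimal SGD sketch for linear regression with made-up data; the model, shapes, and learning rate are assumptions for illustration, not from the note.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                   # toy inputs
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

w = np.zeros(3)
eta = 0.01
for epoch in range(5):
    for i in rng.permutation(len(X)):           # visit examples in random order
        pred = X[i] @ w
        grad = 2.0 * (pred - y[i]) * X[i]       # gradient of (pred - y_i)^2 w.r.t. w
        w -= eta * grad                         # one update per example
print("estimated w:", w)
```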

2.3 Feature Scaling

Rescale the input features so that they have comparable ranges (for example zero mean and unit variance); the loss surface then becomes closer to round and gradient descent can head more directly toward the minimum. A standardization sketch follows.
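
A minimal standardization sketch; the two-feature toy matrix is made up for illustration.

```python
import numpy as np

X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 200.0]])

mean = X.mean(axis=0)          # per-feature mean
std = X.std(axis=0) + 1e-8     # per-feature standard deviation (epsilon avoids division by zero)
X_scaled = (X - mean) / std    # each feature now has mean 0 and variance ~1

print(X_scaled)
```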

2.4 Gradient Descent Theory

Each time we update the parameters, we would like to obtain a θ that makes L(θ) smaller; as the derivation in 2.5 shows, this is only guaranteed when the learning rate is small enough.

2.5 Warning of Math

Formal derivation via the Taylor series: approximate L(θ) to first order around the current point and minimize that approximation within a small radius, which gives exactly the gradient descent update (see the derivation below).
More limitations of gradient descent: it can get stuck at a local minimum, get stuck at a saddle point, and move very slowly on a plateau where the gradient is close to zero.
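
A short sketch of the standard Taylor-series argument behind the update (my reconstruction of the step the note only names):

```latex
\begin{aligned}
% First-order Taylor expansion of the loss around the current point \theta^0:
L(\theta) &\approx L(\theta^0) + \nabla L(\theta^0)^\top (\theta - \theta^0) \\
% Minimizing this linear approximation over a small ball
% \|\theta - \theta^0\| \le r means stepping opposite to the gradient:
\theta &= \theta^0 - \eta \, \nabla L(\theta^0)
\end{aligned}
% The approximation (and hence the guarantee that L decreases) only holds
% when the learning rate \eta is small enough.
```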

Reposted from blog.csdn.net/minovophy/article/details/118833388