Hung-yi Lee (National Taiwan University)

1:Regression-Case Study

Why does the loss function regularize only w, and not b?

Because b only shifts the function up or down; it has almost no effect on how smooth the function is, so penalizing it does not help.
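A quick numerical check of this point (a minimal sketch using the course's linear model y = b + w*x): changing b shifts the curve vertically but leaves its sensitivity to the input unchanged, so a penalty on b cannot make the function smoother.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 5)
w = 2.0

# Two models that differ ONLY in the bias term b.
y1 = 1.0 + w * x
y2 = 100.0 + w * x

# "Smoothness" concerns how much the output changes as the input changes:
# the slope dy/dx equals w in both cases, independent of b.
slope1 = np.diff(y1) / np.diff(x)
slope2 = np.diff(y2) / np.diff(x)

print(np.allclose(slope1, slope2))  # True
```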

1-Regression Demo

This trick is explained in detail in the Adagrad lecture: a small learning rate needs many iterations to reach the optimum, while a large learning rate may oscillate wildly and also fail to converge. One tuning trick is to give w and b their own customized learning rates.

lr = 1

# ... (initialization, before the gradient-descent loop)

lr_b = 0
lr_w = 0

# ... (inside the loop, after computing b_grad and w_grad for this iteration)

lr_b = lr_b + b_grad ** 2
lr_w = lr_w + w_grad ** 2

# Update parameters: each one is scaled by the root of its own
# accumulated squared gradients (requires: import numpy as np).
b = b - lr / np.sqrt(lr_b) * b_grad
w = w - lr / np.sqrt(lr_w) * w_grad
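Putting the pieces together, a runnable version of the demo (a sketch: the toy (x, y) values below stand in for the lecture's training data, and the initial point and iteration count are illustrative choices):

```python
import numpy as np

# Toy (x, y) pairs standing in for the lecture's training data.
x_data = np.array([338., 333., 328., 207., 226., 25., 179., 60., 208., 606.])
y_data = np.array([640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.])

b, w = -120.0, -4.0      # initial guess for the model y = b + w * x
lr = 1.0                 # base learning rate
lr_b, lr_w = 0.0, 0.0    # Adagrad accumulators (sums of squared gradients)

for _ in range(100000):
    # Gradients of L(b, w) = sum_n (y_n - (b + w * x_n))^2
    residual = y_data - (b + w * x_data)
    b_grad = -2.0 * residual.sum()
    w_grad = -2.0 * (residual * x_data).sum()

    # Each parameter accumulates its own squared-gradient history ...
    lr_b += b_grad ** 2
    lr_w += w_grad ** 2

    # ... which gives each parameter its own effective learning rate.
    b -= lr / np.sqrt(lr_b) * b_grad
    w -= lr / np.sqrt(lr_w) * w_grad

print(f"b = {b:.2f}, w = {w:.2f}")
```

With a single shared learning rate, b and w (which live on very different scales here) would need very different step sizes; the per-parameter scaling is what lets lr = 1 work for both.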

2:Where does the error come from?
 

The error comes from two sources: error due to "bias" and error due to "variance".

A simple model means a small model set (one that may not even contain the true target function): large bias, small variance;

a complex model means a large model set (one that is more likely to contain the target function): small bias, large variance.

If the error mainly comes from large variance, the model is overfitting;

if the error mainly comes from large bias, the model is underfitting.
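The two failure modes show up numerically (a minimal sketch with made-up data: a quadratic target, polynomial model sets of increasing degree fitted by least squares via numpy.polyfit):

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # The "true" target function the models try to recover.
    return x ** 2 - 3.0 * x

x_train = rng.uniform(0.0, 5.0, 12)
y_train = target(x_train) + rng.normal(0.0, 1.0, 12)
x_test = rng.uniform(0.0, 5.0, 200)
y_test = target(x_test) + rng.normal(0.0, 1.0, 200)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.2f}, test MSE {test_mse:.2f}")

# Expected pattern: degree 1 has high error on BOTH sets (large bias,
# underfitting); degree 9 fits the training points almost exactly while
# the train/test gap grows (large variance, overfitting).
```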


What to do with large bias?

1. Diagnosis:

(1) If your model cannot even fit the training examples, then you have large bias.----> Underfitting.

(2) If you can fit the training data, but large error on testing data, then you probably have large variance. ----> Overfitting.

2. For large bias, redesign your model:

(1) Add more features as input;

(2) Use a more complex model.

What to do with large variance?

1. More data (very effective, but not always practical). You can also create training data yourself, e.g. by flipping images or adding noise (data augmentation).

2. Regularization (encourages small weights so the learned curve changes less and is smoother). But it may shrink your model set until it no longer contains the target function, which can hurt bias.
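This bias/variance trade-off can be sketched in closed form (hypothetical data; single-weight ridge regression with the bias term omitted for brevity): as the regularization weight lambda grows, w shrinks toward 0, giving a flatter, smoother line that drifts away from the true slope.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 30)
y = 3.0 * x + rng.normal(0.0, 0.1, 30)   # true slope is 3

for lam in (0.0, 1.0, 100.0):
    # Minimizer of sum_n (y_n - w * x_n)^2 + lam * w^2:
    w = np.sum(x * y) / (np.sum(x * x) + lam)
    print(f"lambda = {lam:6.1f}  ->  w = {w:.3f}")

# Larger lambda -> smaller |w| (a smoother function with lower variance),
# but w is pulled away from the true slope 3 -> regularization adds bias.
```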


Reposted from blog.csdn.net/weixin_41078740/article/details/84522502