https://www.zhihu.com/question/64134994
1. Increasing the batch size makes the gradient estimate more accurate, but it also reduces the gradient noise (variance), which can cause the model to get trapped in a local optimum;
2. Therefore, increasing the batch size is usually accompanied by increasing the learning rate: for example, when the batch size grows m times, the learning rate is raised by a factor of m (linear scaling) or sqrt(m), though there is no fixed rule;
3. The learning rate is usually not jumped to the larger value directly; it is increased gradually through a warm-up phase;
4. For the warm-up strategy, see Bag of Freebies for Training Object Detection Neural Networks:
Suppose the warm-up lasts m batches and the target initial learning rate is n; then at batch i (i = 1, ..., m) the learning rate is set to i * n / m.
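The linear warm-up rule above can be sketched as a small Python function (a minimal sketch; the function name `warmup_lr` and the example values of m and n are my own, not from the answer):

```python
def warmup_lr(i, m, n):
    """Linear warm-up: at batch i (1-indexed, i <= m) the learning rate
    is i * n / m; after the warm-up phase it stays at the target value n."""
    if i <= m:
        return i * n / m
    return n

# Example: target learning rate n = 0.1, warm-up over m = 5 batches.
# The rate climbs 0.02, 0.04, 0.06, 0.08, 0.1, then stays at 0.1.
schedule = [warmup_lr(i, m=5, n=0.1) for i in range(1, 8)]
```

In a real training loop this value would be assigned to the optimizer's learning rate at the start of every batch during the warm-up phase.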