Neural network training experience

【Existing Experience】

1. ResNet-50 3D, ~30M params, Kinetics-400: Dropout 0.2, weight decay 5e-4, momentum 0.9.

2. ResNet-23 2D, ~11M params, Kinetics-400: Dropout 0.5, weight decay 1e-4, momentum 0.9. (Both setups are restated in the sketch after this list.)
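A minimal sketch of the two reference setups as Python config dicts; the `CONFIGS` name and dict layout are my own, the values are from the list above:

```python
# Hyperparameters of the two reference setups above.
# The CONFIGS name and dict layout are illustrative, not from the post.
CONFIGS = {
    "resnet50_3d": {"params": "~30M", "dataset": "Kinetics-400",
                    "dropout": 0.2, "weight_decay": 5e-4, "momentum": 0.9},
    "resnet23_2d": {"params": "~11M", "dataset": "Kinetics-400",
                    "dropout": 0.5, "weight_decay": 1e-4, "momentum": 0.9},
}
```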

 

【Learning Rate】

Compared with step-wise LR adjustment, an annealing schedule makes the training process smoother and more stable, while still converging to a good result.
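As a minimal PyTorch sketch (the model, base LR, and epoch count are placeholder assumptions), cosine annealing is one common way to realize such a schedule:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(512, 400)  # stand-in for the real video backbone
opt = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

# A step schedule would drop the LR abruptly at fixed epochs, e.g.:
# sched = optim.lr_scheduler.StepLR(opt, step_size=30, gamma=0.1)
# Annealing decays it smoothly instead:
sched = optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)  # 100 epochs assumed

for epoch in range(100):
    # ... forward/backward/opt.step() for one epoch ...
    sched.step()  # update the LR once per epoch
```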

 

【Batch Size】

BN is sensitive to batch size: when using BN, a larger batch size helps the per-batch statistics fit the population distribution of the samples better.
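A small synthetic illustration of why (the data and batch sizes are made up for the demo): the per-batch mean that a BN layer would estimate fluctuates less around the population mean as the batch size grows.

```python
import torch

torch.manual_seed(0)
population = torch.randn(100_000) * 2.0 + 5.0  # synthetic data: mean 5, std 2

for bs in (4, 32, 256):
    # Draw 1000 random batches and measure how much the per-batch mean
    # (what BN estimates during training) wobbles around the true mean.
    idx = torch.randint(0, population.numel(), (1000, bs))
    spread = population[idx].mean(dim=1).std().item()
    print(f"batch size {bs:3d}: std of batch means = {spread:.3f}")
```

Larger batches give batch statistics that track the population more closely, which is the effect the rule above relies on.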

 

【Weight Decay】

Based on current experience, weight decay should be matched to the number of network parameters relative to the amount of training data. With roughly the same amount of data, a large network should use a large weight decay (ResNet-50 3D, ~30M params, Kinetics-400, WD 5e-4), and a small network a small weight decay (ResNet-23 2D, ~11M params, Kinetics-400, WD 1e-4).
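A hedged sketch of that rule as code; the `make_optimizer` helper and the 0.1 base LR are my own, the WD values are from the post:

```python
import torch.nn as nn
import torch.optim as optim

def make_optimizer(model: nn.Module, big_network: bool) -> optim.SGD:
    # Rule of thumb above: with a similar amount of training data,
    # larger networks (ResNet-50 3D) get WD 5e-4, smaller ones
    # (ResNet-23 2D) get WD 1e-4.
    wd = 5e-4 if big_network else 1e-4
    return optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=wd)
```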

 

【Dropout】

Based on current experience, the dropout rate for a small network should be large, and for a big network small. E.g., ResNet-50 3D, ~30M params, Kinetics-400, dropout 0.2; ResNet-23 2D, ~11M params, Kinetics-400, dropout 0.5.
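As a sketch (the `make_head` helper and the feature size are assumptions; the dropout rates are from the post), the rule can be wired into a classifier head:

```python
import torch.nn as nn

def make_head(in_features: int, num_classes: int,
              small_network: bool) -> nn.Sequential:
    # Rule of thumb above: small networks get the larger dropout rate
    # (0.5 for ResNet-23 2D), big ones the smaller (0.2 for ResNet-50 3D).
    p = 0.5 if small_network else 0.2
    return nn.Sequential(nn.Dropout(p), nn.Linear(in_features, num_classes))

head = make_head(2048, 400, small_network=False)  # 400 = Kinetics-400 classes
```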

 

Source: www.cnblogs.com/hizhaolei/p/10113026.html