How to set the batch size

Batch size too small: each gradient estimate is noisy, so training oscillates heavily and has difficulty converging.

Batch size too large:

(1) Memory utilization improves, and large matrix multiplications parallelize more efficiently.

(2) The gradient direction computed from a larger batch is more accurate, so training oscillates less (see the sketch after this list).

(3) The number of iterations required to finish one epoch decreases, so the same amount of data is processed faster.

Disadvantages: it is easy to run out of memory; to reach the same accuracy, more epochs are needed; training falls more easily into local optima, and generalization performance suffers.
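To make the noise argument concrete, here is a minimal NumPy sketch on a toy linear-regression loss (the example and every name in it are illustrative, not from the original post). It samples many mini-batch gradients at a fixed weight and prints their standard deviation, which shrinks as the batch size grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data for a 1-D linear regression: y ≈ 3x + noise.
x = rng.normal(size=10_000)
y = 3.0 * x + rng.normal(scale=0.5, size=10_000)
w = 2.0  # fixed weight at which we measure gradient noise

def grad_noise(batch_size, n_trials=1_000):
    """Std of the mini-batch gradient of mean((x*w - y)^2) w.r.t. w."""
    grads = []
    for _ in range(n_trials):
        idx = rng.integers(0, len(x), size=batch_size)
        xb, yb = x[idx], y[idx]
        grads.append(np.mean(2 * (xb * w - yb) * xb))
    return np.std(grads)

for bs in (2, 16, 128, 1024):
    print(f"batch_size={bs:5d}  gradient std={grad_noise(bs):.4f}")
```

The printed standard deviation falls roughly as 1/sqrt(batch_size), which is why small batches make training oscillate while large batches give a more stable gradient direction.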

 

Setting the batch size: usually between 10 and 100, and usually a power of 2.

Reason: GPU and CPU memory store data in binary, so sizes that are powers of 2 can be computed faster.
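As a usage sketch (assuming PyTorch, which the original post does not mention; the dataset below is dummy data for illustration), the batch size is a single argument to the DataLoader, set here to a power of 2 per the rule of thumb above:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Dummy dataset: 1,000 samples with 20 features each (illustrative only).
features = torch.randn(1000, 20)
labels = torch.randint(0, 2, (1000,))
dataset = TensorDataset(features, labels)

# batch_size=64: within the 10-100 range and a power of 2.
loader = DataLoader(dataset, batch_size=64, shuffle=True)

for xb, yb in loader:
    print(xb.shape)  # torch.Size([64, 20]) for every full batch
    break
```

If GPU memory overflows at a given batch size, halving it (keeping a power of 2) is the usual first remedy.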

Source: www.cnblogs.com/happytaiyang/p/11617551.html