[Small note] Should BatchSize be set as large as possible?

No — BatchSize should not simply be set as large as possible.

We often assume that a larger batch size will make training better, for a few reasons:

1. Since each update is computed from more training data, the estimated gradient direction is more accurate, and the training curve is smoother.

2. Shorter training time. Within one epoch, fewer batches are needed, so each epoch is processed faster.
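To make point 2 concrete, here is a tiny sketch (the dataset size is hypothetical, purely for illustration) of how the number of iterations per epoch shrinks as the batch size grows:

```python
import math

# Hypothetical dataset size, only for illustration.
n_samples = 50_000

# With a fixed dataset, a larger batch size means fewer parameter
# updates (iterations) per epoch, so each epoch finishes faster.
for batch_size in (32, 256, 1024):
    iterations = math.ceil(n_samples / batch_size)
    print(f"batch_size={batch_size:>5} -> {iterations} iterations per epoch")
```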

However, a larger batch size comes with the following issues to watch out for:

1. Memory. A larger batch may overflow RAM or GPU memory and cause out-of-memory errors.

2. Reduced generalization. This is something I hadn't considered before: an overly large batch size can hurt the network's accuracy because it reduces the randomness of gradient descent.

Using a smaller batch size produces less stable, noisier weight updates. This has two positive effects: first, it helps training "jump out" of local minima it may have been stuck in; second, it lets training settle into "flatter" minima, which usually indicate better generalization.
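As a rough illustration of where the batch size enters a training loop, here is a minimal PyTorch sketch; the toy data, model, and hyperparameters are all invented for the example and are not from the original post:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy data and model, made up only to show where batch_size is set.
X = torch.randn(1000, 20)
y = torch.randint(0, 2, (1000,))
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

# A smaller batch_size gives noisier gradient estimates per update,
# which is the source of the "randomness" discussed above; a very large
# batch_size also needs proportionally more (GPU) memory per step.
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

for epoch in range(3):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```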

How to choose the Batch size when training the neural network? - Zhihu (zhihu.com)

The answer at the link above (reposted; will be removed on request) points out:

  • When computing power is sufficient, choose a batch size of 32 or smaller.
  • When computing power is limited, make a trade-off between efficiency and generalization, and still try to choose a smaller batch size.
  • When training is nearly finished and you want to refine the result further (e.g., the final stage of paper experiments or competitions), a useful trick is to set the batch size to 1, i.e., do pure SGD, and slowly grind down the error (see the sketch below).
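For that last trick, here is a minimal sketch of what "pure SGD at the end" could look like in PyTorch; the model, data, and learning rate are placeholders, and in practice you would start from your already-trained weights rather than a fresh model:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins for an almost-finished model and its training data.
X = torch.randn(1000, 20)
y = torch.randint(0, 2, (1000,))
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

# batch_size=1 -> one sample per update, i.e. pure SGD.
finetune_loader = DataLoader(TensorDataset(X, y), batch_size=1, shuffle=True)

# A small learning rate keeps single-sample updates from being too erratic.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for xb, yb in finetune_loader:
    optimizer.zero_grad()
    loss = criterion(model(xb), yb)
    loss.backward()
    optimizer.step()
```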


Origin blog.csdn.net/vibration_xu/article/details/126267108