Understanding batch size and its effect on the network model

Foreword

In each epoch, the entire dataset is passed through the neural network for forward propagation and back propagation. Because the dataset used in an epoch may be too large to process at once, it is divided into blocks; the number of samples in each block is the batch size.

During training, the batch size affects both the accuracy of the model and the length of training, among other things.

The batch size can be chosen anywhere between 1 and the total number of samples, and both extremes cause problems: if it is too small, the model may underfit; if it is too large, it may overfit. The specific batch size to choose depends on the network and the dataset.
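As a rough sketch of this batching step (plain NumPy; the dataset X, its shape, and the batch size of 32 are assumed purely for illustration), one epoch might walk through the data like this:

```python
import numpy as np

# Hypothetical dataset: 1000 samples with 20 features each (assumed for illustration).
X = np.random.randn(1000, 20)
batch_size = 32

# One epoch: shuffle the sample indices, then walk through the data block by block.
indices = np.random.permutation(len(X))
for start in range(0, len(X), batch_size):
    batch = X[indices[start:start + batch_size]]
    # The forward pass and back propagation for this batch would go here.
    print(batch.shape)  # (32, 20), except for a possibly smaller final batch
```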

1. Batch size definition

The batch size determines the direction and magnitude of each gradient descent step.

Weights are updated batch by batch during each epoch: with a batch size of 1, each parameter update is computed from a single sample; with a batch size of 2, each update is computed from 2 samples, and so on. The samples in the batch therefore determine the direction and magnitude of each gradient descent step.
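To make this concrete, here is a minimal NumPy sketch (a linear model with squared loss; all names and values are assumed for illustration, not taken from the original post). Each update uses the average gradient of the samples in the batch, so the batch size directly shapes the update's direction and magnitude:

```python
import numpy as np

def batch_gradient(w, X_batch, y_batch):
    """Average gradient of the squared loss of a linear model over one batch."""
    errors = X_batch @ w - y_batch
    # Each sample contributes errors[i] * X_batch[i]; the batch gradient is their mean.
    return X_batch.T @ errors / len(X_batch)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0])
w = np.zeros(5)
learning_rate = 0.1

g_1 = batch_gradient(w, X[:1], y[:1])        # batch size 1: one sample sets the direction
g_256 = batch_gradient(w, X[:256], y[:256])  # batch size 256: an averaged, steadier direction
w = w - learning_rate * g_256                # one weight update
print(g_1, g_256)
```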

2. Batch size trade-offs

| Batch size | Advantage | Shortcoming |
| --- | --- | --- |
| Too small (e.g., 1) | Only suitable for small datasets | Time-consuming: when the total amount of data is large, using only one sample per update makes training very long. Difficult to converge: differences between individual samples make the gradient noisy, so the model struggles to converge and easily underfits. |
| Too large (e.g., the whole dataset) | Time-saving: using many samples per update reduces the number of batches needed | Memory overflow: processing that much data at once may exhaust memory. Inflexible: parameters are hard to correct, since the gradient descent direction is fixed from the start and shows no small adjustments. |

Overfitting: (illustrative figure omitted)

3. Batch size balance

When training a neural network, a suitable batch size is generally chosen. A larger batch size can improve stability, making gradient descent smoother, but it should not be made extremely large; it must fit the network model and the amount of data.

With an appropriate batch size, the descent direction is more accurate, the fluctuation between updates is small, and the overall descent direction can still be fine-tuned (see the sketch after the list below).

Advantages of choosing a suitable batch size:

  • Better memory utilization, so the GPU runs at full capacity
  • More accurate gradient descent direction and magnitude
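In practice, frameworks expose the batch size as a single parameter. As one hedged illustration (PyTorch is assumed here; the original post does not name a framework, and the dataset below is made up), the batch_size argument of a DataLoader controls how many samples feed each gradient update:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset (sizes and values assumed for illustration).
features = torch.randn(1000, 20)
labels = torch.randint(0, 2, (1000,))
dataset = TensorDataset(features, labels)

# batch_size trades off memory use, GPU utilization, and gradient stability.
loader = DataLoader(dataset, batch_size=64, shuffle=True)

for batch_features, batch_labels in loader:
    # Each iteration yields one batch of 64 samples (the last batch may be smaller);
    # the forward pass, loss, and back propagation for that batch would go here.
    pass
```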
