Deep Learning - Batch_Size

1. Concept
Batch_Size is a parameter that only applies to batch learning. Statistical learning can be divided into online learning and batch learning; by the same division, deep learning training can also be done online or in batches, and in practice batch learning is used far more often.

Online learning: the model receives one sample at a time, makes a prediction, and is updated immediately, repeating this cycle.
Batch learning: an offline learning method (all samples must be available before training starts), in which the model learns from all samples, or a subset of them, at once.

The purpose of setting batch_size is to let the model process the training data in batches. The objective function in a typical machine learning or deep learning training run can be understood simply as the sum (or average) of the per-sample loss values over the training samples; the weights are then adjusted according to this value, and most of the time the parameters are updated by gradient descent.
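
Written out, one mini-batch update can be sketched as follows (the notation here is mine, not from the original post): with a batch \(\mathcal{B}\) of size batch_size, per-sample loss \(\ell\), model \(f\) with parameters \(\theta\), and learning rate \(\eta\),

    L_{\mathcal{B}}(\theta) = \frac{1}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \ell\big(f(x_i;\theta),\, y_i\big),
    \qquad
    \theta \leftarrow \theta - \eta \, \nabla_{\theta} L_{\mathcal{B}}(\theta)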

The intuitive understanding of Batch_Size is the number of samples used in one training step.
The size of Batch_Size affects both how well and how fast the model is optimized, and it directly determines GPU memory usage. If your GPU memory is limited, it is better to set this value smaller.
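
As a minimal sketch of where this parameter appears in practice (assuming PyTorch and its TensorDataset / DataLoader utilities; the data and numbers below are made up for illustration):

    import torch
    from torch.utils.data import TensorDataset, DataLoader

    # Toy dataset: 1000 samples with 20 features each (made-up data).
    X = torch.randn(1000, 20)
    y = torch.randint(0, 2, (1000,))
    dataset = TensorDataset(X, y)

    # batch_size decides how many samples each training step sees;
    # a smaller value uses less GPU memory per step.
    loader = DataLoader(dataset, batch_size=64, shuffle=True)

    xb, yb = next(iter(loader))
    print(xb.shape)  # torch.Size([64, 20])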

2. Function
Online learning is equivalent to Batch_Size = 1. In that case the gradient descent direction is highly random from step to step, and the model has difficulty converging stably (convergence is slow), so batch learning is usually preferred.
If the data set is relatively small, the full data set can be used as a single batch. The direction determined by the full data set represents the sample population well and therefore points more accurately toward the extremum. For large data sets, however, feeding all the data into the network at once would blow up memory, which is why the concept of Batch_Size was introduced. In addition, Batch Normalization needs a batch of data to compute the batch mean and variance; if Batch_Size is 1, BN cannot work properly.
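
A small toy check of that claim (my own sketch, not from the original post; linear regression with NumPy on made-up data): the larger the batch, the closer its gradient direction is to the direction given by the full data set.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy linear-regression data: y = X @ w_true + noise (made-up problem).
    X = rng.normal(size=(10_000, 5))
    w_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
    y = X @ w_true + 0.1 * rng.normal(size=10_000)

    w = np.zeros(5)  # current (untrained) parameters

    def grad(xb, yb, w):
        # Gradient of the mean squared error on the batch (xb, yb).
        return 2.0 * xb.T @ (xb @ w - yb) / len(yb)

    g_full = grad(X, y, w)  # direction determined by the full data set
    for bs in (1, 32, 1024):
        idx = rng.choice(len(X), size=bs, replace=False)
        g = grad(X[idx], y[idx], w)
        # Cosine similarity to the full-batch gradient: closer to 1 = less noisy.
        cos = g @ g_full / (np.linalg.norm(g) * np.linalg.norm(g_full) + 1e-12)
        print(f"batch_size={bs:5d}  cosine similarity to full gradient={cos:.3f}")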

3. Settings
3.1. Advantages of increasing Batch_Size

  • Improved memory utilization and more efficient parallelization of large matrix multiplications.
  • The number of iterations required to run one epoch (the full data set) is reduced, so the same amount of data is processed faster (see the sketch after this list).
  • Within a certain range, the larger the Batch_Size, the more accurate the determined descent direction and the smaller the training oscillation.
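
For the second point above, the iteration count per epoch is simply ceil(N / batch_size); a quick sketch with a made-up training-set size:

    import math

    n_samples = 50_000  # made-up training-set size
    for batch_size in (32, 128, 512):
        iters = math.ceil(n_samples / batch_size)  # iterations in one epoch
        print(f"batch_size={batch_size:4d}  iterations per epoch={iters}")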

3.2. Disadvantages of increasing Batch_Size

  • Memory utilization improves, but memory capacity may no longer be able to hold the batch.
  • The number of iterations per epoch (full data set) is reduced, but reaching the same accuracy requires more epochs, so the total training time grows and the parameters are corrected more slowly.
  • Once Batch_Size grows beyond a certain point, the descent direction it determines hardly changes any further.

3.3. Disadvantages of decreasing Batch_Size

  • When Batch_Size is too small and there are many classes, the loss function may genuinely oscillate and fail to converge, especially when the network is complex.

To sum up, as Batch_Size increases there is some point at which training time is optimal; and because the final result can fall into different local extrema, there is also some point at which the final convergence accuracy is optimal.

