Batch gradient descent, mini-batch gradient descent, and SGD

First, if the training set is small, just use batch gradient descent: with a small sample set there is no need for mini-batches, because you can process the entire training set quickly. As a rule of thumb, batch gradient descent is suitable for fewer than about 2000 samples. Otherwise, when the number of samples is large, a typical mini-batch size is 64 to 512. Because of how computer memory is laid out and accessed, code often runs faster when the mini-batch size is a power of 2: 64 is 2 to the 6th power, 128 is 2 to the 7th, 256 is 2 to the 8th, and 512 is 2 to the 9th. So I often set the mini-batch size to 2 to the power of n.
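To make this concrete, here is a minimal NumPy sketch of one epoch of mini-batch gradient descent. The linear model, the squared-error gradient, and names like `minibatch_gd_epoch` are illustrative assumptions, not from the original post:

```python
import numpy as np

def minibatch_gd_epoch(X, y, w, b, batch_size=64, lr=0.01):
    """One epoch of mini-batch gradient descent on a linear model.

    batch_size is typically a power of 2 (64, 128, 256, 512),
    which tends to align well with memory, as noted above.
    """
    m = X.shape[0]
    perm = np.random.permutation(m)      # shuffle before slicing into batches
    X, y = X[perm], y[perm]
    for start in range(0, m, batch_size):
        Xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        # gradient of mean squared error for y_hat = Xb @ w + b
        err = Xb @ w + b - yb
        grad_w = Xb.T @ err / len(yb)
        grad_b = err.mean()
        w -= lr * grad_w                 # one parameter update per mini-batch
        b -= lr * grad_b
    return w, b
```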
SGD (stochastic gradient descent) is the special case where the mini-batch size is 1.
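Under that framing, the sketch above covers all three methods just by changing `batch_size`. The synthetic data below (m = 2000 samples, a hypothetical linear target) is only for illustration:

```python
m = 2000
X = np.random.randn(m, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3
w, b = np.zeros(3), 0.0

w, b = minibatch_gd_epoch(X, y, w, b, batch_size=m)    # batch gradient descent
w, b = minibatch_gd_epoch(X, y, w, b, batch_size=128)  # mini-batch
w, b = minibatch_gd_epoch(X, y, w, b, batch_size=1)    # SGD
```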
