batch, epoch, iteration

The optimization algorithm in deep learning essentially means gradient descent. There are two basic ways to perform each parameter update.

The first is to traverse the entire dataset to compute the loss function once, then compute the gradient of the loss with respect to each parameter and update the parameters. With this method, every single update has to read all the samples in the dataset, so the computational cost is high, training is slow, and online learning is not supported. This is called batch gradient descent (BGD).
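As a minimal sketch (not from the original post, with made-up toy data and learning rate), batch gradient descent on a one-parameter linear regression could look like this:

```python
import numpy as np

# Toy data, made up for illustration: y is roughly 3 * x
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=500)

# Batch gradient descent: every update reads ALL 500 samples
w, lr = 0.0, 0.1
for step in range(100):
    pred = w * X[:, 0]
    grad = 2 * np.mean((pred - y) * X[:, 0])   # gradient of the mean squared error w.r.t. w
    w -= lr * grad
print(w)                                       # approaches 3.0
```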

The other extreme is to compute the loss for each individual sample and immediately update the parameters with the resulting gradient; this is called stochastic gradient descent (SGD). It is faster, but its convergence behaviour is not as good: the parameters may hover around the optimum without ever landing on it, and two consecutive updates may even cancel each other out, making the objective function oscillate violently.
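On the same made-up toy data, a sketch of stochastic gradient descent would update the parameter once per sample, which is exactly why consecutive updates can partly cancel each other out:

```python
import numpy as np

# Same toy data as above, made up for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=500)

# Stochastic gradient descent: one parameter update per single sample
w, lr = 0.0, 0.01                              # smaller lr, since single-sample gradients are noisy
for epoch in range(5):
    for i in rng.permutation(len(y)):          # visit the samples in random order
        grad = 2 * (w * X[i, 0] - y[i]) * X[i, 0]
        w -= lr * grad
```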

To overcome the drawbacks of both methods, the usual compromise today is mini-batch gradient descent: the data is split into a number of batches, and the parameters are updated once per batch. Because a whole group of samples jointly determines the direction of each gradient step, the update is less likely to be thrown off course and the randomness is reduced. At the same time, the number of samples in one batch is much smaller than the whole dataset, so the computation per update remains modest.
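A sketch of the mini-batch version on the same made-up toy data, averaging the gradient over 10 samples per update:

```python
import numpy as np

# Same toy data as above, made up for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=500)

# Mini-batch gradient descent: one update per batch of 10 samples
w, lr, batch_size = 0.0, 0.1, 10
for epoch in range(5):
    idx = rng.permutation(len(y))                  # reshuffle once per epoch
    for start in range(0, len(y), batch_size):     # 500 / 10 = 50 updates per epoch
        b = idx[start:start + batch_size]
        grad = 2 * np.mean((w * X[b, 0] - y[b]) * X[b, 0])
        w -= lr * grad
```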

The optimizer called SGD in today's frameworks is an acronym for stochastic gradient descent, but that does not mean it updates after every single sample; in practice it updates on a mini-batch basis.
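For instance (a sketch assuming PyTorch; the data and model here are placeholders, not from the original post), the optimizer named SGD in torch.optim is normally stepped once per mini-batch delivered by a DataLoader:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data and model, just to show the update granularity
data = TensorDataset(torch.randn(500, 20), torch.randn(500, 1))
model = nn.Linear(20, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = DataLoader(data, batch_size=10, shuffle=True)

for x_batch, y_batch in loader:            # one iteration per mini-batch
    loss = nn.functional.mse_loss(model(x_batch), y_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                       # an "SGD" step, but on a batch of 10 samples
```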

So what do batch, epoch, and iteration actually mean?


(1) batch size: in deep learning, training generally uses (mini-batch) SGD, i.e. each training step takes batch-size samples from the training set;

(2) iteration: 1 iteration is one training step using batch-size samples;

(3) epoch: 1 epoch means training once on all samples in the training set; put plainly, the number of epochs is the number of full passes (rounds) over the entire dataset.

For example, with a training set of 500 samples and batch size = 10, one complete pass over the training set takes: iteration = 50, epoch = 1.
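The same arithmetic in plain Python (a trivial check, not from the original post):

```python
import math

samples, batch_size = 500, 10
iterations_per_epoch = math.ceil(samples / batch_size)
print(iterations_per_epoch)   # 50 iterations make up 1 epoch
```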

batch: in deep learning, the loss used for each parameter update is not obtained from a single sample, but from a group of samples taken together (e.g. by averaging their losses); the number of samples in that group is the batch size.

The maximum batch size is the total number of samples N, in which case we have full-batch learning; the minimum is 1, i.e. training on only one sample at a time, which is online learning. When we learn in batches, every time all the training data has gone through one forward pass and one backpropagation, one epoch has been completed.
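A small sketch of how the three regimes differ only in the batch size (the sample count N is illustrative):

```python
# One epoch is complete once every training sample has gone through one
# forward pass and one backpropagation; the batch size only decides how
# many weight updates that takes.
N = 60000                                       # illustrative sample count

for batch_size, name in [(N, "full-batch learning"),
                         (1, "online learning"),
                         (100, "mini-batch learning")]:
    updates_per_epoch = -(-N // batch_size)     # ceiling division
    print(f"{name}: {updates_per_epoch} update(s) per epoch")
```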
----------------
Disclaimer: This article is the original work of the CSDN blogger "bboysky45", licensed under the CC 4.0 BY-SA agreement; please attach the original source link and this statement when reposting.
Original link: https://blog.csdn.net/qq_18668137/article/details/80883350


 

The MNIST dataset has 60,000 images as training data and 10,000 images as test data. Suppose we now train a model with batch size = 100 for 30,000 iterations.

Number of images trained in each epoch: 60,000 (all images in the training set)

Number of batches in the training set: 60,000 / 100 = 600
Number of batches needed per epoch: 600
Number of iterations per epoch: 600 (completing training on one batch corresponds to one iteration)
Number of model weight updates in each epoch: 600
Number of model weight updates after training for 10 epochs: 600 * 10 = 6,000
Different epochs actually train on the same dataset. Although the 1st epoch and the 10th epoch both use the same 60,000 training images, the weight updates they produce are completely different, because in different epochs the model sits at different positions on the cost-function surface: the more epochs the model has been trained for, the closer it gets to the bottom and the smaller the cost.
Completing 30,000 iterations in total is therefore equivalent to completing 30,000 / 600 = 50 epochs.
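The numbers above can be reproduced directly (a trivial sketch, not from the original post):

```python
train_images, batch_size, total_iterations = 60000, 100, 30000

batches_per_epoch = train_images // batch_size            # 600
updates_after_10_epochs = batches_per_epoch * 10          # 6,000
epochs_completed = total_iterations // batches_per_epoch  # 50
print(batches_per_epoch, updates_after_10_epochs, epochs_completed)
```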
----------------
Disclaimer: This article is the original work of the CSDN blogger "xytywh", licensed under the CC 4.0 BY-SA agreement; please attach the original source link and this statement when reposting.
Original link: https://blog.csdn.net/xiaohuihui1994/article/details/80624593
