Stochastic Gradient Descent (SGD)

BGD (Batch Gradient Descent): every iteration uses all of the samples (practical when the sample size is small); the model is updated once per full pass over the data.

 

SGD (Stochastic Gradient Descent): every iteration uses one group of samples (a batch); the model is updated once after processing that batch of data.
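To make the contrast concrete, here is a minimal Python sketch, not from the original post, that assumes a toy one-dimensional linear regression problem: `bgd_step` computes the gradient over the entire dataset for each update, while `sgd_step` draws a random group of samples for each update.

```python
import numpy as np

# Toy 1-D linear regression data: y ≈ 2x plus noise (illustrative assumption,
# not from the original post).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(1000, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(1000)

def gradient(w, X_batch, y_batch):
    """Gradient of mean squared error with respect to the single weight w."""
    pred = X_batch[:, 0] * w
    return 2.0 * np.mean((pred - y_batch) * X_batch[:, 0])

def bgd_step(w, lr=0.1):
    # BGD: one parameter update uses the gradient over ALL samples.
    return w - lr * gradient(w, X, y)

def sgd_step(w, lr=0.1, batch_size=32):
    # SGD (mini-batch form): one update uses a randomly drawn group of samples.
    idx = rng.integers(0, len(X), size=batch_size)
    return w - lr * gradient(w, X[idx], y[idx])
```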

 

SGD was proposed to address the main shortcoming of BGD: training is too slow. Ordinary BGD goes through all of the samples on every iteration and performs one parameter update per full pass, whereas SGD performs one gradient update per group of samples. The SGD algorithm randomly selects a group of samples, updates the parameters after training on that group, then draws another group and updates again. When the sample size is very large, a model whose loss is within an acceptable range may therefore be obtained before all of the samples have even been used for training.
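Continuing the toy sketch above, the loop below illustrates that last point: it keeps drawing random groups of samples and stops as soon as the loss falls below an assumed "acceptable" threshold, possibly before every sample has been seen.

```python
def mse(w):
    """Loss over the full dataset, used here only to check progress."""
    return float(np.mean((X[:, 0] * w - y) ** 2))

w = 0.0
for step in range(10_000):
    w = sgd_step(w)           # one random group of samples -> one update
    if mse(w) < 0.02:         # hypothetical "acceptable" loss threshold
        print(f"stopped at step {step}: w = {w:.3f}, loss = {mse(w):.4f}")
        break
```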
