Precautions for using the PyTorch optimizer

  1. The .step() function only updates the parameters that were passed to the optimizer when it was constructed;
  2. The zero_grad() function only clears the gradients of the parameters that belong to the current optimizer;
  3. The .step() function does not clear gradients (see the first sketch after this list);
  4. The .forward() pass builds a dynamic computation graph, and backward() frees that graph once it has run, so .backward() cannot be called twice in a row on the same batch of data. If you want to collect the backpropagated gradients of several batches and use them together for one update, run forward() and backward() on each batch separately; after all the backward() calls are done, call .step() once, so that the gradients accumulated over the previous passes are applied in a single update (see the second sketch after this list);
  5. If the gradients are not cleared, the gradients of the parameters keep accumulating as more data is forwarded. When the gradient is obtained from several batches, it is the sum of the per-batch gradients:
    $\frac{\partial (data^{1}+data^{2})}{\partial w} = \frac{\partial\, data^{1}}{\partial w} + \frac{\partial\, data^{2}}{\partial w}$
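
A minimal sketch of points 1–3, assuming two hypothetical layers each managed by its own optimizer (the layer names, sizes, and learning rate are illustrative, not from the original post): .step() and zero_grad() only touch the parameters handed to that particular optimizer, and .step() itself leaves .grad intact.

```python
import torch
import torch.nn as nn

# Two separate layers, each with its own optimizer (illustrative setup).
encoder = nn.Linear(4, 4)
decoder = nn.Linear(4, 2)
opt_enc = torch.optim.SGD(encoder.parameters(), lr=0.1)
opt_dec = torch.optim.SGD(decoder.parameters(), lr=0.1)

x = torch.randn(8, 4)
loss = decoder(encoder(x)).sum()
loss.backward()          # fills .grad on both encoder and decoder parameters

opt_enc.step()           # 1. updates only the encoder parameters
opt_enc.zero_grad()      # 2. clears only the encoder gradients
                         # 3. .step() itself did not clear any gradients:
print(decoder.weight.grad is not None)   # True, decoder gradients untouched
print(encoder.weight.grad)               # cleared (zeros or None, depending on PyTorch version)
```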
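And a minimal sketch of points 4–5, accumulating gradients over several batches and then applying a single update; the model, loss, and made-up batches are only placeholders for a real training loop.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

# A few made-up batches standing in for a real data loader.
batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(3)]

optimizer.zero_grad()                 # start from clean gradients
for x, y in batches:
    loss = criterion(model(x), y)     # each forward() builds its own graph
    loss.backward()                   # frees that graph; .grad sums across batches
optimizer.step()                      # one update using the accumulated gradients
optimizer.zero_grad()                 # clear before the next accumulation round
```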

Origin blog.csdn.net/weixin_42988382/article/details/123162180