Neural network parameter adjustment experience

Contrast Mean and Max

If some action path (sampling strategy from the output of the neural network) is much better than the average motion path, by adjusting policies have increased the reward space. In contrast, when the gap is narrowing, the model convergence;

Guess you like

Origin www.cnblogs.com/twodoge/p/12080024.html