Contrast Mean and Max
If some action path (sampling strategy from the output of the neural network) is much better than the average motion path, by adjusting policies have increased the reward space. In contrast, when the gap is narrowing, the model convergence;