Notes on other people's experience with parameter tuning

I saw that you set a high initial epsilon; you should instead make epsilon a function tied to the loss. A high initial epsilon is not useful: even once the agent has learned something, it rarely acts on it, so the experience it collects does not become more useful for learning. As for the loss surging midway through training, I think the cause may be that the agent has already learned a strategy, but epsilon is still too large. If, for example, two random actions are taken for every one best action, the resulting transitions produce a large loss.
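The note gives no code, so the following is only a minimal sketch, assuming a DQN-style agent with a discrete action space. The function names (`linear_epsilon`, `loss_linked_epsilon`, `epsilon_greedy`) and all default numbers are illustrative assumptions, not taken from the original. It shows a common linearly decaying epsilon-greedy schedule, plus one hypothetical way to make epsilon depend on the recent loss, as suggested above.

```python
import random


def linear_epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly anneal epsilon from eps_start to eps_end over decay_steps."""
    frac = min(step / decay_steps, 1.0)  # fraction of the decay completed, capped at 1
    return eps_start + frac * (eps_end - eps_start)


def loss_linked_epsilon(recent_loss, eps_min=0.05, eps_max=0.5, loss_scale=1.0):
    """Hypothetical schedule: higher recent loss -> more exploration.

    recent_loss / (recent_loss + loss_scale) lies in [0, 1) for a
    non-negative loss, so epsilon stays within [eps_min, eps_max).
    """
    ratio = recent_loss / (recent_loss + loss_scale)
    return eps_min + (eps_max - eps_min) * ratio


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    for step in (0, 5_000, 10_000, 20_000):
        print(step, round(linear_epsilon(step), 3))
    print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))  # greedy action index
```

With the linear schedule, epsilon falls to a small floor once the agent has had time to learn, so late-training transitions are mostly greedy; this addresses the problem described above, where a large epsilon keeps injecting random actions after a good strategy has already been found. The loss-linked variant is just one possible reading of "a function tied to the loss" and would need its scale tuned per task.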


Origin www.cnblogs.com/awgn/p/12339929.html