Deep Reinforcement Learning: Policy Gradients and Optimization (1) - Policy Gradient

Introduction

  Earlier we covered several reinforcement learning algorithms, such as DQN, DRQN, and A3C. In all of them, the goal is to find the policy that yields the most reward. Because the Q function indicates which action is the best one to take in a given state, the Q function can be used to find the optimal policy. With the policy gradient method, we can find the optimal policy without using the Q function.
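
To make the Q-function route concrete, here is a minimal sketch of how a policy can be read off a Q function: in each state, pick the action with the highest Q-value. The Q-table, its state and action names, and the helper `greedy_action` are hypothetical and chosen only for illustration.

```python
# Hypothetical Q-table for illustration: Q[state][action] is the
# estimated return of taking that action in that state.
Q = {
    "s0": {"left": 0.2, "right": 0.7},
    "s1": {"left": 0.5, "right": 0.1},
}


def greedy_action(q_table, state):
    """Return the action with the highest Q-value in the given state."""
    action_values = q_table[state]
    return max(action_values, key=action_values.get)


# The policy is derived from Q rather than learned directly.
greedy_policy = {state: greedy_action(Q, state) for state in Q}
print(greedy_policy)
```

The policy here is only a by-product of the Q-values; the policy gradient method instead represents the policy itself with parameters and adjusts those.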

Policy gradient

  The policy gradient is one of the remarkable algorithms in reinforcement learning (RL): it optimizes the policy directly by adjusting the policy's parameters. So far, we have used the Q function to find the optimal policy. Here we look at how to find the optimal policy without the Q function. First, the policy function is defined as $\pi(a \mid s)$, the probability of taking action $a$ in state $s$.
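
As a rough sketch of what "optimizing the policy's parameters directly" can look like, the example below represents the policy as a softmax over per-state action preferences and applies one gradient step. Everything in it is an assumption for illustration: the tabular parameters `theta`, the function names, the learning rate, and the REINFORCE-style update rule (one common policy gradient variant, not necessarily the one this series goes on to use).

```python
import math


def softmax_policy(theta, state):
    """Return pi(a|s) for every action in `state` as a list of probabilities.

    theta[state][a] is a learnable preference for action a (an assumed,
    tabular parameterization chosen for illustration).
    """
    prefs = theta[state]
    max_pref = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - max_pref) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_pi(theta, state, action):
    """Gradient of log pi(action|state) w.r.t. theta[state] for a softmax."""
    probs = softmax_policy(theta, state)
    return [(1.0 if a == action else 0.0) - p for a, p in enumerate(probs)]


def policy_gradient_step(theta, state, action, ret, lr=0.1):
    """REINFORCE-style update: theta += lr * return * grad log pi(a|s)."""
    grad = grad_log_pi(theta, state, action)
    theta[state] = [t + lr * ret * g for t, g in zip(theta[state], grad)]


# One state with two actions, starting from a uniform policy.
theta = {0: [0.0, 0.0]}
before = softmax_policy(theta, 0)

# Suppose action 1 was taken in state 0 and produced a return of 1.0.
policy_gradient_step(theta, state=0, action=1, ret=1.0)
after = softmax_policy(theta, 0)

print(before, after)
```

No Q function appears anywhere: a positive return simply pushes up the probability of the action that was taken, and the probabilities still sum to one.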

Origin blog.csdn.net/weixin_43283397/article/details/105140600