The policy gradient theorem of reinforcement learning is derived from the Markov decision process (MDP) framework and probability theory. The derivation steps are: 1) Define the state space S, the action space A, the discount factor γ, and the reward function R; 2) Define the state-value function V(s) as the expected discounted return obtained when starting from state s and following the policy; 3) Write V(s) out as an expectation over the actions chosen by the policy and the resulting state transitions; 4) Introduce a parameterized policy π_θ(a|s), the probability of taking action a in state s; 5) Take the expected return under π_θ as the objective J(θ); 6) Differentiate J(θ) with respect to θ to obtain the policy gradient theorem.
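Steps 5 and 6 can be made concrete with a short worked derivation. The sketch below follows one common route, the likelihood-ratio (log-derivative) argument over finite-horizon trajectories; it is not necessarily the route the answer above had in mind. The symbols τ (a trajectory s_0, a_0, ..., s_T, a_T, s_{T+1}), T (the horizon), ρ_0 (the initial-state distribution), P (the transition probabilities), and G(τ) (the discounted return) are notation assumed here and do not appear in the original text.

```latex
% Assumed notation (not in the original): trajectory \tau, horizon T,
% initial-state distribution \rho_0, transition kernel P, return G(\tau).
\begin{align}
J(\theta) &= \mathbb{E}_{\tau \sim p_\theta}\left[ G(\tau) \right],
\qquad G(\tau) = \sum_{t=0}^{T} \gamma^{t} R(s_t, a_t), \\
p_\theta(\tau) &= \rho_0(s_0) \prod_{t=0}^{T} \pi_\theta(a_t \mid s_t)\, P(s_{t+1} \mid s_t, a_t), \\
\nabla_\theta J(\theta) &= \int \nabla_\theta p_\theta(\tau)\, G(\tau)\, d\tau
= \int p_\theta(\tau)\, \nabla_\theta \log p_\theta(\tau)\, G(\tau)\, d\tau, \\
\nabla_\theta \log p_\theta(\tau) &= \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t), \\
\nabla_\theta J(\theta) &= \mathbb{E}_{\tau \sim p_\theta}\left[ \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G(\tau) \right].
\end{align}
```

The fourth line holds because ρ_0 and P do not depend on θ, so their logarithms vanish under the gradient; this is why the result requires no knowledge of the environment dynamics. Since an action taken at time t cannot affect rewards received before t, G(τ) can be replaced by the discounted action value, giving the form in which the theorem is usually stated: ∇_θ J(θ) = E[ Σ_t γ^t ∇_θ log π_θ(a_t|s_t) Q^π(s_t, a_t) ].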
May I ask whether the above is the derivation process of the policy gradient theorem of reinforcement learning?
Origin blog.csdn.net/weixin_35755562/article/details/129533644