How to understand the relationship between Actor-Critic and Policy Gradient

In fact, Actor-Critic should not be regarded simply as a combination of DQN and PG.

Vanilla policy gradient (PG) estimates the total return G with the Monte Carlo (MC) method, which converges slowly and requires complete episodes before any update can be made. Temporal-difference (TD) learning was introduced precisely to address this, bootstrapping from the current value estimate after every step. The essence of DQN is to use a neural network to implement the TD algorithm under high-dimensional inputs. In the same spirit, Actor-Critic can be seen as using the TD method to improve PG: the critic's TD error replaces the MC return G as the weight in the policy-gradient update.
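A minimal sketch of this idea, assuming a linear critic, a softmax-linear actor over discrete actions, hand-rolled gradients, and a hypothetical toy `step` transition function (none of these come from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 4, 2
gamma, alpha_actor, alpha_critic = 0.99, 1e-2, 1e-1

theta = np.zeros((n_features, n_actions))  # actor parameters: pi(a|s) = softmax(s @ theta)
w = np.zeros(n_features)                   # critic parameters: V(s) = w @ s

def softmax(x):
    z = x - x.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def step(s, a):
    # Hypothetical toy transition: random next state, reward favors action 0.
    s_next = rng.standard_normal(n_features)
    r = 1.0 if a == 0 else 0.0
    return s_next, r

s = rng.standard_normal(n_features)
for t in range(1000):
    probs = softmax(s @ theta)
    a = rng.choice(n_actions, p=probs)
    s_next, r = step(s, a)

    # TD error: this is what replaces the MC return G of plain PG.
    delta = r + gamma * (w @ s_next) - (w @ s)

    # Critic: TD(0) update toward the one-step bootstrapped target.
    w += alpha_critic * delta * s

    # Actor: policy-gradient step weighted by the TD error.
    # For a softmax-linear policy, grad log pi(a|s) = outer(s, onehot(a) - probs).
    grad_log_pi = np.outer(s, -probs)
    grad_log_pi[:, a] += s
    theta += alpha_actor * delta * grad_log_pi

    s = s_next
```

The contrast is visible in the update line: plain PG would weight `grad_log_pi` by a full-episode return G computed only after the episode ends, whereas here `delta` is available at every step, so the actor can learn online.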


Origin blog.csdn.net/weixin_43450646/article/details/113586500