In fact, Actor-Critic should not be regarded as a combination of DQN and PG.
PG obtains the total return G with the Monte Carlo (MC) method, which is slow because G can only be computed after a complete episode has been sampled. Temporal-difference (TD) learning addresses this problem: it bootstraps from a value estimate of the next state, so an update can be made after every step. DQN is essentially the TD algorithm implemented with a neural network so that it can handle high-dimensional inputs. Actor-Critic is therefore better seen as PG improved with the TD method.
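
To make the difference concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the function names (mc_returns, td_targets, td_errors), the toy rewards, and the critic values are all hypothetical and made up for the example. The first function computes the MC return G_t that plain PG would weight its gradient with, and needs the whole episode. The other two compute the one-step TD target and TD error that an Actor-Critic method could use instead, and need only the current reward and the critic's estimate of the next state.

```python
def mc_returns(rewards, gamma):
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1}.

    Requires the full list of episode rewards, so it can only run
    after the episode has finished.
    """
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def td_targets(rewards, next_values, gamma):
    """One-step TD targets r_t + gamma * V(s_{t+1}).

    Each target depends on a single transition, so it is available
    immediately after each step. V of a terminal state is taken as 0.
    """
    return [r + gamma * v_next for r, v_next in zip(rewards, next_values)]


def td_errors(rewards, values, next_values, gamma):
    """TD errors delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    In an Actor-Critic setup this quantity can replace G_t as the
    weight on the policy-gradient term.
    """
    return [
        r + gamma * v_next - v
        for r, v, v_next in zip(rewards, values, next_values)
    ]


# Toy three-step episode (all numbers are made up for illustration).
rewards = [1.0, 0.0, 2.0]
values = [2.0, 1.5, 1.0]        # hypothetical critic estimates V(s_0), V(s_1), V(s_2)
next_values = [1.5, 1.0, 0.0]   # V(s_1), V(s_2), and 0 for the terminal state
gamma = 0.5

print(mc_returns(rewards, gamma))                      # [1.5, 1.0, 2.0]
print(td_targets(rewards, next_values, gamma))         # [1.75, 0.5, 2.0]
print(td_errors(rewards, values, next_values, gamma))  # [-0.25, -1.0, 1.0]
```

Note that the two kinds of target agree only at the last step, where no bootstrapping is involved. At earlier steps the TD target trades the unbiased but delayed MC return for an estimate that depends on how accurate the critic is.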