Study notes for reinforcement learning

1 Introduction

Thanks to Professor Li Hongyi for the explanations!

2 The sample() function - strategies for exploring actions

The sample() function plays the role of exploration in the training process: instead of always taking the greedy action, it sometimes samples a different action so the agent can keep discovering potentially better behavior;
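A minimal sketch of what such a sample() could look like, assuming a tabular agent with epsilon-greedy exploration (the names Q, obs, act_dim and e_greed are illustrative, not from the original notes):

```python
import numpy as np

def sample(Q, obs, act_dim, e_greed=0.1):
    """Epsilon-greedy exploration: with probability e_greed pick a random action,
    otherwise pick the greedy action from the Q table."""
    if np.random.uniform(0, 1) < e_greed:
        return np.random.randint(act_dim)   # explore: random action
    return int(np.argmax(Q[obs]))           # exploit: best known action
```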

3 Sarsa and Q-Learning - the original reinforcement learning algorithms

3.1 Reinforcement learning based on Q-Learning - using a Q table for action selection

In fact, the idea of Q-Learning is very simple, just like the old joke about putting an elephant in a refrigerator: it only takes a few steps.

The basic steps are:

  1. Observe the environment and get the observation;
  2. Query the Q table according to obs and select the action with the largest Q value;
  3. Perform the action.
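A minimal sketch of these three steps, assuming a NumPy Q table indexed by (observation, action) and a Gym-style environment (the variable names here are illustrative):

```python
import numpy as np

def predict(Q, obs):
    """Step 2: query the Q table and select the action with the largest Q value."""
    return int(np.argmax(Q[obs]))

# Q = np.zeros((num_states, num_actions))      # the Q table
# obs = env.reset()                            # step 1: observe the environment
# action = predict(Q, obs)                     # step 2: greedy lookup in the Q table
# obs, reward, done, info = env.step(action)   # step 3: perform the action
```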

3.2 Expected goals of Sarsa and Q-Learning

In fact, the goals of these two algorithms are different, leading to different results:

Sarsa (on-policy): maximize the expected reward of the behavior actually produced by sample(), exploration included;

Q-Learning (off-policy): maximize the reward of the greedy behavior given by maxQ(), regardless of how sample() explores;
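The difference shows up directly in the TD targets of the two update rules. A sketch, assuming a NumPy Q table and the usual hyperparameter names lr (learning rate) and gamma (discount factor):

```python
import numpy as np

def sarsa_update(Q, obs, action, reward, next_obs, next_action, lr=0.1, gamma=0.9):
    """On-policy: the target uses the next action actually chosen by sample()."""
    target = reward + gamma * Q[next_obs, next_action]
    Q[obs, action] += lr * (target - Q[obs, action])

def q_learning_update(Q, obs, action, reward, next_obs, lr=0.1, gamma=0.9):
    """Off-policy: the target uses the greedy (max) action, no matter what sample() did."""
    target = reward + gamma * np.max(Q[next_obs])
    Q[obs, action] += lr * (target - Q[obs, action])
```

Because Sarsa's target accounts for the exploratory actions it will actually take, it tends to learn a more cautious policy than Q-Learning.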

4 DQN - replacing the Q table with a neural network

4.1 Why use a neural network to replace the Q table?

If the state (or state-action) space is continuous, a Q table cannot represent it, because a continuous state has infinitely many possible values;

So we treat "state → Q value" as a mapping, that is, we use the idea of function approximation to describe the "state → Q value" relationship;

Since it is a function mapping, our DNN takes the stage~
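A minimal sketch of such a network, assuming PyTorch and a small MLP that maps a continuous observation vector to one Q value per discrete action (the layer sizes and names are arbitrary choices, not from the original notes):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Replaces the Q table: input is a state vector, output is a Q value for each action."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128),
            nn.ReLU(),
            nn.Linear(128, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

# Action selection becomes an argmax over the network's outputs instead of a table lookup:
# q_net = QNetwork(obs_dim=4, act_dim=2)
# q_values = q_net(torch.as_tensor(obs, dtype=torch.float32))
# action = int(q_values.argmax())
```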

5 Actor-Critic algorithm

In my opinion, the Actor and the Critic have the following characteristics:

Actor - instinct: it decides which action to take;

Critic - experience: it evaluates how good the chosen actions are;

Its concrete form is the Q function;

We use TD (temporal difference) learning to estimate Q (this is also the method Professor Li teaches);

I feel that the Critic plays the role of directing the reward rule, i.e. it turns the reward into a signal that guides the Actor's updates;

Intuitively: the Critic expresses the model's understanding of the environment's rules (and at the same time it makes the training signal richer than the raw reward alone);
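A rough sketch of how the Critic's TD estimate can guide the Actor, assuming PyTorch and a critic module that takes (state, action) and returns a Q estimate (all names here are illustrative, not from the original notes):

```python
import torch

def critic_td_target(critic, next_obs, next_action, reward, gamma=0.99):
    """One-step TD target for the Critic's Q estimate: r + gamma * Q(s', a')."""
    with torch.no_grad():
        return reward + gamma * critic(next_obs, next_action)

# The Critic is trained to move Q(s, a) toward this TD target:
# critic_loss = (critic(obs, action) -
#                critic_td_target(critic, next_obs, next_action, reward)).pow(2).mean()
#
# One common way for the Actor to use the Critic is to prefer actions the Critic scores highly:
# actor_loss = -critic(obs, actor(obs)).mean()
```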


Origin: blog.csdn.net/songyuc/article/details/106827069