Classic Q-learning explanation

This article is a reprint of a classic explanation:

Diving deeper into Reinforcement Learning with Q-Learning

1. Q-learning

Step 1: We initialize our Q-table

The initialized Q-table
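To make this concrete, here is a minimal sketch of Step 1 in code. The grid size and the four actions are illustrative assumptions, not values given in the article:

```python
import numpy as np

n_states = 6    # assumed number of cells in the toy grid world
n_actions = 4   # assumed actions: left, right, up, down

# Step 1: the Q-table starts with every Q(state, action) equal to zero
Q = np.zeros((n_states, n_actions))
print(Q)
```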

Step 2: Choose an action
From the starting position, you can choose between going right or down. Because epsilon (the exploration rate) is still large and we don’t know anything about the environment yet, we choose an action at random. For example… move right.

We move at random (for instance, right)
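The choice between a random move and the best-known move is usually implemented as an epsilon-greedy policy. A minimal sketch under the same assumptions as above (the starting epsilon of 1.0 is illustrative):

```python
import numpy as np

n_states, n_actions = 6, 4           # assumed toy grid sizes, as above
Q = np.zeros((n_states, n_actions))  # the Q-table from Step 1
epsilon = 1.0                        # assumed: epsilon starts large, so we mostly explore
state = 0                            # assumed index of the starting cell

if np.random.rand() < epsilon:
    # explore: pick an action uniformly at random (here that happened to be "right")
    action = np.random.randint(n_actions)
else:
    # exploit: pick the action with the highest Q-value in this state
    action = int(np.argmax(Q[state]))
```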

We find a piece of cheese (+1), and we can now update the Q-value of being at the start and going right. We do this using the Bellman equation.
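Written out, the update the article is describing is the standard Q-learning rule derived from the Bellman equation. In the usual notation (lr is the learning rate, γ the discount factor, R(s, a) the reward, s′ the next state):

$$
Q^{\text{new}}(s,a) = Q(s,a) + lr\Big[\,R(s,a) + \gamma \max_{a'} Q(s',a') - Q(s,a)\,\Big]
$$

The bracketed term is the ΔQ(start, right) referred to in the next step.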

Steps 4–5: Update the Q-function

  • First, we calculate the change in Q-value, ΔQ(start, right).
  • Then we add ΔQ(start, right), multiplied by a learning rate, to the initial Q-value.

Think of the learning rate as controlling how quickly the agent abandons its former value for the new one. If the learning rate is 1, the new estimate completely replaces the old Q-value.

The updated Q-table
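As a rough sketch of Steps 4–5 in code, continuing the toy example (the learning rate, discount factor and state indices are assumed values, not given in the article; the +1 reward is the cheese):

```python
import numpy as np

n_states, n_actions = 6, 4
Q = np.zeros((n_states, n_actions))  # Q-table from Step 1

lr = 0.1              # assumed learning rate
gamma = 0.9           # assumed discount factor
state, action = 0, 1  # assumed indices for "start" and "right"
next_state = 1        # assumed index of the cell with the cheese
reward = 1            # the +1 for finding the cheese

# ΔQ(start, right): reward plus discounted best future value, minus the old estimate
delta_q = reward + gamma * np.max(Q[next_state]) - Q[state, action]

# new Q(start, right) = old Q(start, right) + learning_rate * ΔQ(start, right)
Q[state, action] += lr * delta_q
```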

Good! We’ve just updated our first Q-value. Now we need to do that again and again until learning stops.
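Putting the pieces together, the full training loop looks roughly like the sketch below. The tiny corridor environment, the episode count and the epsilon decay schedule are all illustrative assumptions used to keep the example runnable; they are not part of the original article:

```python
import numpy as np

class ToyCorridor:
    """A hypothetical 1-D grid: start in cell 0, cheese in the last cell."""
    def __init__(self, n_cells=6):
        self.n_cells = n_cells
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):
        # action 0 = left, action 1 = right
        self.pos = max(0, self.pos - 1) if action == 0 else min(self.n_cells - 1, self.pos + 1)
        done = self.pos == self.n_cells - 1
        reward = 1 if done else 0
        return self.pos, reward, done

env = ToyCorridor()
n_states, n_actions = env.n_cells, 2
Q = np.zeros((n_states, n_actions))
lr, gamma = 0.1, 0.9                 # assumed learning rate and discount factor
epsilon, epsilon_decay = 1.0, 0.99   # assumed exploration schedule

for episode in range(500):           # assumed number of training episodes
    state = env.reset()
    done = False
    while not done:
        # Step 2: epsilon-greedy choice between exploring and exploiting
        if np.random.rand() < epsilon:
            action = np.random.randint(n_actions)
        else:
            action = int(np.argmax(Q[state]))
        # Step 3: perform the action, observe the reward and the next state
        next_state, reward, done = env.step(action)
        # Steps 4-5: Bellman update of Q(state, action)
        Q[state, action] += lr * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
    epsilon *= epsilon_decay          # explore less as the estimates improve
```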

2. Explanation of the Bellman equation

Bellman Equation of Markov Decision Process - Zhihu
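For quick reference, the Bellman optimality equations for an MDP, in standard notation (reward R, discount factor γ, transition probabilities P), are:

$$
V^*(s) = \max_a \Big[\, R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^*(s') \,\Big]
$$

$$
Q^*(s,a) = R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, \max_{a'} Q^*(s',a')
$$

The Q-learning update above is a sample-based way of pushing the Q-table toward the second equation without knowing P.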

Origin: blog.csdn.net/qq_18256855/article/details/127354721