This article is a reprint of a classic:
Diving deeper into Reinforcement Learning with Q-Learning
1. Q-learning
Step 1: We initialize our Q-table
The initialized Q-table
Step 2: Choose an action
From the starting position, you can choose between going right or down. Because epsilon is still large (we don’t know anything about the environment yet), we choose an action at random. For example… move right.
We move at random (for instance, right)
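This epsilon-greedy choice can be sketched as follows (the `Q` dictionary, the `ACTIONS` list, and the state names are assumptions made up for illustration, not part of the original article):

```python
import random

# Hypothetical names: Q maps (state, action) -> value, ACTIONS lists the moves.
ACTIONS = ["up", "down", "left", "right"]

def choose_action(Q, state, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)  # explore: pick a random move
    # exploit: pick the move with the highest known Q-value (0 if unseen)
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))

# Early in training epsilon is close to 1, so moves are essentially random.
Q = {}
action = choose_action(Q, state="start", epsilon=1.0)
```

With an empty Q-table and epsilon = 1.0, every call returns a random move, which matches the "move right by chance" example above.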
Step 3: Perform the action
We found a piece of cheese (+1), and we can now update the Q-value of being at the start and going right. We do this by using the Bellman equation.
Steps 4–5: Update the Q-function
- First, we calculate the change in Q-value, ΔQ(start, right)
- Then we add this change, multiplied by a learning rate, to the initial Q-value.
Think of the learning rate as a measure of how quickly the agent abandons its former value for the new one. If the learning rate is 1, the new estimate becomes the new Q-value.
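Put together, the two bullet points above amount to a single update rule. A minimal sketch, with assumed names (`alpha` is the learning rate, `gamma` the discount factor, and the state/action labels are invented for illustration):

```python
def update_q(Q, state, action, reward, next_state, alpha, gamma, actions):
    """Bellman update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    # Change in Q-value: reward plus discounted best future value, minus old value
    delta = reward + gamma * best_next - Q.get((state, action), 0.0)
    # Add the change, scaled by the learning rate, to the old Q-value
    Q[(state, action)] = Q.get((state, action), 0.0) + alpha * delta
    return Q

# From the example: at the start, moving right yields the cheese (+1).
Q = {}
update_q(Q, "start", "right", reward=1, next_state="cheese",
         alpha=0.1, gamma=0.9, actions=["up", "down", "left", "right"])
```

Note that with `alpha=1` the old value is discarded entirely and the new estimate becomes the new Q-value, exactly as described above.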
The updated Q-table
Good! We’ve just updated our first Q-value. Now we need to repeat this again and again until training stops.
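The full loop, repeated until learning stops, might look like the following sketch on a toy one-dimensional corridor (the environment, states, and hyperparameter values are all invented for illustration, not taken from the article):

```python
import random

random.seed(0)  # make this toy run reproducible

# Toy corridor: states 0..3, with the cheese (+1) at state 3.
ACTIONS = ["left", "right"]

def step(state, action):
    """Move one cell; reaching the cheese ends the episode with reward +1."""
    next_state = max(0, state - 1) if action == "left" else min(3, state + 1)
    reward = 1 if next_state == 3 else 0
    return next_state, reward, next_state == 3

Q = {}
alpha, gamma, epsilon = 0.5, 0.9, 0.3
for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy choice: explore or exploit
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
        # Perform the action and observe the reward
        next_state, reward, done = step(state, action)
        # Bellman update of the Q-value
        best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
        state = next_state
```

After enough episodes the Q-table ranks "right" above "left" in every state of this corridor, i.e. the agent has learned the path to the cheese.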