Monte Carlo Tree Search

版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/j___t/article/details/88755024

The process of Upper Confidence Bounds for Trees (UCT)

在这里插入图片描述

1. Selection

Starting at the root node R, search a node L which is not expanded fully.
Specifically, we use UCB(upper confidence bound) algorithm to choose the descendant node with the most potential.
在这里插入图片描述
v is the parent node of the v’.
Q(v’) represent the value of v’, which is ruled by yourself.
N(v’) represent the visit times of v’. N(v) represent the visit times of the parent node of v’.
Cp is the constant that trade off exploration and exploitation.

From the root R:

  • If all child nodes of R are visited, we choose the child node which has the maximum UCB value.
  • Otherwise, we randomly choose a child node which has not been visited.

2. Expansion

After selection, we choose the node L. If L isn’t a terminal node(it means that the game is not over yet), randomly create a node as its child node.
The node which is created represent the next step of the game.
If L is a terminal node, then return the current node L. Because it is already a exit status.

3. Simulation

Get node c from the above two steps. And starting from node c, simulate the game until the end of the game.
We can randomly play until we can decide the winner.

4.Backpropagation

In the step 3, we get a result from the simulation of the game. And we must use the result to update the node.
From the node c to the root R, we update all the node as follows:
N(v) = N(v) + 1
Q(v) = Q(v) + reward

The reason we use MCTS

Monte carlo tree search is based on sampling to get results, rather than exhaustive enumeration.
Although we can’t enumerate all the game cases, the results is still fairly reliable.

Pseudocode

在这里插入图片描述

Code

to be continue…

猜你喜欢

转载自blog.csdn.net/j___t/article/details/88755024