
Monte Carlo tree search (MCTS) explanation

2022-07-22 15:40:00 【Meet the demon king】

Monte Carlo Tree Search (MCTS): A Detailed Explanation

Monte Carlo tree search (MCTS) is a classic tree search algorithm. AlphaGo, famous in its day, was built on a combination of Monte Carlo tree search and deep policy and value networks, and with it defeated the then world champion of Go. MCTS is highly effective for games with very large search spaces because its core idea is to spend resources on the branches most worth searching, that is, to concentrate computing power where it is most valuable.

The basic process of the MCTS algorithm

The MCTS algorithm consists of four main steps: selection, expansion, simulation, and backpropagation.

STEP 1: Selection

Starting at the root node, recursively select the best child node until a leaf node is reached.

How is the quality of a node judged? By its Upper Confidence Bound (UCB):
$\mathrm{UCB1}\left(S_{i}\right)=\overline{V_{i}}+c \sqrt{\frac{\ln N}{n_{i}}}, \quad c=2$
where $\overline{V_{i}}$ is the average value of the node; $c$ is a constant, usually set to 2; $N$ is the total number of explorations; and $n_i$ is the number of explorations of the current node.

With the UCB formula above, compute the UCB value of each child node and continue the descent from the child with the largest value.
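The selection rule can be sketched in a few lines of Python. This is only a minimal sketch, not a reference implementation: the names `ucb1` and `best_child` and their parameters are made up for illustration, the natural logarithm is assumed (as in the worked example later in this post), and an unvisited node is assumed to have an infinite UCB value.

```python
import math

def ucb1(total_value, visits, parent_visits, c=2.0):
    """UCB value of a child node (hypothetical helper for illustration)."""
    if visits == 0:
        # Assumption: an unvisited node gets an infinite value,
        # so it is always tried before any visited sibling.
        return math.inf
    average_value = total_value / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return average_value + exploration

def best_child(children, parent_visits):
    """Pick the child with the largest UCB value.

    children: list of (name, total_value, visits) tuples.
    """
    best = max(children, key=lambda ch: ucb1(ch[1], ch[2], parent_visits))
    return best[0]
```

The first term rewards children that have scored well so far; the second term grows for children that have been visited rarely, which keeps the search from ignoring them completely.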

STEP 2: Expansion

If the current leaf node is not a terminal node, create one or more child nodes and select one of them.
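One possible way to write the expansion step is shown below. The `Node` class, the `expand` function, and the two callables `is_terminal` and `next_states` are all hypothetical and exist only for this sketch; it also assumes that every non-terminal state has at least one successor, and it simply returns the first new child.

```python
class Node:
    """Minimal tree node (hypothetical structure for illustration)."""

    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.total_value = 0.0  # T: accumulated value
        self.visits = 0         # N: number of visits

def expand(node, is_terminal, next_states):
    """Attach one child per reachable state and return the first child.

    is_terminal and next_states are game-specific callables (assumed).
    A terminal node is returned unchanged.
    """
    if is_terminal(node.state):
        return node
    for state in next_states(node.state):
        node.children.append(Node(state, parent=node))
    return node.children[0]
```

Adding all children at once is one design choice; adding a single child per iteration is another, and the description above allows either.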

STEP 3: Simulation

Starting from the expanded node, play out a simulation until the game ends. For example, if ten simulations are run from this node and nine of them end in victory, the node's score will be high; otherwise it will be relatively low. Here is pseudocode for the simulation process:

import random

def rollout(S_i):
    # S_i: the current state.
    # is_terminal, value, available_action and transform are
    # game-specific helpers that the caller must supply.
    while True:
        if is_terminal(S_i):
            # The current state ends the game:
            # return its value and leave the loop.
            return value(S_i)

        # Not terminal yet: pick one of the actions available
        # in the current state uniformly at random.
        A_i = random.choice(available_action(S_i))
        # Apply the chosen action A_i to the current state S_i
        # and continue from the resulting next state.
        S_i = transform(A_i, S_i)
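To make the rollout concrete, here is a small self-contained sketch on a toy game chosen purely for illustration (it does not come from the original post): players alternately remove one or two stones from a pile, and whoever takes the last stone wins. The state layout `(stones, player_to_move)` and the `estimate` helper are assumptions of this sketch.

```python
import random

def is_terminal(state):
    stones, _ = state
    return stones == 0

def available_action(state):
    # A player may remove 1 or 2 stones, but not more than remain.
    stones, _ = state
    return [n for n in (1, 2) if n <= stones]

def transform(action, state):
    # Remove the stones and hand the turn to the other player.
    stones, player = state
    return (stones - action, 1 - player)

def value(state):
    # Player 0 took the last stone if player 1 is to move at the end.
    _, player = state
    return 1 if player == 1 else 0

def rollout(state, rng=random):
    # Play uniformly random moves until the game ends.
    while not is_terminal(state):
        action = rng.choice(available_action(state))
        state = transform(action, state)
    return value(state)

def estimate(state, n_rollouts=1000, seed=0):
    # Fraction of random playouts won by player 0 from this state.
    rng = random.Random(seed)
    wins = sum(rollout(state, rng) for _ in range(n_rollouts))
    return wins / n_rollouts
```

Calling `estimate((5, 0))` averages many random playouts from a pile of five stones with player 0 to move, which is the "simulate ten times and count the victories" idea with a larger sample.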

STEP 4: Backpropagation

Take the result of the simulation in step 3 and propagate it back up the tree, updating the nodes along the current action sequence.
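Backpropagation can be sketched as a walk from the simulated node up to the root through parent links. The `Node` class and `backpropagate` function are hypothetical names for this sketch; the numbers in the usage lines are the simulation results assumed in the worked example later in this post.

```python
class Node:
    """Minimal tree node (hypothetical structure for illustration)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.total_value = 0  # T: accumulated value
        self.visits = 0       # N: number of visits

def backpropagate(node, result):
    # Walk up the parent links and update every node on the path.
    while node is not None:
        node.total_value += result
        node.visits += 1
        node = node.parent

# Usage: a root S0 with two children S1 and S2.
s0 = Node("S0")
s1 = Node("S1", parent=s0)
s2 = Node("S2", parent=s0)
backpropagate(s1, 20)  # S1 was simulated with result 20
backpropagate(s2, 10)  # S2 was simulated with result 10
```

After these two calls the root holds the sum of both results and a visit count of 2. A two-player game would typically also need to account for whose turn it is at each level, which the description above does not cover.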

An example

The example in the blog post linked below is already very vivid! It is written out again here to reinforce the idea.

https://blog.csdn.net/qq_41033011/article/details/109034887

Initialization: initially there is a single root node $S_0$. Each node in the tree stores two values: the value of the node, $T$, and the number of visits to the node, $N$.

Iteration 1: node $S_0$ is both the root and a leaf node, and it is not a terminal node, so it is expanded. Suppose $S_0$ has two available actions, leading to the states $S_1$ and $S_2$.

Next, the UCB formula can be used to choose between $S_1$ and $S_2$. At this point $N_1$ and $N_2$ are both 0, so both nodes have infinite UCB values and either may be chosen; here $S_1$ is chosen for simulation. Suppose the simulation ends with a value of 20. Backpropagation then gives $T_1 = 20$, $N_1=1$, $T_0 = 20$, $N_0=1$.

Iteration 2: selection starts from $S_0$. The UCB value of $S_1$ is no longer infinite, while that of $S_2$ still is, so $S_2$ is selected. $S_2$ is a leaf node that has not been explored yet, so it is simulated. Suppose the simulation result is 10. Backpropagation then gives $T_2=10$, $N_2 = 1$, $T_0=30$, $N_0 = 2$.

Iteration 3: starting from $S_0$, compute the UCB values of $S_1$ and $S_2$ and select the node with the larger value.
$\begin{aligned} &\mathrm{UCB}\left(\mathrm{S_1} \right)=20+2 \times \sqrt{\frac{\ln 2}{1}}=21.67 \\ &\mathrm{UCB}\left(\mathrm{S_2}\right)=10+2 \times \sqrt{\frac{\ln 2}{1}}=11.67 \end{aligned}$
Therefore $S_1$ is selected. $S_1$ is a leaf node that has already been explored, so all possible actions from this node are enumerated and added to the tree as child nodes.
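The three iterations can be replayed numerically as a sanity check. The sketch below is only an illustration: the dictionary layout and the helper `ucb` are assumptions, the simulation results 20 and 10 are the ones assumed in the example, and ties between infinite UCB values are broken in favour of $S_1$ as in the text.

```python
import math

def ucb(t, n, parent_n, c=2.0):
    # Unvisited nodes are assumed to have an infinite UCB value.
    if n == 0:
        return math.inf
    return t / n + c * math.sqrt(math.log(parent_n) / n)

# [T, N] for each node, all zero after initialization.
stats = {"S0": [0, 0], "S1": [0, 0], "S2": [0, 0]}
# Simulation results assumed by the example for iterations 1 and 2.
simulated = [20, 10]
chosen = []

for result in simulated:
    # max() keeps the first of several equal maxima, so S1 wins the tie.
    pick = max(("S1", "S2"), key=lambda s: ucb(*stats[s], stats["S0"][1]))
    chosen.append(pick)
    # Backpropagate the result to the chosen child and to the root.
    for name in (pick, "S0"):
        stats[name][0] += result
        stats[name][1] += 1

# Iteration 3: both children have been visited once, the root twice.
ucb_s1 = ucb(*stats["S1"], stats["S0"][1])
ucb_s2 = ucb(*stats["S2"], stats["S0"][1])
third_pick = "S1" if ucb_s1 > ucb_s2 else "S2"
print(round(ucb_s1, 2), round(ucb_s2, 2), third_pick)  # 21.67 11.67 S1
```

Both children receive the same exploration bonus here because they have been visited equally often, so the choice in iteration 3 is decided by the average values alone.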