当前位置:网站首页>Monte Carlo tree search (MCTS) explanation
Monte Carlo tree search (MCTS) explanation
2022-07-22 15:40:00 【Meet the demon king】
Monte carlo tree search (MCTS) Detailed explanation
Monte Carlo tree search is a classical tree search algorithm , Famous town for a time AlphaGo The technical background of is the combination of Monte Carlo tree search and deep strategy value network , Therefore, he defeated the then world champion of go . It is extremely effective for solving the game problem of this large-scale search space , Because its core idea is Put resources on branches that are more worthy of searching , namely Computing power is concentrated in more valuable places .
MCTS The basic process of the algorithm
MCTS The algorithm is mainly divided into four steps , Respectively choice 、 Expand 、 simulation 、 to flash back .
STEP 1: choice (Selection)
Start at the root node , Recursively select the optimal child node , Finally reach a leaf node .
According to what to judge the advantages and disadvantages of nodes ?Upper Confidence Bounds(UCB)
U C B 1 ( S i ) = V i ‾ + c log N n i , c = 2 U C B 1\left(S_{i}\right)=\overline{V_{i}}+c \sqrt{\frac{\log N}{n_{i}}}, c=2 UCB1(Si)=Vi+cnilogN,c=2
among , V i ‾ \overline{V_{i}} Vi Is the average value of the node ; c c c Constant , Usually take 2; N N N Is the total number of explorations ; n i n_i ni Is the number of explorations of the current node .
With the top UCB The formula , You can calculate the UCB value , And select UCB The child node with the largest value iterates .
STEP 2: Expand (Expansion)
If the current leaf node is not a termination node , Then create one or more child nodes , Select one of them to expand .
STEP 3: simulation (Simulation)
Start with the expansion node , Run an analog output , Until the game is over . such as , Start from this expansion node , Simulated ten times , Nine victories in the end , Then the score of the extension node will be higher , On the contrary, it is relatively low . Here is also a pseudo code of the simulation process :
def Rollout(S_i):
## S_i: current state
loop forever:
## Infinite loop
if S_i a terimal state:
## If the current state is the termination state of the game
## Return pair S_i The value of this state , Out of the loop
return value(S_i)
## If it has not reached the termination state
## Randomly select an action that can be taken in the current state
A_i = random(available_action(S_i))
## Through the current state S_i With randomly selected actions A_i To calculate the state of the next step and assign it to S_i
S_i = transform(A_i, S_i)
STEP 4: to flash back (Backpropagation)
Use the results of the third simulation , Echo propagation to update the current action sequence .
for instance
The example in this blog post is already very vivid ! Write it again here , Deepen the impression .
https://blog.csdn.net/qq_41033011/article/details/109034887
initialization : Initially, there is a root node S 0 S_0 S0, Each node in the tree has two values , The value of nodes T T T and Number of visits to this node N N N.
The first 1 Sub iteration : node S 0 S_0 S0 The root node is also the leaf node , And not the termination node , So extend it . hypothesis S 0 S_0 S0 There are two strategies , After the transfer, they are S 1 S_1 S1 and S 2 S_2 S2.
And then , have access to UCB Formula to choose right S 1 S_1 S1 Expansion is still right S 2 S_2 S2 Expand . here N 1 N_1 N1 and N 2 N_2 N2 Are all 0, So two nodes UCB Values are infinite , So you can choose any node , Choose here S 1 S_1 S1 To simulate . After simulation , It is found that the final value is 20, So back to update . T 1 = 20 T_1 = 20 T1=20, N 1 = 1 N_1=1 N1=1, T 0 = 20 T_0 = 20 T0=20, N 0 = 1 N_0=1 N0=1.
The first 2 Sub iteration : from S 0 S_0 S0 Set out to choose , here S 1 S_1 S1 Of UCB The value is no longer infinite , and S 2 S_2 S2 Of UCB The value is still infinite , So choose S 2 S_2 S2 Expand . here we are S 2 S_2 S2 after , Find out S 2 S_2 S2 For leaf nodes , And has not been explored , So simulate it . The simulation results are assumed to be 10, Then go back . T 2 = 10 T_2=10 T2=10, N 2 = 1 N_2 = 1 N2=1, T 0 = 30 T_0=30 T0=30, N 0 = 2 N_0 = 2 N0=2.
The first 3 Sub iteration : from S 0 S_0 S0 set out , Calculation S 1 S_1 S1 and S 2 S_2 S2 Of UCB value , Choose a larger one to expand .
U C B ( S 1 ) = 20 + 2 ∗ ln 2 1 = 21.67 U C B ( S 2 ) = 10 + 2 ∗ ln 2 1 = 11.67 \begin{aligned} &\mathrm{UCB}\left(\mathrm{S_1} \right)=20+2 * \sqrt{\frac{\ln 2}{1}}=21.67 \\ &\mathrm{UCB}\left(\mathrm{S_2}\right)=10+2 * \sqrt{\frac{\ln 2}{1}}=11.67 \end{aligned} UCB(S1)=20+2∗1ln2=21.67UCB(S2)=10+2∗1ln2=11.67
therefore , choice S 1 S_1 S1 Expand . here we are S 1 S_1 S1 after , It is found that it is a leaf node , And has been explored , Then list all possible actions of the current node , And add it to the tree .
Then you can do as before , Random selection S 3 S_3 S3 perhaps S 4 S_4 S4 Expand , And so on .
Reference
Monte carlo tree search MCTS introduction :https://blog.csdn.net/qq_41033011/article/details/109034887
边栏推荐
- V-IF, V-for, list filtering and sorting, forced binding of class and style, and collection of form information
- Monotonic stack framework
- 对原数组有影响的几个方法
- 旋转矩阵_百度百科
- 产业数字化全面加速,智能云网2.0重新定义数字化转型
- Exponential moving average method_ Baidu Encyclopedia
- Winning tickets on the "code" -- the second bullet of ticket purchase welfare at tdengine Developer Conference
- df. drop_ Duplicates() explanation + usage
- Pyside2 as a simple browser
- 零基础学习CANoe Panel(2)—— 控件布局
猜你喜欢
Chapter 3 - creating and maintaining MySQL data tables
Bannertext (watermark text)
JS高级 之 ES6 实现继承
Zero basic learning canoe panel (2) -- control layout
LeetCode 0814. 二叉树剪枝
上传图片到本机iis服务器,结果以网页地址形式返回,可直接访问
产业数字化全面加速,智能云网2.0重新定义数字化转型
蒙特卡洛树搜索(MCTS)详解
欢乐的彝族火把节Joyous Torch Festival of the Yi Nationality
AutoComplete(自动完成)
随机推荐
Cookies and sessions
【06】指令跳转:原来if...else就是goto
玩转CANoe,博客目录大全
nacos基础概念和单机启动
OUU益生菌精耕胃肠健康,获奖天猫国际微生态创新大会
Can Oracle RAC be built on Youxuan database?
Niuke 2022 summer training session I chiitoitsu solution
Rebound shell carries out suid authorization through ordinary users
如何用一手数据可视化获得老板的青睐,抓住数据可视化关键点
How to select current probe
Reasons for driving voltage deviation caused by high voltage differential probe
重载(overload)和重写(override)的区别
JS高级 之 ES6 实现继承
The first session of Niuke 2022 summer training c question grab the seat! Solution
df. drop_ Duplicates() explanation + usage
How Linux queries Oracle error logs
[C language - file] the data can finally be out of the memory. Go to the outside world to have a look / (o)/~~
调试VBS visual studio
商业智能BI分析思维:生产制造行业的现金周期(二)
卷妹的面试小抄每日更新Day1