An Efficient Node Selection Policy for Monte Carlo Tree Search with Neural Networks

INFORMS Journal on Computing · 2024
Cited by 7 · Top 7% among papers in the same journal and year
Renmin University list: B · UTD24 · ABS 3

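Summary

To address inefficient node selection in Monte Carlo tree search with neural networks, the paper proposes a policy based on multistage ranking and selection that maximizes the probability of correctly selecting the best action at the root node under a limited search budget. Experiments on board games and an OpenAI task show that it outperforms the UCT policy used in AlphaGo Zero and MuZero.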

Abstract

Monte Carlo tree search (MCTS) has been gaining increasing popularity, and the success of AlphaGo has prompted a new trend of incorporating a value network and a policy network constructed with neural networks into MCTS, namely, NN-MCTS. In this work, motivated by the shortcomings of the widely used upper confidence bounds applied to trees (UCT) policy, we formulate the node selection problem in NN-MCTS as a multistage ranking and selection (R&S) problem and propose a node selection policy that efficiently allocates a limited search budget to maximize the probability of correctly selecting the best action at the root state. The value and policy networks in NN-MCTS further improve the performance of the proposed node selection policy by providing prior knowledge and guiding the selection of the final action, respectively. Numerical experiments on two board games and an OpenAI task demonstrate that the proposed method outperforms the UCT policy used in AlphaGo Zero and MuZero, implying the potential of constructing node selection policies in NN-MCTS with R&S procedures.

History: Accepted by Bruno Tuffin, Area Editor for Simulation.

Funding: This work was supported by the National Natural Science Foundation of China [Grants 72325007, 72250065, and 72022001], and a PKU-Boya Postdoctoral Fellowship 2406396158.

Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information (https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2023.0307) as well as from the IJOC GitHub software repository (https://github.com/INFORMSJoC/2023.0307). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/.
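For context on the baseline named in the abstract, the following is a minimal sketch of the PUCT-style selection rule commonly associated with AlphaGo Zero, in which each child is scored by its mean value plus an exploration bonus weighted by the policy network's prior. It is not the paper's proposed R&S policy, whose details the abstract does not give. The function name `puct_select`, the argument names, and the default `c_puct=1.5` are illustrative assumptions, not taken from the paper or its repository.

```python
import math


def puct_select(q_values, priors, visit_counts, c_puct=1.5):
    """Return the index of the child with the highest PUCT score.

    Hypothetical helper for illustration only.
    q_values:     mean action value Q(s, a) of each child
    priors:       policy-network prior P(s, a) of each child
    visit_counts: visit count N(s, a) of each child
    c_puct:       exploration constant (assumed value, not from the paper)
    """
    total_visits = sum(visit_counts)
    best_index, best_score = 0, -math.inf
    for i, (q, p, n) in enumerate(zip(q_values, priors, visit_counts)):
        # Exploration bonus shrinks as a child accumulates visits.
        exploration = c_puct * p * math.sqrt(total_visits) / (1 + n)
        score = q + exploration
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Example: the unvisited child gets a large bonus despite its low Q estimate.
chosen = puct_select(
    q_values=[0.5, 0.4, 0.1],
    priors=[0.2, 0.5, 0.3],
    visit_counts=[10, 2, 0],
)
```

A rule of this kind balances exploitation and exploration at every node, whereas the abstract frames the proposed policy around a different objective: allocating a fixed budget to maximize the probability of correctly selecting the best root action.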

Monte Carlo tree search · Neural networks · Operations research and optimization · Artificial intelligence · Game theory