Multi-armed bandit games

Annals of Operations Research · 2024

Citations: 0

ABS 3

Kemal Gürsoy (corresponding author)

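Summary

Extends the multi-armed bandit problem to a game setting, uses a mean-field game model to handle a large number of parallel decision problems, and establishes connections between dynamic games and sequential optimization.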

Abstract

A sequential optimization model, known as the multi-armed bandit problem, is concerned with the optimal allocation of resources between competing activities in order to generate the most likely benefits over a given period of time. In this work, following the objective of a multi-armed bandit problem, we consider a mean-field game model to approach a large number of multi-armed bandit problems, and propose some connections between dynamic games and sequential optimization problems.

Keywords: game theory; dynamic optimization; multi-armed bandit problem; mean-field games