Strategic experimentation with Poisson bandits

Theoretical Economics · 2010

Cited by 121 · Top 6% of papers in the same journal and year

Renmin University of China · AABS 4

Godfrey Keller · University of Oxford
Sven Rady · University of Munich

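Summary

The paper studies a game of strategic experimentation with two-armed bandits in which the risky arm pays lump sums according to a Poisson process of unknown intensity. It finds that all Markov perfect equilibria exhibit an 'encouragement effect' relative to the single-agent optimum, and that asymmetric equilibria Pareto dominate the symmetric equilibrium.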

Abstract

We study a game of strategic experimentation with two-armed bandits where the risky arm distributes lump-sum payoffs according to a Poisson process. Its intensity is either high or low, and unknown to the players. We consider Markov perfect equilibria with beliefs as the state variable and show that all such equilibria exhibit an 'encouragement effect' relative to the single-agent optimum. There is no equilibrium in which all players use cut-off strategies. Owing to the encouragement effect, asymmetric equilibria in which players take turns playing the risky arm before all experimentation stops Pareto dominate the unique symmetric equilibrium. Rewarding the last experimenter with a higher continuation value increases the range of beliefs where players experiment, but may reduce the intensity of experimentation at more optimistic beliefs. This suggests that there is no equilibrium that uniformly maximizes the players' average payoff.
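
The abstract takes beliefs as the state variable. As a minimal sketch of how such a belief could move, the Python snippet below applies Bayes' rule to a Poisson arm whose intensity is either high or low. The function name, parameter names, and numerical values are illustrative assumptions and are not taken from the paper.

```python
import math


def posterior_belief(prior, n_arrivals, elapsed, lam_high, lam_low):
    """Posterior probability that the risky arm has the high intensity.

    Illustrative assumption: lump sums arrive as a Poisson process with
    rate lam_high or lam_low (lam_high > lam_low > 0), and the arm has
    been observed for `elapsed` units of time with `n_arrivals` lump sums.
    """
    if not 0.0 < prior < 1.0:
        return prior  # degenerate beliefs are never revised
    # Log likelihood ratio of "high" versus "low" for the observed history.
    log_lr = (n_arrivals * math.log(lam_high / lam_low)
              - (lam_high - lam_low) * elapsed)
    log_odds = math.log(prior / (1.0 - prior)) + log_lr
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical numbers: a period with no lump sum makes the belief drift down,
# while a lump sum makes it jump up.
no_news = posterior_belief(0.5, 0, 1.0, lam_high=2.0, lam_low=1.0)
one_arrival = posterior_belief(0.5, 1, 0.0, lam_high=2.0, lam_low=1.0)
```

Under these assumptions, the belief falls gradually while no lump sum arrives and rises discretely at each arrival, because an arrival is more likely under the high intensity.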

Poisson process · strategic experimentation · encouragement effect · Markov perfect equilibrium