Best-response dynamics in zero-sum stochastic games

Journal of Economic Theory · 2020
Cited by 35 · top 9% of papers in the same journal and year
Renmin University of China (人大) AABS 4

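Summary

The paper studies three learning dynamics for two-player zero-sum discounted-payoff stochastic games. It proves that the continuous-time best-response dynamic in mixed strategies converges to the set of Nash equilibrium stationary strategies, and extends this result to a fictitious-play-like process and to a modified δ-converging best-response dynamic.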

Abstract

We define and analyse three learning dynamics for two-player zero-sum discounted-payoff stochastic games. A continuous-time best-response dynamic in mixed strategies is proved to converge to the set of Nash equilibrium stationary strategies. Extending this, we introduce a fictitious-play-like process in a continuous-time embedding of a stochastic zero-sum game, which is again shown to converge to the set of Nash equilibrium strategies. Finally, we present a modified δ-converging best-response dynamic, in which the discount rate converges to 1, and the learned value converges to the asymptotic value of the zero-sum stochastic game. The critical feature of all the dynamic processes is a separation of adaption rates: beliefs about the value of states adapt more slowly than the strategies adapt, and in the case of the δ-converging dynamic the discount rate adapts more slowly than everything else.
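The separation of adaptation rates described in the abstract can be illustrated with a small numerical sketch. The code below is not the paper's construction: it is an Euler discretisation of a best-response dynamic on a made-up two-state, two-action game, in which strategies move with step `h` and value beliefs move with the much smaller step `h * slow`. The game data (`REWARDS`, `TO_STATE0`), the normalisation of the auxiliary game as (1 - δ) · reward + δ · continuation value, and all parameter values are assumptions chosen for illustration.

```python
# Illustrative sketch only: game data, step sizes, and the exact update rule
# are assumptions and are not taken from the paper.

def best_response(payoffs, maximise):
    """Return a pure best response to a payoff vector as a one-hot list."""
    target = max(payoffs) if maximise else min(payoffs)
    idx = payoffs.index(target)
    return [1.0 if i == idx else 0.0 for i in range(len(payoffs))]


def auxiliary_game(s, u, rewards, to_state0, delta):
    """Stage game in state s given value beliefs u:
    (1 - delta) * reward + delta * expected continuation value."""
    q = []
    for a in range(2):
        row = []
        for b in range(2):
            p0 = to_state0[s][a][b]  # probability of moving to state 0
            cont = p0 * u[0] + (1.0 - p0) * u[1]
            row.append((1.0 - delta) * rewards[s][a][b] + delta * cont)
        q.append(row)
    return q


def simulate(rewards, to_state0, delta=0.8, h=0.01, slow=0.05, steps=5000):
    """Discretised best-response dynamic with slowly adapting value beliefs."""
    x = [[0.5, 0.5] for _ in range(2)]  # maximiser's mixed strategy per state
    y = [[0.5, 0.5] for _ in range(2)]  # minimiser's mixed strategy per state
    u = [0.0, 0.0]                      # value beliefs per state
    for _ in range(steps):
        new_u = list(u)
        for s in range(2):
            q = auxiliary_game(s, u, rewards, to_state0, delta)
            # Expected payoff of each pure action against the opponent's mix.
            row_pay = [sum(q[a][b] * y[s][b] for b in range(2)) for a in range(2)]
            col_pay = [sum(q[a][b] * x[s][a] for a in range(2)) for b in range(2)]
            br_x = best_response(row_pay, maximise=True)
            br_y = best_response(col_pay, maximise=False)
            # Payoff of the current strategy pair in the auxiliary game.
            current = sum(x[s][a] * row_pay[a] for a in range(2))
            # Fast timescale: strategies move towards best responses.
            x[s] = [(1.0 - h) * x[s][a] + h * br_x[a] for a in range(2)]
            y[s] = [(1.0 - h) * y[s][b] + h * br_y[b] for b in range(2)]
            # Slow timescale: value belief tracks the current payoff.
            new_u[s] = (1.0 - h * slow) * u[s] + h * slow * current
        u = new_u
    return x, y, u


# Hypothetical game: payoffs to the maximiser, indexed [state][row][column].
REWARDS = [[[1.0, -1.0], [-1.0, 1.0]],
           [[2.0, 0.0], [0.0, 1.0]]]
# Probability of moving to state 0, indexed [state][row][column].
TO_STATE0 = [[[0.5, 0.5], [0.5, 0.5]],
             [[0.9, 0.1], [0.1, 0.9]]]

if __name__ == "__main__":
    x, y, u = simulate(REWARDS, TO_STATE0)
    print("strategies:", x, y)
    print("value beliefs:", u)
```

Because each update is a convex combination, the strategies stay on the probability simplex and the value beliefs stay between the smallest and largest stage reward. With a constant step size the strategies can keep oscillating near an equilibrium rather than settling exactly, so this sketch should be read as a picture of the two timescales, not as a reproduction of the convergence results.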

zero-sum stochastic games · best-response dynamics · Nash equilibrium · learning dynamics