Best-response dynamics in zero-sum stochastic games

Journal of Economic Theory · 2020

Cited by 35 · top 9% among papers in the same journal and year

Renmin University of China (RUC) tier A · ABS 4

David S. Leslie · Lancaster University
Steven W. Perkins · PwC (UK)
Zibo Xu · Singapore University of Technology and Design (corresponding author)

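Summary

The paper studies three learning dynamics for two-player zero-sum stochastic games with discounted payoffs. It proves that a continuous-time best-response dynamic in mixed strategies converges to the set of Nash equilibrium stationary strategies, then extends the result to a fictitious-play-like process and to a modified δ-converging best-response dynamic.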

Abstract

We define and analyse three learning dynamics for two-player zero-sum discounted-payoff stochastic games. A continuous-time best-response dynamic in mixed strategies is proved to converge to the set of Nash equilibrium stationary strategies. Extending this, we introduce a fictitious-play-like process in a continuous-time embedding of a stochastic zero-sum game, which is again shown to converge to the set of Nash equilibrium strategies. Finally, we present a modified δ-converging best-response dynamic, in which the discount rate converges to 1, and the learned value converges to the asymptotic value of the zero-sum stochastic game. The critical feature of all the dynamic processes is a separation of adaption rates: beliefs about the value of states adapt more slowly than the strategies adapt, and in the case of the δ-converging dynamic the discount rate adapts more slowly than everything else.
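
The separation of adaptation rates described in the abstract can be illustrated with a rough numerical sketch. This is not the authors' construction: the Euler discretisation, the step-size schedules `alpha` and `beta`, the toy two-state game, and the function name `simulate` are all assumptions made for illustration. The sketch only shows the general shape of such a scheme, in which strategies move toward best responses in an auxiliary matrix game at each state while beliefs about state values are updated with a smaller step.

```python
import numpy as np

def simulate(r, P, delta, steps=2000):
    """Discrete-time sketch of a two-timescale best-response process.

    r[s, a, b]     : stage payoff to player 1 (the maximiser)
    P[s, a, b, s2] : probability of moving from state s to state s2
    delta          : discount factor in (0, 1)
    All names and schedules here are illustrative assumptions.
    """
    n_states, n_a, n_b = r.shape
    x = np.full((n_states, n_a), 1.0 / n_a)  # player 1 mixed strategy per state
    y = np.full((n_states, n_b), 1.0 / n_b)  # player 2 mixed strategy per state
    u = np.zeros(n_states)                   # believed (normalised) value per state

    for n in range(1, steps + 1):
        alpha = 1.0 / (n + 1)                    # faster step: strategies
        beta = 1.0 / ((n + 1) * np.log(n + 2))   # slower step: value beliefs
        # Auxiliary matrix game at every state, built from current beliefs.
        G = (1.0 - delta) * r + delta * (P @ u)  # shape (n_states, n_a, n_b)
        x_new, y_new, u_new = x.copy(), y.copy(), u.copy()
        for s in range(n_states):
            br1 = np.zeros(n_a)
            br1[np.argmax(G[s] @ y[s])] = 1.0    # player 1 maximises
            br2 = np.zeros(n_b)
            br2[np.argmin(x[s] @ G[s])] = 1.0    # player 2 minimises
            x_new[s] = x[s] + alpha * (br1 - x[s])
            y_new[s] = y[s] + alpha * (br2 - y[s])
            u_new[s] = u[s] + beta * (x[s] @ G[s] @ y[s] - u[s])
        x, y, u = x_new, y_new, u_new
    return x, y, u

# Toy game (hypothetical): two states, two actions per player,
# and transitions that ignore the actions taken.
r = np.array([[[1.0, -1.0], [-1.0, 1.0]],
              [[0.0, 2.0], [1.0, 0.0]]])
P = np.full((2, 2, 2, 2), 0.5)

x, y, u = simulate(r, P, delta=0.9)
print("player 1 strategies:\n", x)
print("player 2 strategies:\n", y)
print("believed state values:", u)
```

Because each update is a convex combination, the strategies stay on the probability simplex and the believed values stay within the range of the stage payoffs. Whether and how fast such a discretisation approaches equilibrium depends on the step-size schedules, which the abstract does not specify.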

Keywords: zero-sum stochastic games · best-response dynamics · Nash equilibrium · learning dynamics
