Reducible Markov Decision Processes and Stochastic Games

Production and Operations Management · 2021

Cited by 4

Renmin University A · FT50 · UTD24 · ABS 4

Jie Ning · Case Western Reserve University (corresponding author)

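Summary

The paper introduces reducible Markov decision processes, whose exact solutions are obtained by solving simpler coordinate MDPs, thereby achieving dimension reduction. It extends the framework to stochastic games, giving existence conditions for a pure-strategy Markov perfect equilibrium together with closed-form solutions, and applies to settings such as capacity management, inventory management, and duopoly competition.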

Abstract

Markov decision processes (MDPs) provide a powerful framework for analyzing dynamic decision making. However, their applications are significantly hindered by the difficulty of obtaining solutions. In this study, we introduce reducible MDPs whose exact solutions can be obtained by solving simpler MDPs, termed the coordinate MDPs. The value function and an optimal policy of a reducible MDP are linear functions of those of the associated coordinate MDP. Because the coordinate MDP does not involve the multi‐dimensional endogenous state, we achieve dimension reduction on a reducible MDP. Extending the MDP framework to multiple players, we introduce reducible stochastic games. We show that these games reduce to simpler coordinate games that do not involve the multi‐dimensional endogenous state. We specify sufficient conditions for the existence of a pure‐strategy Markov perfect equilibrium in reducible stochastic games and derive closed‐form expressions for the players’ equilibrium values. The reducible framework encompasses a variety of linear and nonlinear models and offers substantial simplification in analysis and computation. We provide guidelines and illustrative examples on formulating problems as reducible models. We demonstrate the applicability and modeling flexibility of reducible models in a wide range of contexts including capacity and inventory management and duopoly competition.

Keywords: Markov decision processes · stochastic games · dynamic decision making · dimension reduction
