关于鲁棒马尔可夫决策过程的凸优化形式

On the Convex Formulations of Robust Markov Decision Processes

Mathematics of Operations Research · 2024

被引 4

ABS 3

Julien Grand-Clément
Marek Petrik

中文导读

本文首次提出鲁棒马尔可夫决策过程在经典矩形假设下的凸优化形式，通过熵正则化和变量变换导出多项式规模的凸规划，并针对多面体、椭球和熵不确定性集简化为锥规划，为求解RMDPs提供了新方向。

Abstract

Robust Markov decision processes (MDPs) are used for applications of dynamic optimization in uncertain environments and have been studied extensively. Many of the main properties and algorithms of MDPs, such as value iteration and policy iteration, extend directly to RMDPs. Surprisingly, there is no known analog of the MDP convex optimization formulation for solving RMDPs. This work describes the first convex optimization formulation of RMDPs under the classical sa-rectangularity and s-rectangularity assumptions. By using entropic regularization and exponential change of variables, we derive a convex formulation with a number of variables and constraints polynomial in the number of states and actions, but with large coefficients in the constraints. We further simplify the formulation for RMDPs with polyhedral, ellipsoidal, or entropy-based uncertainty sets, showing that, in these cases, RMDPs can be reformulated as conic programs based on exponential cones, quadratic cones, and nonnegative orthants. Our work opens a new research direction for RMDPs and can serve as a first step toward obtaining a tractable convex formulation of RMDPs. Funding: The work in the paper was supported, in part, by NSF [Grants 2144601 and 1815275]; and Agence Nationale de la Recherche [Grant 11-LABX-0047].

鲁棒优化马尔可夫决策过程凸优化动态规划

阅读原文 ↗