弱耦合马尔可夫决策过程的自适应网络松弛

Self-Adapting Network Relaxations for Weakly Coupled Markov Decision Processes

Management Science · 2024

被引 1

人大 A+FT50UTD24ABS 4*

Selvaprabu Nadarajah · 伊利诺伊大学厄巴纳-香槟分校
André A. Ciré · 多伦多大学

中文导读

提出一种新的线性规划松弛方法（可行性网络松弛），能自动调整大小以适应弱耦合马尔可夫决策过程的链接约束结构，提供比拉格朗日松弛更紧的界和更快的求解速度。

Abstract

High-dimensional weakly coupled Markov decision processes (WDPs) arise in dynamic decision making and reinforcement learning, decomposing into smaller Markov decision processes (MDPs) when linking constraints are relaxed. The Lagrangian relaxation of WDPs (LAG) exploits this property to compute policies and (optimistic) bounds efficiently; however, dualizing linking constraints averages away combinatorial information. We introduce feasibility network relaxations (FNRs), a new class of linear programming relaxations that exactly represents the linking constraints. We develop a procedure to obtain the unique minimally sized relaxation, which we refer to as self-adapting FNR, as its size automatically adjusts to the structure of the linking constraints. Our analysis informs model selection: (i) the self-adapting FNR provides (weakly) stronger bounds than LAG, is polynomially sized when linking constraints admit a tractable network representation, and can even be smaller than LAG, and (ii) self-adapting FNR provides bounds and policies that match the approximate linear programming (ALP) approach but is substantially smaller in size than the ALP formulation and a recent alternative Lagrangian that is equivalent to ALP. We perform numerical experiments on constrained dynamic assortment and preemptive maintenance applications. Our results show that self-adapting FNR significantly improves upon LAG in terms of policy performance and/or bounds, while being an order of magnitude faster than an alternative Lagrangian and ALP, which are unsolvable in several instances. This paper was accepted by Baris Ata, stochastic models and simulation. Supplemental Material: The electronic companion and data files are available at https://doi.org/10.1287/mnsc.2022.01108 .

弱耦合马尔可夫决策过程可行网络松弛自适应网络松弛拉格朗日松弛

阅读原文 ↗