受限观测的马尔可夫决策过程：有限期界情形

Markov decision processes with restricted observations: Finite horizon case

Naval Research Logistics · 1997

被引 0

ABS 3

Yasemin Serin
Zeynep Müge Avṣar 通讯

中文导读

研究了状态不可观测但状态分组可观测的马尔可夫决策问题，证明了有限期界下存在确定性最优策略，并给出了计算最小化总期望折现成本的算法。

Abstract

In this article we consider a Markov decision process subject to the constraints that result from some observability restrictions. We assume that the state of the Markov process under consideration is unobservable. The states are grouped so that the group that a state belongs to is observable. So, we want to find an optimal decision rule depending on the observable groups instead of the states. This means that the same decision applies to all the states in the same group. We prove that a deterministic optimal policy exists for the finite horizon. An algorithm is developed to compute policies minimizing the total expected discounted cost over a finite horizon. © 1997 John Wiley & Sons, Inc. Naval Research Logistics 44: 439–456, 1997

马尔可夫决策过程部分可观测马尔可夫决策过程运筹学应用数学

阅读原文 ↗