State of the Art—A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms

Management Science · 1982
Cited by 740 · Top 1% of papers in the same journal and year
Journal rankings: Renmin University of China A+ · FT50 · UTD24 · ABS 4*

Summary

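Surveys models and algorithms for partially observable Markov decision processes (POMDPs), explains how they relate to Markov decision processes, and discusses applications in quality control, machine maintenance, auditing, learning, and optimal stopping.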

Abstract

This paper surveys models and algorithms dealing with partially observable Markov decision processes. A partially observable Markov decision process (POMDP) is a generalization of a Markov decision process which permits uncertainty regarding the state of a Markov process and allows for state information acquisition. A general framework for finite state and action POMDPs is presented. Next, there is a brief discussion of the development of POMDPs and their relationship with other decision processes. A wide range of models in such areas as quality control, machine maintenance, internal auditing, learning, and optimal stopping are discussed within the POMDP framework. Lastly, algorithms for computing optimal solutions to POMDPs are presented.
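In a POMDP the state is not observed directly. The decision maker instead carries a belief, which is a probability distribution over states that summarizes the history of actions and observations. The Python sketch below illustrates two standard ingredients for finite POMDPs: the Bayes update of that belief, and one exact dynamic-programming backup over alpha-vectors by enumeration. Under that backup, each new vector is an action's reward vector plus one back-projected old vector per observation. The sketch is for illustration only. It is not taken from the survey and is not claimed to reproduce its algorithms step by step. The array conventions `T[a][s][s2]`, `Z[a][s2][o]` and `R[a][s]`, the function names, and the two-state machine-maintenance numbers are all hypothetical.

```python
from itertools import product
from typing import List, Sequence

Vector = List[float]


def belief_update(b: Sequence[float], a: int, o: int, T, Z) -> Vector:
    """Bayes update of belief b after action a and observation o.

    Assumed (hypothetical) conventions: T[a][s][s2] = P(s2 | s, a),
    Z[a][s2][o] = P(o | s2, a).
    """
    n = len(b)
    # Predict: distribution over next states before the observation arrives.
    pred = [sum(b[s] * T[a][s][s2] for s in range(n)) for s2 in range(n)]
    # Correct: weight each next state by the likelihood of observing o there.
    unnorm = [Z[a][s2][o] * pred[s2] for s2 in range(n)]
    p_o = sum(unnorm)  # P(o | b, a)
    if p_o == 0.0:
        raise ValueError("observation has zero probability under this belief/action")
    return [x / p_o for x in unnorm]


def value(alphas: Sequence[Vector], b: Sequence[float]) -> float:
    """Piecewise-linear convex value function: max over alpha-vectors of alpha . b."""
    return max(sum(ai * bi for ai, bi in zip(alpha, b)) for alpha in alphas)


def prune_dominated(vectors: List[Vector]) -> List[Vector]:
    """Drop duplicates and vectors pointwise dominated by another single vector.

    Cheap filter only: vectors dominated by a combination of others survive
    (removing those would need a linear program per vector).
    """
    kept: List[Vector] = []
    for i, v in enumerate(vectors):
        dominated = any(
            all(wk >= vk for wk, vk in zip(w, v)) and (w != v or j < i)
            for j, w in enumerate(vectors)
            if j != i
        )
        if not dominated:
            kept.append(v)
    return kept


def backup(alphas: Sequence[Vector], T, Z, R, gamma: float) -> List[Vector]:
    """One exact value-iteration step over alpha-vectors, by enumeration."""
    n_states = len(T[0])
    n_obs = len(Z[0][0])
    new_vectors: List[Vector] = []
    for a in range(len(T)):
        # g[o][k](s) = gamma * sum_s2 T[a][s][s2] * Z[a][s2][o] * alphas[k][s2]
        g = [
            [
                [gamma * sum(T[a][s][s2] * Z[a][s2][o] * alpha[s2] for s2 in range(n_states))
                 for s in range(n_states)]
                for alpha in alphas
            ]
            for o in range(n_obs)
        ]
        # Cross-sum: pick one back-projected vector for every observation.
        for choice in product(*g):
            new_vectors.append([R[a][s] + sum(vec[s] for vec in choice) for s in range(n_states)])
    return prune_dominated(new_vectors)


# Hypothetical machine-maintenance model (illustrative numbers only).
# States: 0 = good, 1 = worn. Actions: 0 = produce, 1 = replace.
# Observations: 0 = acceptable item, 1 = defective item.
T = [
    [[0.9, 0.1], [0.0, 1.0]],  # produce: a good machine wears out with prob. 0.1
    [[1.0, 0.0], [1.0, 0.0]],  # replace: the machine is good afterwards
]
Z = [
    [[0.9, 0.1], [0.4, 0.6]],  # produce: a worn machine yields more defects
    [[0.5, 0.5], [0.5, 0.5]],  # replace: the observation carries no information
]
R = [
    [1.0, -1.0],   # produce: profit when good, loss when worn
    [-0.5, -0.5],  # replace: fixed cost
]

if __name__ == "__main__":
    print(belief_update([1.0, 0.0], 0, 1, T, Z))  # approximately [0.6, 0.4]
    alphas: List[Vector] = [[0.0, 0.0]]
    for _ in range(3):
        alphas = backup(alphas, T, Z, R, gamma=0.95)
    print(len(alphas), value(alphas, [0.5, 0.5]))
```

Before pruning, each backup can create up to |A|·|Γ|^|O| vectors, so exact enumeration becomes expensive quickly. The pointwise-dominance filter is kept deliberately simple. A complete pruning step would also remove vectors that are dominated only by combinations of others.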

Keywords: partially observable Markov decision process · POMDP models · POMDP algorithms · optimal solution computation