The LP/POMDP marriage: Optimization with imperfect information
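Proposes a new method for resource allocation problems with partially observable states: a master linear program allocates among control policies, and partially observable Markov decision processes use the master's dual prices to find improving policies. The method is demonstrated on an example in which aircraft attack targets in stages.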
This paper introduces a new technique for solving large-scale allocation problems with partially observable states and constrained action and observation resources. The technique uses a master linear program (LP) to determine allocations among a set of control policies, and uses partially observable Markov decision processes (POMDPs) to find improving policies based on dual prices from the master LP. The technique is applied to a military problem in which aircraft attack targets in a sequence of stages, with information acquired in one stage used to plan attacks in the next. © 2000 John Wiley & Sons, Inc., Naval Research Logistics 47: 607–619, 2000
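
The interplay between the master LP and the POMDP subproblem can be sketched as a standard column-generation scheme. The formulation below is an illustration under assumed notation, not the paper's own statement: $\mathcal{P}$ is the set of all admissible policies, $P \subseteq \mathcal{P}$ the policies generated so far, $x_p$ the number of targets assigned to policy $p$, $v_p$ its expected value per target, $a_{rp}$ its expected consumption of resource $r$, $b_r$ the available amount of resource $r$, $N$ the number of targets, and $\pi_r$, $\mu$ the dual prices of the master LP.

```latex
\begin{align*}
\text{(Master LP)}\quad \max_{x \ge 0}\ & \sum_{p \in P} v_p x_p \\
\text{s.t.}\ & \sum_{p \in P} a_{rp} x_p \le b_r \quad \forall r
    && (\text{dual } \pi_r \ge 0) \\
& \sum_{p \in P} x_p = N
    && (\text{dual } \mu \text{, unrestricted in sign}) \\[6pt]
\text{(Pricing)}\quad \bar{v}^{*} \;=\; \max_{p \in \mathcal{P}}\ &
    \Big( v_p - \sum_{r} \pi_r a_{rp} \Big) - \mu
\end{align*}
```

Under these assumptions, the pricing problem is itself a POMDP in which each unit of action or observation resource is charged its dual price $\pi_r$. If $\bar{v}^{*} > 0$, the maximizing policy is added to $P$ and the master LP is re-solved; otherwise no policy has positive reduced value and the current allocation is optimal for the LP.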