No good Markov strategies for Büchi objectives in countable MDPs

Annals of Operations Research · 2026

Citations: 0 · Top 10% among articles in the same journal and year

ABS 3

Stefan Kiefer
Richard Mayr
Mahsa Shirmohammadi
Patrick Totzke (corresponding author)

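Summary

The paper studies whether ε-optimal Markov strategies always exist for Büchi objectives (which require visiting a given subset of states infinitely often) in countably infinite Markov decision processes. By constructing a counterexample, it gives a negative answer to a question left open by T.P. Hill (1979).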

Abstract

We study countably infinite Markov decision processes with Büchi objectives, which ask to visit a given subset of states infinitely often. A question left open by T.P. Hill (1979) is whether there always exist $\varepsilon$-optimal Markov strategies, i.e., strategies that base decisions only on the current state and on the clock (the number of steps taken so far). We provide a negative answer to this question by constructing a non-trivial counterexample.
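
As a reading aid, the notions used in the abstract can be written out as follows. This is a sketch in standard notation, not taken from the paper: the symbols $S$ (countable state set), $F \subseteq S$ (target set), $A$ (actions), $\mathcal{D}(A)$ (distributions over actions), $\mathrm{Buchi}(F)$ and $\Pr^{\sigma}_{s_0}$ are assumptions introduced here, and the paper's own definitions may differ in detail, for example in how randomization is handled.

$$
\mathrm{Buchi}(F) = \{\, s_0 s_1 s_2 \cdots \in S^{\omega} \;:\; \forall n \;\exists m \ge n,\; s_m \in F \,\}
$$

$$
\sigma : S \times \mathbb{N} \to \mathcal{D}(A)
\qquad \text{(Markov strategy: the choice depends only on the current state and the step count)}
$$

$$
\Pr^{\sigma}_{s_0}\bigl(\mathrm{Buchi}(F)\bigr) \;\ge\; \sup_{\tau} \Pr^{\tau}_{s_0}\bigl(\mathrm{Buchi}(F)\bigr) - \varepsilon
\qquad \text{($\sigma$ is $\varepsilon$-optimal from $s_0$; $\tau$ ranges over all strategies)}
$$

Under this reading, the negative answer means that there is a countable MDP in which, for some $\varepsilon > 0$, no strategy of the second form satisfies the third inequality.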

Keywords: Markov decision processes · Büchi objectives · countable state spaces · stochastic processes
