A New Specification of the Multichain Policy Iteration Algorithm in Undiscounted Markov Renewal Programs

Management Science · 1980
Cited by: 7
Journal rankings: Renmin University A+ · FT50 · UTD24 · ABS 4*

Summary

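For the policy iteration algorithm in undiscounted Markov renewal programs, the paper proposes a new specification of the value vectors and an anticycling rule that avoids parsing the transition probability matrices into their subchains, which simplifies policy evaluation.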

Abstract

We consider the Policy Iteration Algorithm for undiscounted Markov Renewal Programs. Previous specifications of the policy evaluation part of this algorithm all required the analysis of the chain structure for each policy generated. The purpose of this paper is to provide a unique specification of the value vectors as well as an anticycling rule which avoids parsing the transition probability matrices into their subchains.

Keywords: Markov renewal programs · policy iteration algorithm · undiscounted · chain structure