A New Specification of the Multichain Policy Iteration Algorithm in Undiscounted Markov Renewal Programs
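For the Policy Iteration Algorithm in undiscounted Markov Renewal Programs, this paper proposes a new specification of the value vectors and an anticycling rule that avoids parsing the transition probability matrices into their subchains, which simplifies the policy evaluation step.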
We consider the Policy Iteration Algorithm for undiscounted Markov Renewal Programs. All previous specifications of the policy evaluation part of this algorithm required an analysis of the chain structure of each policy generated. The purpose of this paper is to provide a unique specification of the value vectors, together with an anticycling rule, that avoids parsing the transition probability matrices into their subchains.
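For orientation, the policy evaluation step is commonly written as a pair of linear systems. The block below is a sketch in assumed notation, none of which appears in the abstract: $P$ is the transition probability matrix of the policy being evaluated, $q_i$ its expected one-step reward in state $i$, $T_i > 0$ its expected holding time in state $i$, $g$ the gain rate vector, and $v$ the value vector. It shows one classical form of the equations, not the new specification proposed in the paper.

```latex
\begin{align}
  g_i &= \sum_{j} P_{ij}\, g_j, \\
  v_i &= q_i - T_i\, g_i + \sum_{j} P_{ij}\, v_j .
\end{align}
```

In this form the equations determine $g$ uniquely but leave as many free parameters in $v$ as $P$ has recurrent subchains. Classical normalizations fix $v$ in one state of each subchain, which is why they require the chain structure of every policy to be identified first.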