The Parameter Iteration Method in Dynamic Programming

Management Science · 1989

Cited by 12

Renmin University of China A+ · FT50 · UTD24 · ABS 4*

Gal Shmuel · IBM Haifa Research Institute (corresponding author)

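Summary

For dynamic programming problems whose state variable is of such high dimension that traditional methods cannot solve them, the paper proposes a Parameter Iteration Method that combines simulation with recursive estimation to approximate the value function, and demonstrates it on the optimal replacement policy problem for a multi-item Markovian system.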

Abstract

Many practical problems involve making optimal decisions for systems whose state is characterized by many components. These problems lead to dynamic programs with a very large number of state variables, so deriving the exact optimal policy numerically is infeasible because of the computer time and storage required. This paper presents a practical method, the Parameter Iteration Method, for obtaining an approximate solution to such problems. The computational difficulty caused by the very high dimensionality of the state variable is overcome by an iterative method that combines simulation and recursive estimation to compute successive approximations of the value function. The implementation of the Parameter Iteration Method is illustrated for the problem of finding an optimal replacement policy for a multi-item Markovian system.
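The abstract gives no formulas, so the following is only a rough sketch of the general idea it describes: simulate the system, and recursively re-estimate the parameters of a low-dimensional value-function approximation from the simulated states. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three-term feature vector, the wear and cost model, the restriction of candidate actions to "replace the k most worn items", and the use of recursive least squares as the recursive estimator.

```python
import numpy as np

# Hypothetical multi-item system; all constants are illustrative assumptions.
N_ITEMS, MAX_WEAR = 5, 4      # items and highest wear level per item
P_WEAR = 0.3                  # chance an item wears by one level per period
C_OPERATE, C_REPLACE, C_SETUP = 1.0, 3.0, 2.0
GAMMA = 0.9                   # discount factor


def features(state):
    """Summarize the full state vector by three numbers (assumed basis)."""
    s = np.asarray(state, dtype=float) / MAX_WEAR
    return np.array([1.0, s.sum(), (s ** 2).sum()])


def expected_next_features(post):
    """Exact E[features(next state)] given the state right after replacement."""
    x = np.asarray(post, dtype=float)
    worn = np.where(x < MAX_WEAR, x + 1, x)   # level if the item wears
    m1 = (1 - P_WEAR) * x + P_WEAR * worn
    m2 = (1 - P_WEAR) * x ** 2 + P_WEAR * worn ** 2
    return np.array([1.0, m1.sum() / MAX_WEAR, m2.sum() / MAX_WEAR ** 2])


def greedy_step(state, theta):
    """Best one-step lookahead value and post-replacement state under theta.

    Simplification: only actions of the form "replace the k most worn
    items" are considered, for k = 0..N_ITEMS.
    """
    order = np.argsort(state)[::-1]           # most worn first
    best_q, best_post = np.inf, None
    for k in range(N_ITEMS + 1):
        post = np.array(state)
        post[order[:k]] = 0                   # replaced items become new
        cost = C_OPERATE * post.sum() + C_REPLACE * k + (C_SETUP if k else 0.0)
        q = cost + GAMMA * theta @ expected_next_features(post)
        if q < best_q:
            best_q, best_post = q, post
    return best_q, best_post


def parameter_iteration(n_steps=5000, seed=0):
    """Simulate the system and update theta by recursive least squares."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)
    P = np.eye(3) * 100.0                     # RLS inverse-information matrix
    state = np.zeros(N_ITEMS, dtype=int)
    for _ in range(n_steps):
        target, post = greedy_step(state, theta)
        phi = features(state)
        gain = P @ phi / (1.0 + phi @ P @ phi)
        theta = theta + gain * (target - theta @ phi)   # recursive estimate
        P = P - np.outer(gain, phi @ P)
        wear = (rng.random(N_ITEMS) < P_WEAR) & (post < MAX_WEAR)
        state = post + wear                   # simulate one period forward
    return theta
```

Calling `parameter_iteration()` returns the three fitted parameters; `greedy_step(state, theta)` then gives an approximate replacement decision for any state. The point of the sketch is that storage and per-step work depend on the number of parameters (three here) rather than on the (MAX_WEAR + 1) ** N_ITEMS possible states.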

Keywords: Parameter Iteration Method · dynamic programming · curse of dimensionality · approximate optimal policy
