Bias and Variance Approximation in Value Function Estimates

Management Science · 2007
Cited by 154
Journal lists: Renmin University A+ · FT50 · UTD24 · ABS 4*

Summary

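For finite-state, finite-action, infinite-horizon, discounted-reward Markov decision processes, the paper studies the bias and variance that empirical estimates of the model parameters introduce into value function estimates. It gives closed-form approximations from which confidence intervals can be derived, and validates them on customer data from a mail-order company.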

Abstract

We consider a finite-state, finite-action, infinite-horizon, discounted reward Markov decision process and study the bias and variance in the value function estimates that result from empirical estimates of the model parameters. We provide closed-form approximations for the bias and variance, which can then be used to derive confidence intervals around the value function estimates. We illustrate and validate our findings using a large database describing the transaction and mailing histories for customers of a mail-order catalog firm.
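The abstract does not state the paper's closed-form approximations, so the sketch below does not reproduce them. It only illustrates the quantities being approximated, using a parametric bootstrap (a simulation technique swapped in for the closed forms). It assumes a single fixed policy, so the MDP reduces to a Markov reward process with value V = (I − γP)⁻¹r, and it treats rewards as known so that only the transition matrix P is estimated from counts. Even though each row of the count-based estimate P̂ is unbiased, V̂ is a nonlinear function of P̂ (through the matrix inverse), which is why V̂ is biased. The function names, the two-state example data, and the 1.96 normal quantile are hypothetical choices for illustration.

```python
import math
import random
import statistics


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def evaluate_policy(p, r, gamma):
    """Discounted value of a fixed policy: solve (I - gamma * P) v = r."""
    n = len(r)
    a = [[(1.0 if i == j else 0.0) - gamma * p[i][j] for j in range(n)]
         for i in range(n)]
    return solve_linear(a, r)


def resample_row(probs, n_obs, rng):
    """Re-estimate one row of P from n_obs simulated transitions (counts / n_obs)."""
    counts = [0] * len(probs)
    for j in rng.choices(range(len(probs)), weights=probs, k=n_obs):
        counts[j] += 1
    return [c / n_obs for c in counts]


def bootstrap_bias_variance(p_hat, r, gamma, n_obs, reps=500, seed=0):
    """Parametric-bootstrap bias and variance of the plug-in value estimate.

    Assumption (hypothetical setup): p_hat is treated as the true transition
    matrix; each replication redraws n_obs transitions per state, re-estimates
    P and re-solves for V. Rewards r are assumed known.
    """
    rng = random.Random(seed)
    v_hat = evaluate_policy(p_hat, r, gamma)
    draws = [evaluate_policy([resample_row(row, n_obs, rng) for row in p_hat], r, gamma)
             for _ in range(reps)]
    per_state = list(zip(*draws))  # bootstrap values grouped by state
    bias = [statistics.fmean(vs) - v for vs, v in zip(per_state, v_hat)]
    var = [statistics.variance(vs) for vs in per_state]
    return v_hat, bias, var


def normal_interval(v_hat, bias, var, z=1.96):
    """Bias-corrected normal-approximation interval for one state's value."""
    centre = v_hat - bias
    half = z * math.sqrt(var)
    return centre - half, centre + half


# Usage with hypothetical data: 2 states, transitions estimated from 50 visits each.
p_est = [[0.9, 0.1], [0.2, 0.8]]
rewards = [1.0, 0.0]
v, b, s2 = bootstrap_bias_variance(p_est, rewards, gamma=0.9, n_obs=50)
for state in range(2):
    print(state, round(v[state], 3), normal_interval(v[state], b[state], s2[state]))
```

The bootstrap trades the paper's analytic formulas for simulation: it needs no derivation but costs `reps` linear solves, and its bias and variance estimates are themselves noisy for small `reps`.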

Keywords: bias approximation · variance approximation · value function estimation · Markov decision process