Bias and Variance Approximation in Value Function Estimates
We consider a finite-state, finite-action, infinite-horizon, discounted-reward Markov decision process and study the bias and variance in the value function estimates that result from empirical estimates of the model parameters. We provide closed-form approximations for the bias and variance, which can then be used to derive confidence intervals around the value function estimates. We illustrate and validate our findings using a large database describing the transaction and mailing histories for customers of a mail-order catalog firm.
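The effect described above can be illustrated with a small simulation. This is a minimal Monte Carlo sketch, not the closed-form approximation derived in the paper. It makes several assumptions. A fixed policy is evaluated, so the MDP reduces to a Markov reward process with value V = (I - γP)^{-1} r. The rewards are known, and only the transition matrix is estimated, from `n_obs` sampled transitions per state. The helper names (`policy_value`, `estimate_p`, `simulate_bias_variance`) are hypothetical. Repeatedly re-estimating P and re-solving for V gives an empirical bias and variance for each state's value estimate.

```python
import random
import statistics


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def policy_value(p, r, gamma):
    """Value of a fixed policy: solve (I - gamma * P) V = r."""
    n = len(r)
    a = [[(1.0 if i == j else 0.0) - gamma * p[i][j] for j in range(n)]
         for i in range(n)]
    return solve_linear(a, r)


def estimate_p(p, n_obs, rng):
    """Empirical transition matrix from n_obs sampled transitions per state (assumed design)."""
    k = len(p)
    p_hat = []
    for row in p:
        draws = rng.choices(range(k), weights=row, k=n_obs)
        p_hat.append([draws.count(j) / n_obs for j in range(k)])
    return p_hat


def simulate_bias_variance(p, r, gamma, n_obs, n_reps, seed=0):
    """Monte Carlo estimate of per-state bias and variance of V_hat."""
    rng = random.Random(seed)
    v_true = policy_value(p, r, gamma)
    samples = [policy_value(estimate_p(p, n_obs, rng), r, gamma)
               for _ in range(n_reps)]
    bias, var = [], []
    for s in range(len(r)):
        vals = [v[s] for v in samples]
        bias.append(statistics.fmean(vals) - v_true[s])
        var.append(statistics.variance(vals))
    return v_true, bias, var


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.5, 0.5]]  # illustrative transition matrix, not from the paper
    R = [1.0, 0.0]
    v_true, bias, var = simulate_bias_variance(P, R, 0.9, n_obs=50, n_reps=200)
    for s, (v, b, s2) in enumerate(zip(v_true, bias, var)):
        # Normal-approximation 95% interval half-width around a single estimate
        print(f"state {s}: V={v:.4f} bias={b:+.4f} var={s2:.4f} "
              f"95% half-width={1.96 * s2 ** 0.5:.4f}")
```

In this toy example the true values are V = (8.59375, 7.03125). The simulated variance shrinks as `n_obs` grows, which shows why estimation error in the parameters matters most when few transitions are observed per state. A closed-form approximation such as the paper's would avoid the resampling loop. The simulation only serves as a check on the behaviour.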