Bias and Variance Approximation in Value Function Estimates

Management Science · 2007

Cited by 154

Journal rankings: Renmin University of China A+ · FT50 · UTD24 · ABS 4*

Shie Mannor · McGill University
Duncan Simester · Massachusetts Institute of Technology
Peng Sun · Duke University
John N. Tsitsiklis · Massachusetts Institute of Technology

Summary

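For a finite-state, finite-action, infinite-horizon, discounted-reward Markov decision process, the paper studies the bias and variance in value function estimates caused by estimating the model parameters from data. It gives closed-form approximations of both, which can be used to derive confidence intervals, and validates them on customer data from a mail-order company.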

Abstract

We consider a finite-state, finite-action, infinite-horizon, discounted reward Markov decision process and study the bias and variance in the value function estimates that result from empirical estimates of the model parameters. We provide closed-form approximations for the bias and variance, which can then be used to derive confidence intervals around the value function estimates. We illustrate and validate our findings using a large database describing the transaction and mailing histories for customers of a mail-order catalog firm.
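The setup in the abstract can be made concrete with a small numerical sketch. It assumes a single fixed policy, so the MDP reduces to a Markov chain with transition matrix P and a known reward vector r, and the value function is V = (I - alpha P)^{-1} r for discount factor alpha. Each row of P is estimated from observed transition counts, giving the plug-in estimate V_hat = (I - alpha P_hat)^{-1} r. The code compares a generic first-order (delta-method) covariance for V_hat, plus a normal confidence interval built from it, against Monte Carlo resampling of the counts. This linearization is a standard device used here for illustration only; it is not claimed to reproduce the paper's closed-form expressions, and because it is first-order it predicts zero bias, so the simulated bias is shown just for comparison. All function names (plug_in_value, delta_method_cov, normal_ci, monte_carlo_bias_var) and the two-state example numbers are hypothetical, not taken from the catalog data.

```python
import numpy as np


def plug_in_value(counts, r, alpha):
    """Plug-in estimate V_hat = (I - alpha * P_hat)^{-1} r for one fixed policy.

    counts[i, j] is the number of observed transitions i -> j (every row > 0);
    r is a known reward vector and 0 < alpha < 1 is the discount factor.
    """
    n = counts.sum(axis=1)
    P_hat = counts / n[:, None]
    V_hat = np.linalg.solve(np.eye(len(r)) - alpha * P_hat, r)
    return V_hat, P_hat, n


def delta_method_cov(P, n, r, alpha):
    """First-order (delta-method) covariance of V_hat.

    Assumes row i of P_hat is the mean of n[i] multinomial draws, rows are
    independent, and r is known. Linearizing V_hat = (I - alpha P_hat)^{-1} r
    gives V_hat - V ~ alpha * M^{-1} (P_hat - P) V with M = I - alpha P.
    """
    k = len(r)
    M_inv = np.linalg.inv(np.eye(k) - alpha * P)
    V = M_inv @ r
    # Var((P_hat V)_i) = (sum_j p_ij V_j^2 - (sum_j p_ij V_j)^2) / n_i
    row_var = (P @ V**2 - (P @ V) ** 2) / n
    return alpha**2 * M_inv @ np.diag(row_var) @ M_inv.T


def normal_ci(V_hat, cov, z=1.96):
    """Approximate two-sided normal interval per state (95% for z = 1.96)."""
    half = z * np.sqrt(np.diag(cov))
    return V_hat - half, V_hat + half


def monte_carlo_bias_var(P, n, r, alpha, trials=4000, seed=0):
    """Resample transition counts from the true P to measure bias and variance."""
    rng = np.random.default_rng(seed)
    k = len(r)
    V_true = np.linalg.solve(np.eye(k) - alpha * P, r)
    samples = np.empty((trials, k))
    for t in range(trials):
        counts = np.array([rng.multinomial(n[i], P[i]) for i in range(k)])
        samples[t] = plug_in_value(counts, r, alpha)[0]
    return samples.mean(axis=0) - V_true, samples.var(axis=0)


if __name__ == "__main__":
    P = np.array([[0.7, 0.3], [0.4, 0.6]])  # hypothetical true transitions
    r = np.array([1.0, 0.0])                # hypothetical per-state rewards
    alpha, n = 0.9, np.array([200, 200])    # discount, transitions per state
    bias, mc_var = monte_carlo_bias_var(P, n, r, alpha)
    print("Monte Carlo bias:     ", bias)
    print("Monte Carlo variance: ", mc_var)
    print("Delta-method variance:", np.diag(delta_method_cov(P, n, r, alpha)))
```

With these hypothetical inputs the delta-method variance should track the simulated variance closely while the simulated bias stays small relative to the standard deviation; with fewer transitions per state, higher-order terms matter more and the first-order approximation degrades. In practice P is unknown, so the covariance would be evaluated at P_hat rather than P.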

Keywords: bias approximation · variance approximation · value function estimation · Markov decision process
