Exploration versus exploitation: A laboratory test of the single‐agent exponential bandit model
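A laboratory experiment tests the single‐agent exponential bandit model. Subjects respond in the correct direction to changes in the prior belief, safe action, and discount factor, but they under‐respond and explore less than predicted. The results point to incorrect beliefs, rather than risk aversion or the random termination probability, as the cause.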
Abstract This paper analyzes how individuals resolve an exploration versus exploitation trade‐off in a laboratory experiment. The experiment implements the single‐agent exponential bandit model. We analyze how subjects respond to changes in the prior belief, safe action, and discount factor. Subjects respond to these changes in the predicted direction. However, they under‐respond to the prior belief, under‐respond to the safe action, and typically explore less than predicted. Our results suggest that neither risk aversion nor the random termination probability is driving under‐experimentation. Instead, our results are consistent with subjects having incorrect beliefs about exploration.
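To make the trade‐off concrete, the following is a minimal sketch of belief updating in a textbook discrete‐time version of a good/bad risky arm. It is not the paper's parameterization or procedure. It assumes that a good risky arm pays off with per‐period probability `lam`, that a bad arm never pays off, and that the agent abandons the risky arm once the belief falls below a cutoff. The names `update_belief`, `pulls_until_cutoff`, `lam`, and `cutoff` are hypothetical, and the cutoff is taken as given rather than derived from the safe action and discount factor.

```python
def update_belief(p: float, lam: float) -> float:
    """Posterior that the risky arm is good after one unsuccessful pull.

    Assumption: a good arm succeeds with probability lam each period and
    a bad arm never succeeds, so a failure is evidence against "good".
    """
    return p * (1.0 - lam) / (p * (1.0 - lam) + (1.0 - p))


def pulls_until_cutoff(prior: float, lam: float, cutoff: float) -> int:
    """Count unsuccessful risky pulls made before the belief drops below cutoff.

    The cutoff is a hypothetical input here; in a full model it would
    depend on the safe payoff and the discount factor.
    """
    if not (0.0 < prior < 1.0 and 0.0 < lam <= 1.0 and 0.0 < cutoff < 1.0):
        raise ValueError("prior and cutoff must lie in (0, 1); lam in (0, 1]")
    p, n = prior, 0
    while p >= cutoff:
        p = update_belief(p, lam)
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the experiment.
    for prior in (0.5, 0.8):
        for cutoff in (0.25, 0.4):
            n = pulls_until_cutoff(prior, lam=0.5, cutoff=cutoff)
            print(f"prior={prior}, cutoff={cutoff}: {n} unsuccessful pulls")
```

Under these assumptions, a higher prior lengthens exploration and a higher cutoff (for example, a more attractive safe action) shortens it. These are the directions of response that the abstract reports subjects follow.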