A Heuristic Approach to Explore: The Value of Perfect Information

Management Science · 2023

Cited by 6

Journal rankings: Renmin University of China A+ · FT50 · UTD24 · ABS 4*

Shervin Shahrokhi Tehrani · University of Texas at Dallas
Andrew T. Ching · Johns Hopkins University

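Summary

The paper proposes a heuristic decision model, myopic value of perfect information (myopic-VPI), for multiarmed bandit problems. The model requires only ranking the alternatives and a one-dimensional integration, so it is computationally efficient. It performs well in simulations and in an empirical application, and is suited to both researchers and practitioners.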

Abstract

This research introduces a new heuristic decision model called myopic-value of perfect information (VPI) to study multiarmed bandit (MAB) problems. The myopic-VPI approach only involves ranking the alternatives and computing a one-dimensional integration to obtain the expected future value of exploration. Because myopic-VPI is intuitive and does not involve solving a dynamic programming problem, it has the potential to serve as a useful heuristic approach to model exploration-exploitation tradeoffs. We conduct a series of simulation experiments to study its performance relative to other heuristics under a wide range of parameterizations. We find that myopic-VPI provides significant savings in computational time and decent performance in accumulated utility (although not the strongest) relative to other forward-looking heuristics; this suggests that it is a useful “fast-and-frugal” heuristic. Furthermore, our simulation experiments also reveal the conditions under which myopic-VPI outperforms and underperforms compared with other heuristics. Its empirical performance in the diaper category further shows that myopic-VPI can save estimation time significantly and fit the data on par with index and near-optimal, providing encouraging news that myopic-VPI could be added to the researcher’s or practitioner’s toolkit for MAB problems.

This paper was accepted by Gui Liberali, marketing. Supplemental Material: The online appendices are available at https://doi.org/10.1287/mnsc.2019.00578.
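The abstract's description of myopic-VPI (rank the alternatives, then compute a one-dimensional integral) can be illustrated with a minimal sketch. It is not the paper's specification. It assumes independent Gaussian beliefs about each arm's mean payoff, so that the one-dimensional integral has a closed form, and it uses the textbook value-of-perfect-information gain: learning an arm's true value matters only if it changes which arm ranks first. The function names, the `weight` parameter, and the decision rule "posterior mean plus weighted VPI" are hypothetical choices made for illustration.

```python
import math


def _normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _normal_cdf(z):
    # erfc keeps precision in the far left tail
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def expected_excess(mean, sd, threshold):
    """E[max(X - threshold, 0)] for X ~ Normal(mean, sd^2).

    This is the closed form of the one-dimensional integral
    under the (assumed) Gaussian belief.
    """
    if sd <= 0.0:
        return max(mean - threshold, 0.0)
    z = (mean - threshold) / sd
    return sd * _normal_pdf(z) + (mean - threshold) * _normal_cdf(z)


def myopic_vpi(means, sds):
    """Value of perfect information for each arm (needs at least two arms)."""
    # Step 1: rank the alternatives by current posterior mean.
    order = sorted(range(len(means)), key=lambda j: means[j], reverse=True)
    best, second = order[0], order[1]
    vpi = []
    for j in range(len(means)):
        if j == best:
            # Learning the best arm helps only if it turns out worse
            # than the runner-up: E[max(m_second - X, 0)].
            vpi.append(expected_excess(-means[j], sds[j], -means[second]))
        else:
            # Learning another arm helps only if it turns out better
            # than the current best: E[max(X - m_best, 0)].
            vpi.append(expected_excess(means[j], sds[j], means[best]))
    return vpi


def choose_arm(means, sds, weight=1.0):
    """Pick the arm maximizing posterior mean + weight * VPI (assumed rule)."""
    vpi = myopic_vpi(means, sds)
    scores = [m + weight * v for m, v in zip(means, vpi)]
    return max(range(len(means)), key=scores.__getitem__)


if __name__ == "__main__":
    means = [1.0, 0.9, 0.0]   # hypothetical posterior means
    sds = [0.1, 1.0, 0.1]     # hypothetical posterior standard deviations
    print(myopic_vpi(means, sds))
    print(choose_arm(means, sds))
```

In this sketch an arm with a slightly lower mean but much higher uncertainty can be chosen over the current leader, which is the exploration-exploitation tradeoff the abstract refers to. Setting `weight=0.0` reduces the rule to pure exploitation, and no dynamic programming problem is solved at any point.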

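Keywords: myopic value of perfect information · multiarmed bandit · heuristic decision-making · exploration-exploitation tradeoff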
