How to Sample and When to Stop Sampling: The Generalised Wald Problem and Minimax Policies
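We study how a decision-maker can maximise expected welfare through adaptive allocation and stopping rules when sampling is costly, and derive minimax optimal policies whose sampling rule is independent of sampling costs and does not depend on observed outcomes.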
Abstract We study sequential experiments where sampling is costly and a decision-maker aims to determine the best treatment for full-scale implementation by (1) adaptively allocating units between two possible treatments, and (2) stopping the experiment when the expected welfare (inclusive of sampling costs) from implementing the chosen treatment is maximised. Working under a continuous time limit, we characterise the optimal policies under the minimax regret criterion. We show that the same policies also remain optimal under both parametric and non-parametric outcome distributions in an asymptotic regime where sampling costs approach zero. The minimax optimal sampling rule is just the Neyman allocation: it is independent of sampling costs and does not adapt to observed outcomes. The decision-maker halts sampling when the product of the average treatment difference and the number of observations surpasses a specific threshold. The results derived also apply to the so-called best-arm identification problem, where the number of observations is exogenously specified.
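As a rough illustration of the two rules described in the abstract, the following Python sketch simulates an experiment with two treatments. It is a sketch under stated assumptions, not the paper's implementation: the function names, the Gaussian outcomes, the deterministic tracking of the allocation share, the use of the absolute value in the stopping statistic, and the numerical `threshold` are all hypothetical choices made for the example. The abstract states only that the allocation is the Neyman allocation (here taken in its standard form, sampling each treatment in proportion to its outcome standard deviation) and that sampling stops once the product of the average treatment difference and the number of observations surpasses a threshold; the paper's exact statistic and threshold value are not reproduced here.

```python
import random


def neyman_share(sigma1, sigma0):
    """Share of units assigned to treatment 1 under the standard Neyman allocation."""
    return sigma1 / (sigma1 + sigma0)


def run_experiment(mu1, mu0, sigma1, sigma0, threshold, max_n=100_000, seed=0):
    """Simulate sampling with a fixed allocation share and a threshold stopping rule.

    Assumptions (not from the abstract): Gaussian outcomes, known standard
    deviations, and a user-supplied threshold.
    """
    rng = random.Random(seed)
    share1 = neyman_share(sigma1, sigma0)
    mus, sigmas = (mu0, mu1), (sigma0, sigma1)
    counts = [0, 0]       # observations per treatment
    totals = [0.0, 0.0]   # cumulative outcomes per treatment
    diff = 0.0

    for _ in range(max_n):
        # The allocation ignores observed outcomes: it only tracks the target share.
        if counts[0] == 0:
            arm = 0
        elif counts[1] == 0:
            arm = 1
        else:
            arm = 1 if counts[1] < share1 * sum(counts) else 0

        totals[arm] += rng.gauss(mus[arm], sigmas[arm])
        counts[arm] += 1

        if counts[0] > 0 and counts[1] > 0:
            # Average treatment difference times the number of observations.
            diff = totals[1] / counts[1] - totals[0] / counts[0]
            if abs(diff * sum(counts)) >= threshold:
                break

    chosen = 1 if diff > 0 else 0
    return chosen, counts


chosen, counts = run_experiment(mu1=1.0, mu0=0.0, sigma1=0.1, sigma0=0.1, threshold=20.0)
print(chosen, counts)
```

Note that the allocation branch never inspects `totals`, which mirrors the claim that the sampling rule does not adapt to observed outcomes; only the stopping decision depends on the data.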