Estimation of the Odds Ratio in the Two-Armed Bandit Problem
Asymptotically optimal sequential procedures are proposed for fixed-width interval estimation and for point estimation of the log odds ratio of two Bernoulli populations. The cost of an observation may differ between the two populations and may depend on the success probabilities. The interval estimation procedure is studied by simulation for two sampling cost structures of particular interest: minimizing the total expected sample size, and minimizing the total expected number of failures before termination. Approximate expressions are given for the savings in sampling cost obtained by adaptive rather than pairwise sampling, and they show that these savings can be substantial in some cases. In the final section, the multi-armed bandit problem is considered.
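The abstract does not spell out the stopping rule. The following is a minimal sketch under our own assumptions, not the authors' procedure. It uses the standard large-sample variance of the estimated log odds ratio theta = log(p1/q1) - log(p2/q2), namely Var ≈ 1/(n1 p1 q1) + 1/(n2 p2 q2) with q_i = 1 - p_i. A fixed-width interval theta_hat ± d with approximate coverage 1 - alpha needs z^2 Var <= d^2. Minimizing the expected cost c1 n1 + c2 n2 subject to that constraint gives n_i proportional to 1/sqrt(c_i p_i q_i). Taking c_i = 1 corresponds to minimizing total sample size, and c_i = q_i corresponds to minimizing expected failures. An adaptive rule plugs in current estimates of p_i, samples the arm furthest below its target share, and stops once the estimated variance is small enough. The names `estimate` and `sequential_interval`, the +0.5/+1 smoothing, the initial sample size `n0`, and the `cost(i, p)` interface are hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist


def estimate(s, n):
    """Log odds ratio estimate and smoothed success rates.

    The +0.5 / +1 smoothing (an assumption) keeps each p_hat strictly
    inside (0, 1) so the logits and variances stay finite.
    """
    p = [(s[i] + 0.5) / (n[i] + 1.0) for i in (0, 1)]
    theta = math.log(p[0] / (1 - p[0])) - math.log(p[1] / (1 - p[1]))
    return theta, p


def sequential_interval(p_true, cost, d=0.5, alpha=0.05, n0=10, seed=1, max_n=10**6):
    """Hypothetical adaptive fixed-width interval for the log odds ratio.

    p_true: success probabilities (p1, p2), used only to simulate Bernoulli data.
    cost:   cost(i, p_hat_i) -> cost of one observation from arm i.
    Stops when z^2 * (1/(n1 p1 q1) + 1/(n2 p2 q2)) <= d^2, using plug-in
    estimates, and returns (theta_hat - d, theta_hat + d, n, s).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n = [0, 0]  # observations taken from each arm
    s = [0, 0]  # successes observed on each arm

    def draw(i):
        n[i] += 1
        s[i] += 1 if rng.random() < p_true[i] else 0

    # Initial pairwise phase: n0 observations from each arm.
    for _ in range(n0):
        draw(0)
        draw(1)

    while True:
        theta, p = estimate(s, n)
        var = sum(1.0 / (n[i] * p[i] * (1 - p[i])) for i in (0, 1))
        if z * z * var <= d * d or n[0] + n[1] >= max_n:
            return theta - d, theta + d, tuple(n), tuple(s)
        # Cost-optimal allocation: n_i proportional to 1 / sqrt(c_i p_i q_i).
        w = [1.0 / math.sqrt(cost(i, p[i]) * p[i] * (1 - p[i])) for i in (0, 1)]
        # Sample the arm whose count is furthest below its target share.
        draw(0 if n[0] / w[0] <= n[1] / w[1] else 1)
```

For example, `sequential_interval((0.6, 0.3), lambda i, p: 1.0)` targets the smallest total sample size, while `sequential_interval((0.6, 0.3), lambda i, p: 1.0 - p)` charges each observation its estimated failure probability. Comparing the returned counts with those of a purely pairwise rule gives a rough feel for the savings from adaptive sampling in a single simulated run.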