Large-scale multiple testing: Fundamental limits of false discovery rate control and compound oracle
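Studies the optimal trade-off between the false discovery rate and the false nondiscovery rate as the number of hypotheses tends to infinity, finds that compound decision rules outperform separable rules, and extends the analysis to the case where the FDP is controlled with high probability.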
The false discovery rate (FDR) and the false nondiscovery rate (FNR), defined as the expected false discovery proportion (FDP) and the false nondiscovery proportion (FNP), are the most popular benchmarks for multiple testing. Despite the theoretical and algorithmic advances in recent years, the optimal trade-off between the FDR and the FNR has been largely unknown, except for certain restricted classes of decision rules, for example, separable rules, or for other performance metrics, for example, the marginal FDR and the marginal FNR (mFDR and mFNR). In this paper, we determine the asymptotically optimal FDR-FNR trade-off under the two-group random mixture model when the number of hypotheses tends to infinity. Distinct from the optimal mFDR-mFNR trade-off, which is achieved by separable decision rules, the optimal FDR-FNR trade-off requires compound rules, even in the large-sample limit and for models as simple as the Gaussian location model. This suboptimality of separable rules also holds for other objectives, such as maximizing the expected number of true discoveries. Finally, to address the limitation of the FDR, which only controls the expectation but not the fluctuation of the FDP, we also determine the optimal trade-off when the FDP is controlled with high probability and show that it coincides with that of the mFDR and the mFNR. Extensions to models with a fixed nonnull proportion are also obtained.
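To make the quantities in the abstract concrete, the following is a minimal simulation sketch, not the paper's method. It draws data from a two-group Gaussian location model, applies a simple thresholding rule (a separable rule, since each decision depends only on its own observation), and computes the FDP and FNP. Several things here are assumptions made for illustration: the function names `simulate_fdp_fnp` and `estimate_fdr_fnr`, the parameter values (`pi1`, `mu`, `threshold`), and the convention that the FNP is the fraction of nonnulls among the non-rejected hypotheses, which is a common definition but may differ from the paper's exact normalization. The compound oracle studied in the paper is not implemented.

```python
import random
import statistics


def simulate_fdp_fnp(n=20_000, pi1=0.1, mu=2.0, threshold=2.5, seed=0):
    """Simulate one draw from a two-group Gaussian location model.

    Each hypothesis is nonnull with probability pi1 (assumed value).
    Observations are N(0, 1) under the null and N(mu, 1) otherwise.
    Hypothesis i is rejected when x_i > threshold.
    """
    rng = random.Random(seed)
    false_disc = true_disc = false_nondisc = true_nondisc = 0
    for _ in range(n):
        nonnull = rng.random() < pi1
        x = rng.gauss(mu if nonnull else 0.0, 1.0)
        reject = x > threshold
        if reject and nonnull:
            true_disc += 1
        elif reject:
            false_disc += 1
        elif nonnull:
            false_nondisc += 1
        else:
            true_nondisc += 1
    n_rejected = false_disc + true_disc
    n_accepted = false_nondisc + true_nondisc
    # Guard against division by zero when nothing is rejected or accepted.
    fdp = false_disc / max(n_rejected, 1)
    fnp = false_nondisc / max(n_accepted, 1)  # assumed normalization
    return fdp, fnp


def estimate_fdr_fnr(reps=50, **kwargs):
    """Monte Carlo estimates of FDR = E[FDP] and FNR = E[FNP].

    Also reports the standard deviation of the FDP across replications,
    which the FDR alone does not control. Requires reps >= 2.
    """
    runs = [simulate_fdp_fnp(seed=s, **kwargs) for s in range(reps)]
    fdps = [r[0] for r in runs]
    fnps = [r[1] for r in runs]
    return {
        "FDR": statistics.mean(fdps),
        "FNR": statistics.mean(fnps),
        "FDP_sd": statistics.stdev(fdps),
    }


if __name__ == "__main__":
    print(estimate_fdr_fnr(reps=20, n=5_000))
```

Raising `threshold` lowers the FDP and raises the FNP in this sketch, which is the trade-off the abstract refers to. The `FDP_sd` entry gives a rough sense of how much the FDP fluctuates around the FDR for a given number of hypotheses.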