Optimal false discovery rate control for large scale multiple testing with auxiliary information

Annals of Statistics · 2022

Cited by 31

ABS 4*

Hongyuan Cao
Jun Chen
Xianyang Zhang

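Summary

For large-scale multiple testing, the paper proposes a two-group mixture model framework that exploits auxiliary information, such as structural relationships among the hypotheses. The prior probabilities of being null are estimated under a shape constraint, and an optimal rejection rule maximizes the expected number of true positives while controlling the average false discovery rate. Simulations and genomic data indicate that the method has higher power than existing approaches.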

Abstract

Large-scale multiple testing is a fundamental problem in high dimensional statistical inference. It is increasingly common that various types of auxiliary information, reflecting the structural relationship among the hypotheses, are available. Exploiting such auxiliary information can boost statistical power. To this end, we propose a framework based on a two-group mixture model with varying probabilities of being null for different hypotheses a priori, where a shape-constrained relationship is imposed between the auxiliary information and the prior probabilities of being null. An optimal rejection rule is designed to maximize the expected number of true positives when average false discovery rate is controlled. Focusing on the ordered structure, we develop a robust EM algorithm to estimate the prior probabilities of being null and the distribution of p-values under the alternative hypothesis simultaneously. We show that the proposed method has better power than state-of-the-art competitors while controlling the false discovery rate, both empirically and theoretically. Extensive simulations demonstrate the advantage of the proposed method. Datasets from genome-wide association studies are used to illustrate the new methodology.
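
The two-group model and the rejection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the paper estimates the prior null probabilities and the alternative p-value density with a robust EM algorithm, whereas this sketch assumes both are known. The function names `local_fdr` and `reject`, the Beta(a, 1) alternative density, and the parameter `a` are all assumptions made for illustration. The rule shown is the common local-FDR thresholding approach: rank hypotheses by posterior null probability and reject the largest set whose average posterior null probability stays at or below the target level.

```python
def local_fdr(p, pi0, a=0.3):
    """Posterior probability that a hypothesis is null, given its p-value.

    Assumed two-group model (illustrative only):
      null p-values        ~ Uniform(0, 1), density 1
      alternative p-values ~ Beta(a, 1),    density a * p**(a - 1)
    pi0 is the hypothesis-specific prior probability of being null.
    """
    p = max(p, 1e-300)  # guard against p == 0, where the density diverges
    f1 = a * p ** (a - 1.0)
    return pi0 / (pi0 + (1.0 - pi0) * f1)


def reject(pvals, pi0s, alpha=0.1, a=0.3):
    """Return sorted indices of rejected hypotheses.

    Hypotheses are ranked by local FDR; the largest prefix whose average
    local FDR is at most alpha is rejected.
    """
    lfdr = [local_fdr(p, q, a) for p, q in zip(pvals, pi0s)]
    order = sorted(range(len(lfdr)), key=lfdr.__getitem__)
    total, k = 0.0, 0
    for j, i in enumerate(order, start=1):
        total += lfdr[i]
        if total / j > alpha:
            break  # running mean of ascending values never decreases again
        k = j
    return sorted(order[:k])


# Hypothetical inputs: auxiliary information is assumed to have produced
# a lower prior null probability for the first two hypotheses.
pvals = [1e-6, 0.5, 1e-4, 0.9]
pi0s = [0.5, 0.5, 0.9, 0.9]
rejected = reject(pvals, pi0s, alpha=0.1)
```

Because the prior null probability varies by hypothesis, two equal p-values can receive different local FDR values, which is how auxiliary information changes the ranking relative to a rule based on p-values alone.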

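Statistical inference · Multiple testing · False discovery rate · High-dimensional data analysis · Genome-wide association studies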
