Finding distributions that differ, with false discovery rate control
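This paper proposes a distribution-free method for comparing a reference distribution with several other distributions. It identifies which groups have distributions that differ from the reference while controlling the false discovery rate, and validates the approach on simulated and real data.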
Summary. We consider the problem of comparing a reference distribution with several other distributions. Given a sample from both the reference and the comparison groups, we aim to identify the comparison groups whose distributions differ from that of the reference group. Viewing this as a multiple-testing problem, we introduce a methodology that provides exact, distribution-free control of the false discovery rate. To do so, we introduce the concept of batch conformal p-values and demonstrate that they satisfy positive regression dependence across the groups (Benjamini & Yekutieli, 2001), thereby enabling control of the false discovery rate through the Benjamini–Hochberg procedure. The proof of positive regression dependence introduces a novel technique for the inductive construction of rank vectors with almost-sure dominance under exchangeability. We evaluate the performance of the proposed procedure through simulations. Despite being distribution-free, in some cases it shows performance comparable to methods with knowledge of the data-generating normal distribution, and it further has more power than direct approaches based on conformal out-of-distribution detection. Furthermore, we illustrate our methods on a hepatitis C treatment dataset, where they identify patient groups with large treatment effects, and on the Current Population Survey dataset, where they identify subpopulations with long working hours.
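To make the final step concrete, the following is a minimal sketch of the standard Benjamini–Hochberg step-up procedure that the summary refers to. It assumes one p-value per comparison group has already been computed; the construction of the batch conformal p-values themselves is not reproduced here. The function name `benjamini_hochberg`, the default level `alpha=0.1`, and the example p-values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.1):
    """Return a boolean mask of rejected hypotheses (hypothetical helper).

    Standard Benjamini-Hochberg step-up rule: sort the m p-values, find the
    largest k with p_(k) <= alpha * k / m, and reject the k smallest.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    if m == 0:
        return reject

    order = np.argsort(p)                          # indices of p-values, ascending
    thresholds = alpha * np.arange(1, m + 1) / m   # alpha * k / m for k = 1..m
    below = p[order] <= thresholds

    if below.any():
        k = np.nonzero(below)[0].max()             # largest 0-based index passing the test
        reject[order[: k + 1]] = True              # reject everything up to and including it
    return reject


# Illustrative p-values for five comparison groups (made-up numbers).
example_p = [0.001, 0.5, 0.012, 0.04, 0.9]
rejected = benjamini_hochberg(example_p, alpha=0.1)
```

Here `rejected` marks the groups that would be declared different from the reference at the chosen level. The false discovery rate guarantee for this rule relies on the dependence structure of the p-values, which is why the positive regression dependence property is needed.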