Simultaneous Inference for Generalized Linear Models with Unmeasured Confounders

Journal of the American Statistical Association · 2025

Cited by 3 · Top 8% among papers in the same journal and year

ABS journal rating: 4

Jin‐Hong Du
Larry Wasserman
Kathryn Roeder (corresponding author)

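Summary

To address the failure of standard methods in genomic studies caused by unmeasured confounders, the paper proposes a unified framework that uses orthogonal structures, linear projections, and bias correction to carry out large-scale hypothesis testing for generalized linear models while effectively controlling the false discovery rate.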

Abstract

Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, due to unmeasured confounders, many standard statistical approaches may be substantially biased. This article investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It begins by disentangling marginal and uncorrelated confounding effects to recover the latent coefficients. Subsequently, latent factors and primary effects are jointly estimated through lasso-type optimization. Finally, we incorporate projected and weighted bias-correction steps for hypothesis testing. Theoretically, we establish the identification conditions of various effects and non-asymptotic error bounds. We show effective Type-I error control of asymptotic z-tests as sample and response sizes approach infinity. Numerical experiments demonstrate that the proposed method controls the false discovery rate by the Benjamini-Hochberg procedure and is more powerful than alternative methods. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting confounding effects when significant covariates are absent from the model. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
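The abstract's final testing step, asymptotic z-tests followed by the Benjamini-Hochberg procedure, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `two_sided_p_values` and `benjamini_hochberg`, the level `alpha=0.05`, and the toy z-scores are all assumptions made for illustration, and the sketch takes the z-statistics as given rather than computing the paper's bias-corrected estimates.

```python
import math


def two_sided_p_values(z_scores):
    """Two-sided p-values for z-statistics under a standard normal null."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1).
    return [math.erfc(abs(z) / math.sqrt(2.0)) for z in z_scores]


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking hypotheses rejected by the BH step-up rule."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis whose p-value is among the k_max smallest.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical z-statistics for five genes (illustrative values only).
z_scores = [4.0, 0.1, -3.5, 0.5, 2.2]
p_values = two_sided_p_values(z_scores)
decisions = benjamini_hochberg(p_values, alpha=0.05)
```

Because the rule is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank-specific threshold, as long as some larger p-value meets its threshold.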

Keywords: Genomics · Hypothesis testing · Causal inference · High-dimensional statistics