High-Dimensional Covariate-Augmented Overdispersed Multi-Study Poisson Factor Model
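We propose a factor model for multi-study high-dimensional count data that extracts study-shared and study-specific factors and handles heterogeneous noise and overdispersion. We also give an efficient parameter estimation procedure and a method for selecting the number of factors, and we validate the model on single-cell sequencing data.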
Factor analysis for high-dimensional data is a canonical problem in statistics with a wide range of applications. However, no existing factor model is tailored to analyzing high-dimensional count responses together with their covariates across multiple studies. In this paper, we introduce factor models that jointly analyze multiple studies by extracting study-shared and study-specific factors. Our factor models account for heterogeneous noise and overdispersion among counts, with augmented covariates. We propose an efficient variational procedure for estimating the model parameters, along with a novel criterion for selecting the optimal number of factors and the rank of the regression coefficient matrix. We systematically investigate the consistency and asymptotic normality of the estimators by connecting the variational likelihood with profile M-estimation. Extensive simulations and an analysis of a single-cell sequencing dataset demonstrate the effectiveness of the proposed multi-study Poisson factor model.
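To make the ingredients above concrete, the following is a minimal data-generating sketch, not the authors' specification. It assumes a Poisson-lognormal form in which, for study s and sample i, the log-rate is `X @ beta.T + F @ A.T + H @ B_s.T + eps`. Here the low-rank `beta` stands in for the reduced-rank regression coefficient matrix, `A` holds loadings shared by all studies, `B_s` holds study-specific loadings, and Gaussian log-scale noise with a per-variable scale supplies heterogeneous noise and overdispersion. The function name, argument names, and all scale constants are hypothetical.

```python
import numpy as np


def simulate_multistudy_counts(n_per_study, p, q_shared, q_specific, d, rank, seed=0):
    """Simulate counts from an ASSUMED multi-study Poisson factor model (illustrative only).

    log(lambda_si) = beta x_si + A f_si + B_s h_si + eps_si,  y_sij ~ Poisson(lambda_sij)
    """
    rng = np.random.default_rng(seed)
    # Assumed reduced-rank coefficient matrix: p x d with rank <= `rank`.
    beta = rng.normal(scale=0.3, size=(p, rank)) @ rng.normal(scale=0.3, size=(rank, d))
    A = rng.normal(scale=0.3, size=(p, q_shared))  # loadings shared across all studies
    studies = []
    for n in n_per_study:
        B_s = rng.normal(scale=0.3, size=(p, q_specific))  # loadings specific to this study
        X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])  # intercept + covariates
        F = rng.normal(size=(n, q_shared))    # study-shared factors
        H = rng.normal(size=(n, q_specific))  # study-specific factors
        sigma = rng.uniform(0.2, 0.8, size=p)  # heterogeneous noise scale per variable
        eps = rng.normal(size=(n, p)) * sigma  # log-scale noise -> extra-Poisson variance
        log_lam = X @ beta.T + F @ A.T + H @ B_s.T + eps
        Y = rng.poisson(np.exp(log_lam))       # observed count matrix, n x p
        studies.append({"X": X, "Y": Y})
    return studies, beta


studies, beta = simulate_multistudy_counts([100, 150], p=50, q_shared=3, q_specific=2, d=4, rank=2)
```

Because the noise enters on the log scale, the counts' variance exceeds their mean. This is the overdispersion that a plain Poisson factor model would miss. The actual model, its estimation procedure, and the factor-number criterion are described in the paper itself.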