Inferring Covariance Structure from Multiple Data Sources via Subspace Factor Analysis

Journal of the American Statistical Association · 2024
Cited by: 6
ABS rating: 4

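Summary

Proposes subspace factor analysis models that identify shared and condition-specific covariance components across multiple data sources, resolve the identifiability problem, and come with efficient Bayesian computation algorithms. The approach is applicable to integrating immunology gene expression datasets.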
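One way to make "shared versus condition-specific" precise is shown below. The first line is a generic hierarchical multi-study factor model. The second shows why its components are not identifiable. The last line is an *assumed* reading of the "subspace" idea: group-specific loadings $\Phi_s = \Lambda A_s$ are confined to the column span of the shared loadings $\Lambda$. That last form is an illustration, not a verbatim statement of the paper's specification. The paper itself proves identifiability for its SUFA class.

```latex
% Generic multi-study factor model for groups s = 1, ..., S (p variables)
\Sigma_s = \Lambda\Lambda^\top + \Phi_s\Phi_s^\top + \Delta_s,
\qquad \Lambda \in \mathbb{R}^{p\times k},\ \Phi_s \in \mathbb{R}^{p\times q_s},\ \Delta_s \text{ diagonal}.

% Non-identifiability: if every \Phi_s = [\,v \;\; \tilde\Phi_s\,] shares a column v, then
\Lambda\Lambda^\top + vv^\top + \tilde\Phi_s\tilde\Phi_s^\top
  = [\,\Lambda \;\; v\,][\,\Lambda \;\; v\,]^\top + \tilde\Phi_s\tilde\Phi_s^\top,
% so v can move from the group-specific part to the shared part without changing any \Sigma_s.

% Assumed subspace restriction: \Phi_s = \Lambda A_s with A_s \in \mathbb{R}^{k\times q_s}, giving
\Sigma_s = \Lambda\,(I_k + A_s A_s^\top)\,\Lambda^\top + \Delta_s .
```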

Abstract

Factor analysis provides a canonical framework for imposing lower-dimensional structure such as sparse covariance in high-dimensional data. High-dimensional data on the same set of variables are often collected under different conditions, for instance in reproducing studies across research groups. In such cases, it is natural to seek to learn the shared versus condition-specific structure. Existing hierarchical extensions of factor analysis have been proposed, but face practical issues including identifiability problems. To address these shortcomings, we propose a class of SUbspace Factor Analysis (SUFA) models, which characterize variation across groups at the level of a lower-dimensional subspace. We prove that the proposed class of SUFA models leads to identifiability of the shared versus group-specific components of the covariance, and study their posterior contraction properties. Taking a Bayesian approach, these contributions are developed alongside efficient posterior computation algorithms. Our sampler fully integrates out latent variables, is easily parallelizable and has complexity that does not depend on sample size. We illustrate the methods through application to integration of multiple gene expression datasets relevant to immunology.
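A standard Gaussian fact helps explain why a sampler that integrates out the latent factors can have cost independent of sample size. Once the factors are marginalized, each zero-mean observation in group s follows $N(0, \Sigma_s)$. The likelihood then depends on the data only through the scatter matrix $S_s = \sum_i y_{si} y_{si}^\top$ and the count $n_s$. After $S_s$ is cached, each evaluation costs $O(p^3)$ whatever the value of $n_s$.

The sketch below rests on several assumptions. It takes zero-mean data and the SUFA-style form $\Sigma_s = \Lambda(I + A_sA_s^\top)\Lambda^\top + \mathrm{diag}(\delta_s)$. The helper names (`sufa_style_covariance`, `scatter`, `loglik_from_scatter`) are hypothetical. It illustrates the principle only and is not the authors' sampler.

```python
import numpy as np


def sufa_style_covariance(Lam, A_s, delta_s):
    """Hypothetical helper: Sigma_s = Lam (I_k + A_s A_s^T) Lam^T + diag(delta_s).

    The group-specific loadings Lam @ A_s lie in the column span of the shared
    loadings Lam (the assumed subspace restriction).
    """
    k = Lam.shape[1]
    core = np.eye(k) + A_s @ A_s.T
    return Lam @ core @ Lam.T + np.diag(delta_s)


def scatter(Y):
    """Scatter matrix S = sum_i y_i y_i^T for the zero-mean rows of Y (n x p)."""
    return Y.T @ Y


def loglik_from_scatter(S, n, Sigma):
    """Zero-mean Gaussian log-likelihood of n rows, computed from (S, n) only.

    slogdet and solve on p x p matrices cost O(p^3); n enters only as a scalar.
    """
    p = Sigma.shape[0]
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.trace(np.linalg.solve(Sigma, S))  # tr(Sigma^{-1} S)
    return -0.5 * (n * (p * np.log(2.0 * np.pi) + logdet) + quad)


def loglik_direct(Y, Sigma):
    """Reference version: sum of per-row log-densities (cost grows with n)."""
    n, p = Y.shape
    _, logdet = np.linalg.slogdet(Sigma)
    Z = np.linalg.solve(Sigma, Y.T)   # columns are Sigma^{-1} y_i
    quad = np.sum(Y.T * Z)            # sum_i y_i^T Sigma^{-1} y_i
    return -0.5 * (n * (p * np.log(2.0 * np.pi) + logdet) + quad)


rng = np.random.default_rng(0)
p, k, q = 6, 3, 2
Lam = rng.standard_normal((p, k))
A_s = rng.standard_normal((k, q))
delta_s = rng.uniform(0.5, 1.5, size=p)
Sigma_s = sufa_style_covariance(Lam, A_s, delta_s)

Y = rng.multivariate_normal(np.zeros(p), Sigma_s, size=200)
S = scatter(Y)  # computed once; later likelihood calls never touch Y
print(loglik_from_scatter(S, Y.shape[0], Sigma_s))
print(loglik_direct(Y, Sigma_s))
```

The two printed values should agree up to floating-point error. Inside an MCMC loop only `loglik_from_scatter` would be called, so the per-iteration cost stays fixed as $n_s$ grows. Groups are handled independently, which is also what makes this kind of computation easy to parallelize.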

Factor analysis · High-dimensional data · Bayesian methods · Gene expression data · Covariance structure