A Flexible Zero-Inflated Poisson-Gamma Model with Application to Microbiome Sequence Count Data
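To address the discrepancy between observed and true abundances, zero-inflation, and over-dispersion in microbiome data, this work proposes a zero-inflated Poisson-Gamma model that decomposes the mean parameter into a true abundance and sampling variability, and develops statistical inference methods that help researchers identify differential abundance and differential variability.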
In microbiome studies, a sample from a microbial community, such as the gut microbiota, is used to estimate the population proportions of the taxa in that community. However, because of biases introduced during sampling and preprocessing, the observed taxa abundances may not reflect the true abundance patterns in the ecosystem. Repeated measures, including longitudinal study designs, can help mitigate the discrepancy between observed and true underlying abundances. Yet the zero-inflation and over-dispersion widely observed in such data can distort downstream statistical analyses that aim to associate taxa abundances with covariates of interest. To address these challenges, we propose a Zero-Inflated Poisson-Gamma (ZIPG) model framework. From a measurement-error perspective, we accommodate the discrepancy between observed and true abundances by decomposing the mean parameter of a Poisson regression into a true abundance level and a multiplicative term that measures sampling variability from the microbial ecosystem. We then obtain a flexible ZIPG framework by linking both the mean abundance and the variability of abundances to different covariates, and we build valid statistical inference procedures for both parameter estimation and hypothesis testing. In comprehensive simulation studies and real data applications, the proposed ZIPG method provides insight into both differential variability and differential mean abundance. Supplementary materials for this article are available online.
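The decomposition described in the abstract can be illustrated with a small simulation. The abstract does not state the exact parameterization, so the sketch below rests on assumptions made only for illustration: a gamma multiplier with mean one and variance `theta`, log links for both the mean abundance and the variability, a sequencing-depth offset `depth`, and a constant zero-inflation probability `p_zero`. The function name `simulate_zipg` and all argument names are hypothetical.

```python
import numpy as np


def simulate_zipg(X, X_star, beta, beta_star, depth, p_zero, rng):
    """Draw counts from an assumed zero-inflated Poisson-Gamma model.

    X, X_star : design matrices for the mean and the variability (assumed).
    beta, beta_star : coefficient vectors on the log scale (assumed log links).
    depth : sequencing depth, scalar or one value per sample (assumed offset).
    p_zero : probability of a structural zero (assumed constant).
    """
    lam = np.exp(X @ beta)                # true mean relative abundance
    theta = np.exp(X_star @ beta_star)    # dispersion of the sampling variability
    # Multiplicative sampling variability: gamma with mean 1 and variance theta.
    u = rng.gamma(shape=1.0 / theta, scale=theta)
    # Poisson counts whose mean is depth * true abundance * multiplier.
    counts = rng.poisson(depth * lam * u)
    # Zero inflation: overwrite a random subset with structural zeros.
    structural_zero = rng.random(len(counts)) < p_zero
    counts[structural_zero] = 0
    return counts
```

Under these assumptions, the expected count is `(1 - p_zero) * depth * lam`, while `theta` controls over-dispersion separately from the mean. That separation is what lets the mean abundance and the variability be linked to different covariates.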