Bayesian clustering of high-dimensional data via latent repulsive mixtures

Biometrika · 2024

Cited by 4

ABS 4

Lorenzo Ghilotti (corresponding author)
Mario Beraha
Alessandra Guglielmi

Overview

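Proposes a Bayesian model that performs dimensionality reduction and clustering jointly. Placing a repulsive point process prior on the component means of the mixture for the latent scores makes the model more robust to misspecification, and it outperforms the standard mixture model for the latent factors in simulations and on a plant species co-occurrence dataset.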
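The structure described above (a Gaussian latent factor model whose low-dimensional latent scores follow a mixture) can be made concrete with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' model or code: latent scores come from an equally weighted Gaussian mixture with identity covariance, and observations follow y_i = Λ η_i + ε_i with diagonal Gaussian noise. The helper `simulate_latent_mixture` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_latent_mixture(n, means, lam, sigma2, seed=0):
    """Toy latent factor mixture (hypothetical helper, illustrative only).

    eta_i ~ equally weighted mixture of N_d(means[h], I_d), h = 0..k-1
    y_i   = lam @ eta_i + eps_i,  eps_i ~ N_p(0, diag(sigma2))
    """
    rng = np.random.default_rng(seed)
    k, d = means.shape
    labels = rng.integers(0, k, size=n)                  # cluster allocations
    eta = means[labels] + rng.standard_normal((n, d))    # d-dimensional latent scores
    eps = rng.standard_normal((n, lam.shape[0])) * np.sqrt(sigma2)  # diagonal noise
    y = eta @ lam.T + eps                                # lam is the p x d loading matrix
    return y, eta, labels


# Example: 3 clusters in a 2-dimensional latent space, observed in p = 50 dimensions.
rng = np.random.default_rng(1)
p = 50
lam = rng.standard_normal((p, 2))
means = np.array([[-3.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
y, eta, labels = simulate_latent_mixture(200, means, lam, np.full(p, 0.5))
print(y.shape, eta.shape)  # (200, 50) (200, 2)
```

Clustering then targets the 2-dimensional scores `eta` rather than the 50-dimensional `y`, which is why the mixture is placed at the latent level.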

Abstract

Model-based clustering of moderate- or large-dimensional data is notoriously difficult. We propose a model for simultaneous dimensionality reduction and clustering by assuming a mixture model for a set of latent scores, which are then linked to the observations via a Gaussian latent factor model. This approach was recently investigated by Chandra et al. (2023). The authors used a factor-analytic representation and assumed a mixture model for the latent factors. However, performance can deteriorate in the presence of model misspecification. Assuming a repulsive point process prior for the component-specific means of the mixture for the latent scores is shown to yield a more robust model that outperforms the standard mixture model for the latent factors in several simulated scenarios. The repulsive point process must be anisotropic to favour well-separated clusters of data, and its density should be tractable for efficient posterior inference. We address these issues by proposing a general construction for anisotropic determinantal point processes. We illustrate our model in simulations, as well as on a plant species co-occurrence dataset.
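The abstract asks for a repulsive prior on the component means that is anisotropic and has a tractable density. As a generic illustration only, and not the construction proposed in the paper, the sketch below scores a configuration of component means by the log-determinant of an anisotropic squared-exponential Gram matrix. This is the unnormalized density form of an L-ensemble determinantal point process on a bounded domain. The kernel choice, the per-axis `lengthscales`, and the function names are assumptions made for this example.

```python
import numpy as np


def anisotropic_gram(mu, lengthscales):
    """K[h, l] = exp(-0.5 * sum_j ((mu[h, j] - mu[l, j]) / lengthscales[j]) ** 2)."""
    z = mu / lengthscales                                  # rescale each latent axis
    sq = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq)


def log_repulsive_density(mu, lengthscales):
    """Unnormalized log density log det K(mu); -inf when two means coincide."""
    sign, logdet = np.linalg.slogdet(anisotropic_gram(mu, lengthscales))
    return logdet if sign > 0 else -np.inf


# Larger lengthscale on the second axis, so pairs separated along it are penalized more.
ls = np.array([1.0, 3.0])
along_x = np.array([[-1.0, 0.0], [1.0, 0.0]])
along_y = np.array([[0.0, -1.0], [0.0, 1.0]])
print(log_repulsive_density(along_x, ls) > log_repulsive_density(along_y, ls))  # True
```

Nearby means make the rows of the Gram matrix nearly collinear and drive the determinant toward zero, so well-separated configurations receive higher density; per-axis lengthscales let the strength of this repulsion differ across latent directions.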

Keywords: cluster analysis; Bayesian statistics; high-dimensional data; dimensionality reduction; mixture models
