On Simulating Skewed and Cluster-Weighted Data for Studying Performance of Clustering Algorithms

Journal of Computational and Graphical Statistics · 2023

Cited by: 3

ABS 3

Volodymyr Melnykov (corresponding author)
Yan Wang
Yana Melnykov
Francesca Torti
Domenico Perrotta
Marco Riani

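Summary

This article extends the concept of pairwise overlap between mixture components and proposes methods for simulating skewed clusters and cluster-weighted models with specified overlap characteristics, so that the performance of clustering algorithms can be studied systematically.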

Abstract

In this article, extensions to the recently introduced concept of pairwise overlap between mixture components are proposed. The notion of overlap is useful for studying the systematic performance of clustering algorithms. Existing methods can be used for simulating elliptical data according to pre-specified overlap characteristics. First, an approach to simulating skewed clusters with a desired overlap is proposed. Next, an extension to measuring overlap in cluster-weighted models is considered. Thus, this article provides important extensions to the existing methods for simulating heterogeneous data for studying the systematic performance of clustering algorithms. Supplementary materials for this article are available online.
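
To make the notion of pairwise overlap concrete, the following is a minimal sketch for the simplest case of two univariate Gaussian components. It assumes that overlap is the sum of the two misclassification probabilities between the components, a definition the abstract itself does not spell out, and it estimates those probabilities by Monte Carlo. The function names and parameters are hypothetical, and the sketch does not reproduce the skewed or cluster-weighted extensions proposed in the article.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def misclassification_prob(pi_i, mean_i, sd_i, pi_j, mean_j, sd_j, n, rng):
    """Monte Carlo estimate of P(pi_i * f_i(X) < pi_j * f_j(X)) for X drawn
    from component i, i.e. the chance that a point from component i is
    assigned to component j."""
    wrong = 0
    for _ in range(n):
        x = rng.gauss(mean_i, sd_i)
        if pi_i * normal_pdf(x, mean_i, sd_i) < pi_j * normal_pdf(x, mean_j, sd_j):
            wrong += 1
    return wrong / n


def pairwise_overlap(pi_1, mean_1, sd_1, pi_2, mean_2, sd_2, n=20000, seed=0):
    """Assumed definition: overlap is the sum of the two misclassification
    probabilities between components 1 and 2."""
    rng = random.Random(seed)
    w_2_given_1 = misclassification_prob(pi_1, mean_1, sd_1, pi_2, mean_2, sd_2, n, rng)
    w_1_given_2 = misclassification_prob(pi_2, mean_2, sd_2, pi_1, mean_1, sd_1, n, rng)
    return w_2_given_1 + w_1_given_2


if __name__ == "__main__":
    # Two equally weighted unit-variance components, close and far apart.
    print("means 0 and 1: ", pairwise_overlap(0.5, 0.0, 1.0, 0.5, 1.0, 1.0))
    print("means 0 and 10:", pairwise_overlap(0.5, 0.0, 1.0, 0.5, 10.0, 1.0))
```

Under this assumed definition, moving the component means apart drives the estimated overlap toward zero. Simulating data to a pre-specified overlap runs in the opposite direction: the mixture parameters are adjusted until a target overlap value is reached.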

Keywords: cluster analysis; data mining; algorithm performance; simulated data
