On Consistent Entropy-Regularized k-Means Clustering With Feature Weight Learning: Algorithm and Statistical Analyses

IEEE Transactions on Cybernetics · 2022

Cited by 9

ABS 3

Saptarshi Chakraborty
Debolina Paul
Swagatam Das

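Summary

To address the problem of learning feature importance when clustering high-dimensional data, the paper proposes an entropy-regularized k-means clustering algorithm, solves it efficiently by block-coordinate descent, and proves its strong consistency using VC theory. The method suits clustering scenarios that call for interpretable feature weights.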

Abstract

Clusters in real data are often restricted to low-dimensional subspaces rather than the entire feature space. Recent approaches to circumvent this difficulty are often computationally inefficient and lack theoretical justification in terms of their large-sample behavior. This article deals with the problem by introducing an entropy incentive term to efficiently learn the feature importance within the framework of center-based clustering. A scalable block-coordinate descent algorithm, with closed-form updates, is incorporated to minimize the proposed objective function. Using Vapnik-Chervonenkis (VC) theory, we establish strong consistency of our method along with uniform concentration bounds. The merits of our method are showcased through detailed experimental analysis on toy examples as well as real data clustering benchmarks.
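The abstract does not state the objective function, so the following is only a minimal sketch of a generic entropy-weighted k-means, not the authors' exact formulation. It assumes a single global weight vector on the probability simplex and the objective sum_i sum_l w_l (x_il - c_{k(i),l})^2 + lam * sum_l w_l log w_l. Under that assumption, each block (assignments, centers, weights) has a closed-form update, and the weight update is a softmax of the negated per-feature dispersions. The function name `entropy_weighted_kmeans` and the parameters `lam`, `n_iter`, and `seed` are hypothetical.

```python
import numpy as np

def entropy_weighted_kmeans(X, k, lam=1.0, n_iter=50, seed=0):
    """Block-coordinate descent for an assumed entropy-weighted k-means objective."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Initialize centers at k distinct data points and weights uniformly.
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    weights = np.full(p, 1.0 / p)
    for _ in range(n_iter):
        # Block 1: assign each point to the nearest center under the
        # weighted squared Euclidean distance.
        sq = (X[:, None, :] - centers[None, :, :]) ** 2   # shape (n, k, p)
        labels = np.argmin(sq @ weights, axis=1)          # shape (n,)
        # Block 2: each center becomes the mean of its members
        # (empty clusters keep their previous center).
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
        # Block 3: closed-form weight update. Minimizing
        # sum_l w_l * D_l + lam * sum_l w_l * log(w_l) subject to sum_l w_l = 1
        # gives w_l proportional to exp(-D_l / lam).
        D = ((X - centers[labels]) ** 2).sum(axis=0)      # per-feature dispersion
        logits = -D / lam
        logits -= logits.max()                            # numerical stability
        weights = np.exp(logits)
        weights /= weights.sum()
    return labels, centers, weights
```

In this sketch a larger `lam` pushes the weights toward uniform, while a smaller `lam` concentrates them on the features with the lowest within-cluster dispersion. Because the dispersions grow with the sample size, `lam` typically needs to be chosen on a comparable scale.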

Keywords: cluster analysis, high-dimensional data, feature selection, machine learning, statistical learning
