On Consistent Entropy-Regularized k-Means Clustering With Feature Weight Learning: Algorithm and Statistical Analyses
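To address the problem of learning feature importance when clustering high-dimensional data, this work proposes an entropy-regularized k-means clustering algorithm, solves it efficiently by block-coordinate descent, and proves its strong consistency using VC theory. The method suits clustering scenarios that require interpretable feature weights.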
Clusters in real data are often restricted to low-dimensional subspaces rather than the entire feature space. Recent approaches that address this difficulty are often computationally inefficient and lack theoretical justification in terms of their large-sample behavior. This article deals with the problem by introducing an entropy incentive term to efficiently learn the feature importance within the framework of center-based clustering. A scalable block-coordinate descent algorithm with closed-form updates is used to minimize the proposed objective function. Using Vapnik-Chervonenkis (VC) theory, we prove strong consistency of the method and derive uniform concentration bounds. The merits of our method are showcased through detailed experimental analysis on toy examples as well as real-data clustering benchmarks.
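To make the idea of block-coordinate descent with closed-form updates concrete, the following is a minimal sketch under stated assumptions, not the article's exact algorithm. It assumes one common form of entropy-regularized objective: the feature-weighted within-cluster sum of squares plus lam times the sum of w_l log w_l, with the weights constrained to be positive and sum to one. Under that assumption each block has a closed-form minimizer: assignments go to the nearest center in weighted distance, centers are cluster means, and weights are proportional to exp(-D_l / lam), where D_l is the within-cluster dispersion of feature l. The function name `entropy_weighted_kmeans`, the parameters `lam` and `init_centers`, and the toy data are all hypothetical.

```python
import math
import random


def weighted_sq_dist(x, center, weights):
    """Feature-weighted squared Euclidean distance."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, center))


def entropy_weighted_kmeans(points, k, lam=1.0, n_iter=50,
                            init_centers=None, seed=0):
    """Sketch of block-coordinate descent for an ASSUMED objective:
    sum_i sum_l w_l (x_il - c_{a(i),l})^2 + lam * sum_l w_l log w_l,
    subject to w_l > 0 and sum_l w_l = 1."""
    n_features = len(points[0])
    if init_centers is None:
        # Random initialization; results may depend on the seed.
        init_centers = random.Random(seed).sample(points, k)
    centers = [list(c) for c in init_centers]
    weights = [1.0 / n_features] * n_features
    labels = None

    for _ in range(n_iter):
        # Block 1: assign each point to its nearest center (weighted distance).
        new_labels = [
            min(range(k), key=lambda j: weighted_sq_dist(x, centers[j], weights))
            for x in points
        ]
        # Block 2: each center becomes the mean of its assigned points.
        for j in range(k):
            members = [x for x, c in zip(points, new_labels) if c == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        # Block 3: closed-form weights, w_l proportional to exp(-D_l / lam).
        disp = [
            sum((x[l] - centers[c][l]) ** 2 for x, c in zip(points, new_labels))
            for l in range(n_features)
        ]
        shift = min(disp)  # subtract the minimum for numerical stability
        unnorm = [math.exp(-(d - shift) / lam) for d in disp]
        total = sum(unnorm)
        weights = [u / total for u in unnorm]

        converged = new_labels == labels
        labels = new_labels
        if converged:
            break
    return labels, centers, weights


# Toy data: feature 0 separates two groups, feature 1 carries no cluster signal.
toy = [(0.0, 1.0), (0.2, -1.0), (-0.2, 0.5), (0.1, -0.5),
       (4.0, 1.0), (4.2, -1.0), (3.8, 0.5), (4.1, -0.5)]
labels, centers, weights = entropy_weighted_kmeans(
    toy, k=2, lam=1.0, init_centers=[toy[0], toy[4]])
print(labels, [round(w, 3) for w in weights])
```

On this toy input the learned weight concentrates on feature 0, the only feature that carries cluster structure. Because the dispersions D_l are sums over all points, a sensible `lam` scales with the sample size; like ordinary k-means, the sketch can also stall in a poor local minimum when the initial centers are badly placed, which is why the usage example fixes them explicitly.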