基于p中位数的约束聚类方法

A p-median based approach to constrained clustering

Annals of Operations Research · 2026

被引 0 · 同刊同年前 10%

ABS 3

Zhifeng Guo
Jesse R. O’Hanley 通讯
Stuart Gibson
Maria Paola Scaparra

中文导读

研究了将p中位数模型与实例级约束（必须连接和不能连接）结合进行约束聚类，提出拉格朗日松弛算法和变量约简技术，在基准和家庭用电数据集上高效获得高质量解，且相比k均值聚类，约束p中位数聚类在能耗变异性、簇大小分布和收入分层方面表现更优。

Abstract

Abstract Constrained clustering is a semi-supervised method for dividing multi-attribute datasets into groups of similar elements that incorporates restrictions on how data points should or should not be clustered together. In this study, we investigate the application of the p-median model coupled with instance-level constraints, specifically must-link and cannot-link constraint, for performing constrained clustering. A Lagrangian relaxation procedure is proposed to efficiently solve large problem instances. To further speed up solution times, a simple but highly effective variable reduction technique is devised for identifying a small set of potential cluster centers. We test our modeling framework and solution approach on a set of benchmark datasets and several real-world household electricity consumption datasets. We find that high-quality solutions with small optimality gaps can be obtained with moderate computational effort. In addition, analysis of the largest household electricity consumption dataset reveals that constrained p-median clusters have less extreme variability with respect to average energy usage, a more even spread of cluster sizes, and much less overlap between high- and low-income households within the same clusters compared to classic k-means clustering.

约束聚类p中位数模型拉格朗日松弛变量约简家庭用电数据

阅读原文 ↗