Exact Clustering in Tensor Block Model: Statistical Optimality and Computational Limit

Journal of the Royal Statistical Society. Series B: Statistical Methodology · 2022

Cited by: 33

ABS rating: 4

Rungang Han
Yuetian Luo
Miaoyan Wang
Anru R. Zhang (corresponding author)

Summary

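Proposes a tensor block model and two efficient clustering algorithms (the high-order Lloyd algorithm and high-order spectral clustering), proves their convergence and statistical optimality under sub-Gaussian noise, and characterises the statistical-computational trade-off for exact clustering under the Gaussian model.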
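For reference, an order-3 tensor block model and the exact-clustering criterion are commonly written as below. This is a standard formulation given as an aid to reading; the notation (core tensor S, membership matrices M_k, label maps z_k, loss ℓ) is ours and may differ from the paper's. Each M_k ∈ {0,1}^{p_k × r_k} has exactly one 1 per row, with (M_k)_{ia} = 1 if and only if z_k(i) = a, and Z has independent sub-Gaussian (or Gaussian) entries.

```latex
\mathcal{Y} = \mathcal{S} \times_1 M_1 \times_2 M_2 \times_3 M_3 + \mathcal{Z},
\qquad\text{equivalently}\qquad
\mathcal{Y}_{i_1 i_2 i_3} = \mathcal{S}_{z_1(i_1)\, z_2(i_2)\, z_3(i_3)} + \mathcal{Z}_{i_1 i_2 i_3},
\qquad \mathcal{S} \in \mathbb{R}^{r_1 \times r_2 \times r_3}.

\ell(\hat z_k, z_k) = \min_{\pi \in \mathfrak{S}_{r_k}} \frac{1}{p_k}
  \sum_{i=1}^{p_k} \mathbb{1}\{\pi(\hat z_k(i)) \neq z_k(i)\},
\qquad
\text{exact clustering: } \;
\mathbb{P}\big(\ell(\hat z_k, z_k) = 0 \text{ for } k = 1, 2, 3\big) \to 1.
```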

Abstract

High-order clustering aims to identify heterogeneous substructures in multiway datasets that arise commonly in neuroimaging, genomics, social network studies, etc. The non-convex and discontinuous nature of this problem poses significant challenges in both statistics and computation. In this paper, we propose a tensor block model and two computationally efficient methods for high-order clustering: the high-order Lloyd algorithm (HLloyd) and high-order spectral clustering (HSC). Convergence guarantees and statistical optimality are established for the proposed procedures under a mild sub-Gaussian noise assumption. Under the Gaussian tensor block model, we completely characterise the statistical-computational trade-off for achieving high-order exact clustering in terms of three different signal-to-noise ratio regimes. The analysis relies on new techniques of high-order spectral perturbation analysis and a ‘singular-value-gap-free’ error bound in tensor estimation, which differ substantially from the matrix spectral analyses in the literature. Finally, we demonstrate the merits of the proposed procedures via extensive experiments on both synthetic and real datasets.
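The abstract names HLloyd without spelling out its update. As a rough illustration only, here is a minimal pure-Python sketch of a Lloyd-type alternating scheme for an order-3 tensor block model. It assumes the same number of clusters r in every mode and a warm start close to the truth. It alternates between estimating block means and reassigning each index of one mode to the cluster with the smallest squared residual over its slice. This is a simplified least-squares variant chosen for clarity, not the authors' HLloyd update, and all function names (block_means, update_mode, lloyd_sketch, misclassified, simulate) are hypothetical.

```python
import itertools
import random

# Hypothetical sketch: order-3 tensors are nested lists Y[i][j][k]; every mode has r clusters.


def block_means(Y, z, r):
    """Estimate the core S by averaging Y over each block (a, b, c) under labels z."""
    sums = [[[0.0] * r for _ in range(r)] for _ in range(r)]
    counts = [[[0] * r for _ in range(r)] for _ in range(r)]
    for i, plane in enumerate(Y):
        for j, fiber in enumerate(plane):
            for k, y in enumerate(fiber):
                a, b, c = z[0][i], z[1][j], z[2][k]
                sums[a][b][c] += y
                counts[a][b][c] += 1
    # Empty blocks fall back to 0.0 so the sketch never divides by zero.
    return [[[sums[a][b][c] / counts[a][b][c] if counts[a][b][c] else 0.0
              for c in range(r)] for b in range(r)] for a in range(r)]


def update_mode(Y, z, S, r, mode):
    """Relabel every index of `mode` to the cluster with the smallest squared residual."""
    dims = (len(Y), len(Y[0]), len(Y[0][0]))
    cost = [[0.0] * r for _ in range(dims[mode])]
    for i in range(dims[0]):
        for j in range(dims[1]):
            for k in range(dims[2]):
                idx = (i, j, k)
                lab = [z[0][i], z[1][j], z[2][k]]
                for a in range(r):
                    lab[mode] = a  # try cluster a for this index, other modes fixed
                    diff = Y[i][j][k] - S[lab[0]][lab[1]][lab[2]]
                    cost[idx[mode]][a] += diff * diff
    return [min(range(r), key=lambda a: row[a]) for row in cost]


def lloyd_sketch(Y, z_init, r, n_iter=5):
    """Alternate block-mean estimation and per-mode relabelling for n_iter sweeps."""
    z = [list(zm) for zm in z_init]
    for _ in range(n_iter):
        for mode in range(3):
            S = block_means(Y, z, r)
            z[mode] = update_mode(Y, z, S, r, mode)
    return z


def misclassified(z_hat, z_true, r):
    """Number of mislabelled indices, minimised over relabellings of the clusters."""
    return min(sum(perm[a] != b for a, b in zip(z_hat, z_true))
               for perm in itertools.permutations(range(r)))


def simulate(p=12, r=2, sigma=0.1, seed=0):
    """Toy data: Y[i][j][k] = S[z1(i)][z2(j)][z3(k)] + N(0, sigma^2) noise."""
    rng = random.Random(seed)
    z_true = [[i % r for i in range(p)] for _ in range(3)]
    S = [[[a + r * b + r * r * c for c in range(r)] for b in range(r)] for a in range(r)]
    z1, z2, z3 = z_true
    Y = [[[S[z1[i]][z2[j]][z3[k]] + rng.gauss(0.0, sigma)
           for k in range(p)] for j in range(p)] for i in range(p)]
    return Y, z_true


if __name__ == "__main__":
    Y, z_true = simulate()
    # Warm start: swap the labels of indices 0 and 1 in every mode (2 of 12 wrong).
    # A spectral initialisation would normally supply this starting point.
    z0 = [[zm[1], zm[0]] + zm[2:] for zm in z_true]
    z_hat = lloyd_sketch(Y, z0, r=2)
    print([misclassified(zh, zt, 2) for zh, zt in zip(z_hat, z_true)])  # prints [0, 0, 0]
```

Recomputing the block means before relabelling each mode (a Gauss-Seidel-style sweep) is a design choice of this sketch. Nested lists keep it dependency-free, whereas a practical implementation would vectorise the mode products with NumPy.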

Keywords: cluster analysis, tensor model, high-dimensional statistics, spectral clustering, computational complexity
