Statistical-computational trade-offs in tensor PCA and related problems via communication complexity

Annals of Statistics · 2024

Cited by 7

ABS 4★

Rishabh Dudeja
Daniel Hsu

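Summary

This paper uses communication complexity to derive run-time lower bounds for memory-bounded algorithms for Tensor PCA. The bounds reveal a trade-off among the sample size, the number of passes through the data, and the memory required, and they explain why commonly used algorithms need more iterations when the sample is too small.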

Abstract

Tensor PCA is a stylized statistical inference problem introduced by Montanari and Richard to study the computational difficulty of estimating an unknown parameter from higher-order moment tensors. Unlike its matrix counterpart, Tensor PCA exhibits a statistical-computational gap, that is, a sample size regime where the problem is information-theoretically solvable but conjectured to be computationally hard. This paper derives computational lower bounds on the run-time of memory-bounded algorithms for Tensor PCA using communication complexity. These lower bounds specify a trade-off among the number of passes through the data sample, the sample size, and the memory required by any algorithm that successfully solves Tensor PCA. While the lower bounds do not rule out polynomial-time algorithms, they do imply that many commonly used algorithms, such as gradient descent and the power method, must have a higher iteration count when the sample size is not large enough. Similar lower bounds are obtained for non-Gaussian component analysis, a family of statistical estimation problems in which low-order moment tensors carry no information about the unknown parameter. Finally, stronger lower bounds are obtained for an asymmetric variant of Tensor PCA and related statistical estimation problems. These results explain why many estimators for these problems use a memory state that is significantly larger than the effective dimensionality of the parameter of interest.
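
To make the power method mentioned in the abstract concrete, the following is a minimal sketch of tensor power iteration on an order-3 spiked tensor. It assumes the common single-observation model T = lam * v⊗v⊗v + W with i.i.d. standard Gaussian noise W, which may differ from the sample-based formulation analyzed in the paper. The function names `spiked_tensor` and `tensor_power_iteration`, the dimension, the signal strength, and the iteration count are all illustrative choices, not taken from the paper.

```python
import numpy as np


def spiked_tensor(v, lam, rng):
    """Return lam * v⊗v⊗v plus i.i.d. N(0, 1) noise (assumed model)."""
    d = v.shape[0]
    signal = lam * np.einsum("i,j,k->ijk", v, v, v)
    noise = rng.standard_normal((d, d, d))
    return signal + noise


def tensor_power_iteration(T, u0, n_iters):
    """Iterate u <- T(., u, u) / ||T(., u, u)|| starting from u0."""
    u = u0 / np.linalg.norm(u0)
    for _ in range(n_iters):
        # One pass over the tensor; the only state carried between
        # passes is the d-dimensional vector u.
        u = np.einsum("ijk,j,k->i", T, u, u)
        u = u / np.linalg.norm(u)
    return u


rng = np.random.default_rng(0)
d = 20
v = rng.standard_normal(d)
v = v / np.linalg.norm(v)

# Deliberately strong signal (illustrative choice) so that a random
# start usually suffices; weaker signals generally need more iterations
# or a better initialization.
lam = 50.0 * d
T = spiked_tensor(v, lam, rng)
u = tensor_power_iteration(T, rng.standard_normal(d), n_iters=20)

# Overlap |<u, v>| lies in [0, 1]; values near 1 indicate recovery.
overlap = abs(u @ v)
print(f"overlap with planted vector: {overlap:.3f}")
```

Each iteration here makes one pass over the data while keeping only a d-dimensional state, which is loosely the kind of pass and memory accounting that the lower bounds in the abstract concern. Lowering `lam` toward the conjectured hard regime is one way to observe, informally, that more iterations are needed before the overlap becomes large.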

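Keywords: tensor PCA; statistical-computational trade-offs; communication complexity; computational lower bounds; non-Gaussian component analysis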
