Optimal eigenvalue shrinkage in the semicircle limit
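We study optimal covariance matrix estimation when the sample size and dimension are highly imbalanced, derive closed-form optimal shrinkage rules under 15 loss functions, and show that these rules are asymptotically optimal in both the proportional-growth and disproportional-growth frameworks.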
In response to the increasing dimensionality of modern datasets, recent theoretical studies of covariance estimation frequently adopt the proportional-growth asymptotic framework, in which the sample size n and dimension p are comparable, with n, p → ∞ and γ_n := p/n → γ > 0. However, in many datasets, perhaps most, the sample size and dimension are highly imbalanced. To address this, we consider the disproportional-growth asymptotic framework, where n, p → ∞ and γ_n → 0 or γ_n → ∞. These regimes give rise to novel behavior distinct from that of the proportional-growth and classical fixed-p settings. We work under the spiked covariance model, in which the theoretical covariance matrix is a low-rank perturbation of the identity. For each of 15 loss functions, we derive closed-form optimal shrinkage and thresholding rules; for several losses, optimality takes the particularly strong form of unique asymptotic admissibility. These optimal procedures involve substantial eigenvalue shrinkage and yield significant improvements over the standard empirical covariance estimator. Practitioners may ask whether their data are better modeled under the proportional or the disproportional framework, and which of the corresponding procedures to apply. Fortunately, it is possible to remain framework-agnostic: a single, unified set of shrinkage rules, depending only on the aspect ratio γ_n of the given data, achieves asymptotic optimality in both regimes. At the heart of these phenomena is the spiked Wigner model, in which a low-rank matrix is perturbed by symmetric noise. Under both the (scaled) spiked covariance model as γ_n → 0 and the spiked Wigner model, the empirical spectral distributions converge to the semicircle law. Exploiting this connection, we derive optimal spiked-Wigner shrinkage rules, which are of independent and fundamental interest.
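The semicircle limit mentioned above can be checked numerically. The following is a minimal sketch, not the paper's procedure: it assumes i.i.d. standard Gaussian data with identity covariance (no spikes) and the centering and scaling (S − I)/√γ_n, which is the usual normalization for the γ_n → 0 regime but is an assumption here, since the abstract does not spell it out. The function name and the sizes n and p are illustrative. The sketch compares the second and fourth empirical moments of the scaled eigenvalues with the semicircle values 1 and 2.

```python
import numpy as np


def scaled_covariance_eigenvalues(n, p, rng):
    """Eigenvalues of (S - I) / sqrt(p / n), where S = X^T X / n.

    X is an n-by-p matrix of i.i.d. N(0, 1) entries, so the population
    covariance is the identity (a null model with no spikes).
    """
    X = rng.standard_normal((n, p))
    S = X.T @ X / n                      # p-by-p sample covariance
    gamma_n = p / n                      # aspect ratio, small in this regime
    scaled = (S - np.eye(p)) / np.sqrt(gamma_n)
    return np.linalg.eigvalsh(scaled)    # sorted ascending


rng = np.random.default_rng(0)
eigs = scaled_covariance_eigenvalues(n=20000, p=200, rng=rng)

# Moments of the semicircle law on [-2, 2]: second = 1, fourth = 2.
second_moment = np.mean(eigs**2)
fourth_moment = np.mean(eigs**4)

print(f"gamma_n            = {200 / 20000:.4f}")
print(f"eigenvalue range   = [{eigs.min():.3f}, {eigs.max():.3f}]")
print(f"second moment      = {second_moment:.3f}  (semicircle: 1)")
print(f"fourth moment      = {fourth_moment:.3f}  (semicircle: 2)")
```

At finite γ_n the spectrum is slightly shifted to the right (the bulk edges sit near −2 + √γ_n and 2 + √γ_n), so the agreement with the semicircle law improves as p/n shrinks.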