Corrected generalized cross-validation for finite ensembles of penalized estimators

Journal of the Royal Statistical Society. Series B: Statistical Methodology · 2024

Cited by: 2

ABS 4

Pierre Bellec
Jin‐Hong Du
Takuya Koriyama
Pratik Patil
Kai Tan

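Summary

The study finds that generalized cross-validation (GCV) is inconsistent for finite ensembles of size greater than one and proposes a corrected method (CGCV). Through an additive scalar correction, CGCV retains the computational advantages of GCV, requires no sample splitting or refitting, and applies to ensembles of penalized least-squares estimators.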

Abstract

Generalized cross-validation (GCV) is a widely used method for estimating the squared out-of-sample prediction risk that employs a scalar degrees of freedom adjustment (in a multiplicative sense) to the squared training error. In this paper, we examine the consistency of GCV for estimating the prediction risk of arbitrary ensembles of penalized least-squares estimators. We show that GCV is inconsistent for any finite ensemble of size greater than one. Towards repairing this shortcoming, we identify a correction that involves an additional scalar correction (in an additive sense) based on degrees of freedom adjusted training errors from each ensemble component. The proposed estimator (termed CGCV) maintains the computational advantages of GCV and requires no sample splitting, model refitting, or out-of-bag risk estimation. The estimator stems from a finer inspection of the ensemble risk decomposition and two intermediate risk estimators for the components in this decomposition. We provide a non-asymptotic analysis of CGCV and the two intermediate risk estimators for ensembles of convex penalized estimators under Gaussian features and a linear response model. Furthermore, in the special case of ridge regression, we extend the analysis to general feature and response distributions using random matrix theory, which establishes model-free uniform consistency of CGCV.
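To make the "multiplicative degrees of freedom adjustment" concrete, the following is a minimal sketch of ordinary GCV for a single ridge estimator, which is the baseline the paper starts from. It is not the paper's CGCV correction, whose exact form is not given in this abstract. The function name `ridge_gcv`, the `n * lam` scaling of the penalty, and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def ridge_gcv(X, y, lam):
    """Classical GCV for one ridge fit (illustrative; not the paper's CGCV).

    Assumes the ridge objective ||y - X b||^2 / n + lam * ||b||^2, so the
    smoother matrix is S = X (X^T X + n*lam*I)^{-1} X^T.
    """
    n, p = X.shape
    gram = X.T @ X + n * lam * np.eye(p)
    S = X @ np.linalg.solve(gram, X.T)        # maps y to fitted values
    df = np.trace(S)                          # degrees of freedom of the fit
    train_err = np.mean((y - S @ y) ** 2)     # squared training error
    gcv = train_err / (1.0 - df / n) ** 2     # multiplicative df adjustment
    return gcv, train_err, df

# Simulated data under a linear model with Gaussian features (an assumption).
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta + rng.standard_normal(n)

gcv, train_err, df = ridge_gcv(X, y, lam=0.1)
print(f"df = {df:.2f}, training error = {train_err:.3f}, GCV = {gcv:.3f}")
```

Because the degrees of freedom lie strictly between 0 and n here, the denominator is below one and the GCV estimate always exceeds the training error. The abstract's point is that this purely multiplicative rescaling stops being consistent once several such estimators are averaged into a finite ensemble.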

Keywords: econometrics, statistical learning, high-dimensional statistics, model selection