Asymptotic optimality of generalized CL, cross-validation, and generalized cross-validation in regression with heteroskedastic errors
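This paper studies how to use a data-driven procedure to select the best estimate from a class of linear estimates when the errors are heteroskedastic. It analyzes the asymptotic optimality of three criteria (generalized CL, cross-validation, and generalized cross-validation) and finds that cross-validation is asymptotically optimal under general conditions.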
The problem considered here is that of using a data-driven procedure to select a good estimate from a class of linear estimates indexed by a discrete parameter. In contrast to other papers on this subject, we consider models with heteroskedastic errors. The results apply to model selection problems in linear regression and to nonparametric regression estimation via series estimators, nearest-neighbor estimators, and local regression estimators, among others. Generalized CL (GCL), cross-validation (CV), and generalized cross-validation (GCV) procedures are analyzed. The GCL and CV criteria are shown to be asymptotically optimal under general conditions. A feasible version of GCL, however, is available only in some applications. The GCV criterion is found to be asymptotically optimal only under a condition that is satisfied in some applications but not in others. For example, it is satisfied in the nearest-neighbor estimation context but not in the series estimation, local regression estimation, or model selection contexts. Thus, the CV criterion is the only feasible criterion of the three that is asymptotically optimal under general conditions. The proofs rely heavily on results of Li (1987).
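As a rough illustration of the selection problem, the following sketch computes CV and GCV scores for a family of k-nearest-neighbor estimators indexed by k and picks the minimizer of each. It is a minimal sketch under stated assumptions, not the procedure analyzed in the paper: the simulated design, the variance function, the candidate grid, and all function names are hypothetical, and the CV score uses the common leave-one-out shortcut for linear estimators, in which each residual is divided by one minus the corresponding diagonal weight.

```python
import math
import random


def knn_weight_matrix(x, k):
    """Return the n x n weight matrix M of a k-NN estimator, so that fit = M y.

    Each row puts weight 1/k on the k design points nearest x[i],
    counting x[i] itself as its own nearest neighbor.
    """
    n = len(x)
    rows = []
    for i in range(n):
        # Sort by distance to x[i]; the tie-breaker keeps point i itself first.
        order = sorted(range(n), key=lambda j: (abs(x[j] - x[i]), j != i))
        row = [0.0] * n
        for j in order[:k]:
            row[j] = 1.0 / k
        rows.append(row)
    return rows


def fitted(M, y):
    """Apply the linear estimator: fitted values M y."""
    return [sum(w * yj for w, yj in zip(row, y)) for row in M]


def cv_score(M, y):
    """Leave-one-out CV via the shortcut (assumed form): residual_i / (1 - M_ii)."""
    fit = fitted(M, y)
    n = len(y)
    return sum(((y[i] - fit[i]) / (1.0 - M[i][i])) ** 2 for i in range(n)) / n


def gcv_score(M, y):
    """GCV: mean squared residual divided by (1 - trace(M)/n)^2."""
    fit = fitted(M, y)
    n = len(y)
    rss = sum((y[i] - fit[i]) ** 2 for i in range(n))
    trace = sum(M[i][i] for i in range(n))
    return (rss / n) / (1.0 - trace / n) ** 2


# Simulated data with heteroskedastic errors (assumed: error scale grows with x).
random.seed(0)
n = 200
x = [random.random() for _ in range(n)]
y = [math.sin(2 * math.pi * xi) + (0.1 + xi) * random.gauss(0.0, 1.0) for xi in x]

# Discrete index set for the class of estimators (hypothetical grid, k >= 2).
candidates = [2, 5, 10, 20, 40, 80]
scores = {}
for k in candidates:
    M = knn_weight_matrix(x, k)
    scores[k] = (cv_score(M, y), gcv_score(M, y))

k_cv = min(candidates, key=lambda k: scores[k][0])
k_gcv = min(candidates, key=lambda k: scores[k][1])
print("k chosen by CV:", k_cv)
print("k chosen by GCV:", k_gcv)
```

In this particular sketch every diagonal weight equals 1/k, so the CV and GCV scores coincide up to rounding. For linear estimators whose diagonal weights vary across observations, such as series or least squares fits, the two scores generally differ.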