Cross-Validation in Nonparametric Estimation of Probabilities and Probability Densities
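We study the large-sample properties of cross-validation for estimating cell probabilities, derive necessary and sufficient conditions on the loss function for the estimator to be consistent or to minimize expected loss, and provide a simple method of generating optimal loss functions.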
We examine large-sample properties of cross-validation for estimating cell probabilities, starting from a completely general measure of loss. Necessary and sufficient conditions on the loss function are derived for the resulting estimator to be consistent, or to minimize expected loss. These results reveal that cross-validation is extremely sensitive to the shape of the loss function. Nevertheless, when the loss function is chosen correctly, cross-validation can be relied on to perform well for large samples. We provide a simple method of generating loss functions with optimal properties. Extension to the estimation of univariate probability density functions is discussed at a heuristic level.
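To make the setting concrete, the following is a minimal sketch of leave-one-out cross-validation for cell probabilities. It is an illustration under stated assumptions, not the construction analysed in the paper. The shrinkage estimator (relative frequencies pulled toward the uniform distribution by a weight `lam`), the two example losses, and all function names are hypothetical choices introduced here. The loss is passed in as a parameter, so the sensitivity of the selected `lam` to the shape of the loss can be observed directly.

```python
import numpy as np

def smoothed_probs(counts, lam):
    """Assumed estimator: shrink relative frequencies toward uniform by weight lam."""
    counts = np.asarray(counts, dtype=float)
    k = counts.size
    return (1.0 - lam) * counts / counts.sum() + lam / k

def loo_cv_score(counts, lam, loss):
    """Average leave-one-out loss over all observations.

    Observations in the same cell yield the same leave-one-out estimate,
    so the sum over observations is computed cell by cell.
    """
    counts = np.asarray(counts, dtype=float)
    k = counts.size
    score = 0.0
    for c in range(k):
        if counts[c] == 0:
            continue  # no observation to leave out in an empty cell
        reduced = counts.copy()
        reduced[c] -= 1.0                    # remove one observation from cell c
        p_loo = smoothed_probs(reduced, lam)
        target = np.zeros(k)
        target[c] = 1.0                      # indicator vector of the held-out observation
        score += counts[c] * loss(target, p_loo)
    return score / counts.sum()

def squared_loss(target, p):
    """Squared-error loss between the indicator vector and the estimate."""
    return float(np.sum((target - p) ** 2))

def neg_log_loss(target, p):
    """Negative log of the estimated probability of the held-out cell."""
    return float(-np.log(p[target.argmax()]))

def choose_lambda(counts, loss, grid):
    """Return the grid value of lam that minimizes the cross-validation score."""
    scores = [loo_cv_score(counts, lam, loss) for lam in grid]
    return float(grid[int(np.argmin(scores))])

if __name__ == "__main__":
    counts = [12, 7, 0, 1, 5]                # hypothetical cell counts
    grid = np.linspace(0.01, 1.0, 100)       # lam > 0 keeps the log loss finite
    print("squared loss :", choose_lambda(counts, squared_loss, grid))
    print("neg log loss :", choose_lambda(counts, neg_log_loss, grid))
```

The grid starts slightly above zero because, with `lam = 0`, a cell holding a single observation receives leave-one-out probability zero and the logarithmic loss becomes infinite. Other losses can be compared simply by passing a different function as `loss`.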