FOOLING YOURSELF WITH CROSS‐VALIDATION: SINGLE SAMPLE DESIGNS

PERSONNEL PSYCHOLOGY · 1983
Cited by 36
Renmin University (人大) AABS rating: 4*

Overview

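The study finds that the most commonly used single-sample cross-validation design, in which one sample is split into training and validation sets, adjusts for random sampling error but is insensitive to violations of sampling assumptions. A small Monte Carlo study shows that invalid results obtained in non-representative samples still hold up under cross-validation. Single-sample cross-validation therefore offers no advantage over formula estimates.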

Abstract

The most commonly used cross‐validation design involves drawing a single sample and partitioning that sample into derivation and holdout subsamples. This type of design allows one to adjust for random sampling error, but like formula estimates of cross‐validity, is insensitive to violations of sampling assumptions. As is shown in a small Monte Carlo study, results obtained in non‐representative samples, which are known to be invalid in the population, will nonetheless hold up well under cross‐validation when single‐sample designs are employed. It is suggested that single‐sample cross‐validation estimates possess no clear‐cut advantages over formula estimates, and thus are not worth the effort or the loss of degrees of freedom.
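The argument in the abstract can be illustrated with a minimal Monte Carlo sketch. The setup is an assumption made for illustration, not the paper's design: one predictor, a population in which x and y are independent (true validity 0), and a hypothetical non-representative sampling scheme that keeps only cases with |x + y| > 1.5. Each sample is split in half into derivation and holdout subsamples. The equation fitted on the derivation half is then scored three ways: on the holdout half, with a formula estimate of cross-validity (the Browne, 1975, formula applied to a Wherry-adjusted R², chosen here only as one example of a formula estimate), and on a large representative draw from the population. All function names and parameter values are hypothetical.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation, computed directly so only the stdlib is needed."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return b, my - b * mx


def cross_validity(b, a, xs, ys):
    """Correlation between predictions from a fixed equation and observed y."""
    return pearson_r([a + b * x for x in xs], ys)


def browne_estimate(r2, n, k):
    """Formula estimate of squared cross-validity (illustrative choice).

    Uses a Wherry-adjusted R^2 (clipped at 0) as the population estimate."""
    rho2 = max(0.0, 1 - (1 - r2) * (n - 1) / (n - k - 1))
    return ((n - k - 3) * rho2 ** 2 + rho2) / ((n - 2 * k - 2) * rho2 + k)


def biased_sample(rng, n, cut=1.5):
    """Hypothetical non-representative sample: keep cases with |x + y| > cut.

    In the population x and y are independent, so the true validity is 0;
    selecting on the tails of x + y induces a positive x-y correlation."""
    out = []
    while len(out) < n:
        x, y = rng.gauss(0, 1), rng.gauss(0, 1)
        if abs(x + y) > cut:
            out.append((x, y))
    return out


def simulate(reps=50, n=200, n_pop=5000, seed=1983):
    """Return mean holdout r, mean formula-estimated r, mean population r."""
    rng = random.Random(seed)
    # Representative draw from the population, used as the "true" benchmark.
    pop_x = [rng.gauss(0, 1) for _ in range(n_pop)]
    pop_y = [rng.gauss(0, 1) for _ in range(n_pop)]
    holdout_rs, formula_rs, population_rs = [], [], []
    for _ in range(reps):
        sample = biased_sample(rng, n)
        rng.shuffle(sample)
        # Single-sample design: derivation and holdout come from one sample.
        derivation, holdout = sample[: n // 2], sample[n // 2:]
        dx, dy = zip(*derivation)
        hx, hy = zip(*holdout)
        b, a = fit_line(dx, dy)
        holdout_rs.append(cross_validity(b, a, hx, hy))
        r2 = pearson_r(dx, dy) ** 2
        formula_rs.append(browne_estimate(r2, len(dx), k=1) ** 0.5)
        population_rs.append(cross_validity(b, a, pop_x, pop_y))
    return (statistics.fmean(holdout_rs),
            statistics.fmean(formula_rs),
            statistics.fmean(population_rs))


if __name__ == "__main__":
    holdout, formula, population = simulate()
    print(f"single-sample holdout r: {holdout:.2f}")
    print(f"formula estimate of r:   {formula:.2f}")
    print(f"population validity r:   {population:.2f}")
```

Under these assumptions the holdout and formula estimates should both come out well above zero (roughly 0.4 to 0.5 with these settings), while the validity of the same equations in the representative population stays near 0. Because the holdout half inherits the same selection bias as the derivation half, splitting the sample guards against random sampling error but not against a non-representative sample, and in this sketch the formula estimate is no better.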

Keywords: cross-validation · statistical methods · sample design · Monte Carlo methods