Cost-Benefit Considerations in Choosing Among Cross-Validation Methods

PERSONNEL PSYCHOLOGY · 1984

Cited by 18

Renmin University of China · AABS 4*

Kevin R. Murphy · New York University (corresponding author)

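Summary

The paper compares the costs and benefits of two approaches to cross-validation: empirical estimation and formula estimation. Empirical methods are usually not worth their cost unless Monte Carlo sampling assumptions are not met. Multi-sample designs are preferable to single-sample designs, and multi-equation designs, although more accurate, appear to estimate the wrong parameter.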

Abstract

There are two general methods of cross-validation: (a) empirical estimation, and (b) formula estimation. In choosing a specific cross-validation procedure, one should consider both costs (e.g., inefficient use of available data in estimating regression parameters) and benefits (e.g., accuracy in estimating population cross-validity). Empirical cross-validation methods involve significant costs, since they are typically laborious and wasteful of data, but under conditions represented in Monte Carlo studies, they are generally not more accurate than formula estimates. Consideration of costs and benefits suggests that empirical estimation methods are typically not worth the cost, except in a limited number of cases in which Monte Carlo sampling assumptions are not met in the derivation sample. Designs which use multiple samples to estimate the cross-validity of a single regression equation are clearly preferable to single-sample designs; the latter are never expected to be more accurate than formula estimates and thus are never worth the cost. Multi-equation designs are more accurate than single-equation designs, but they appear to estimate the wrong parameter, and thus are difficult to interpret.
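
To make the contrast in the abstract concrete, the following is a minimal Python sketch that computes both kinds of estimate on one simulated sample. It is an illustration under stated assumptions, not the paper's procedure: the abstract does not say which formula it has in mind, so a Stein-type shrinkage formula is assumed here, and the function names, the simulated data, and the split-half design are all hypothetical choices made for the example.

```python
import numpy as np


def fit_ols(X, y):
    """Return OLS coefficients (intercept first) from a least-squares fit."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    """Apply a fitted equation (intercept first) to predictor matrix X."""
    return beta[0] + X @ beta[1:]


def squared_corr(a, b):
    """Squared Pearson correlation between two vectors."""
    return float(np.corrcoef(a, b)[0, 1] ** 2)


def formula_estimate(r2, n, k):
    """Formula estimate of squared cross-validity.

    Assumption: a Stein-type shrinkage formula; the source abstract does
    not name a specific formula. Uses all n cases, no holdout needed.
    The result can be negative when r2 is small relative to k / n.
    """
    shrink = ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) * ((n + 1) / n)
    return 1.0 - shrink * (1.0 - r2)


def empirical_estimate(X, y, rng):
    """Single-sample empirical estimate: derive on one half, validate on the other."""
    idx = rng.permutation(len(y))
    half = len(y) // 2
    derive, holdout = idx[:half], idx[half:]
    beta = fit_ols(X[derive], y[derive])  # only half the data estimates the weights
    return squared_corr(predict(beta, X[holdout]), y[holdout])


def simulate(n=100, k=5, seed=0):
    """Draw one hypothetical sample and return (R^2, formula est., empirical est.)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, k))
    y = X @ np.full(k, 0.3) + rng.standard_normal(n)
    beta = fit_ols(X, y)
    r2 = squared_corr(predict(beta, X), y)  # equals R^2 for OLS with intercept
    return r2, formula_estimate(r2, n, k), empirical_estimate(X, y, rng)


if __name__ == "__main__":
    r2, by_formula, by_split = simulate()
    print(f"sample R^2={r2:.3f}  formula={by_formula:.3f}  split-half={by_split:.3f}")
```

The sketch shows the cost the abstract refers to: the split-half estimate fits the regression weights on only half of the cases, while the formula estimate uses every case and needs only the sample R², the sample size, and the number of predictors.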

Econometrics · Statistics · Cross-validation · Regression analysis · Monte Carlo methods
