A Reality Check for Data Snooping

Econometrica · 2000
Cited by 1798 · Top 6% among papers in the same journal and year
Renmin University of China list: A+ · FT50 · ABS 4*

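Reading guide

The paper proposes a test for judging whether the best model found during a model search has predictive superiority over a given benchmark model, helping researchers avoid mistaking chance results for genuinely good ones.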

Abstract

Data snooping occurs when a given set of data is used more than once for purposes of inference or model selection. When such data reuse occurs, there is always the possibility that any satisfactory results obtained may simply be due to chance rather than to any merit inherent in the method yielding the results. This problem is practically unavoidable in the analysis of time-series data, as typically only a single history measuring a given phenomenon of interest is available for analysis. It is widely acknowledged by empirical researchers that data snooping is a dangerous practice to be avoided, but in fact it is endemic. The main problem has been a lack of sufficiently simple practical methods capable of assessing the potential dangers of data snooping in a given situation. Our purpose here is to provide such methods by specifying a straightforward procedure for testing the null hypothesis that the best model encountered in a specification search has no predictive superiority over a given benchmark model. This permits data snooping to be undertaken with some degree of confidence that one will not mistake results that could have been generated by chance for genuinely good results.
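The abstract does not spell out the procedure, so the following is only a minimal sketch of how a bootstrap test of this null hypothesis might look. It assumes performance is recorded as per-period differences between each candidate model and the benchmark (higher is better), and that resampling uses a stationary bootstrap with geometrically distributed block lengths. The function names, the `mean_block` and `n_boot` parameters, and the simulated data are all hypothetical and chosen for illustration.

```python
import math
import random


def stationary_bootstrap_indices(n, mean_block, rng):
    """Draw n time indices in blocks whose lengths are geometric with the given mean."""
    p = 1.0 / mean_block
    idx = [rng.randrange(n)]
    while len(idx) < n:
        if rng.random() < p:
            idx.append(rng.randrange(n))   # start a new block at a random time
        else:
            idx.append((idx[-1] + 1) % n)  # continue the current block, wrapping around
    return idx


def reality_check_pvalue(perf_diffs, n_boot=500, mean_block=5, seed=0):
    """Bootstrap p-value for H0: no model beats the benchmark on average.

    perf_diffs[k][t] is model k's performance minus the benchmark's at time t
    (an assumed input layout, not taken from the source).
    """
    rng = random.Random(seed)
    n = len(perf_diffs[0])
    means = [sum(d) / n for d in perf_diffs]
    # Test statistic: scaled average advantage of the best model in the search.
    stat = math.sqrt(n) * max(means)
    exceed = 0
    for _ in range(n_boot):
        idx = stationary_bootstrap_indices(n, mean_block, rng)
        # Recentre each resampled mean so the bootstrap mimics the null.
        boot_stat = math.sqrt(n) * max(
            sum(d[t] for t in idx) / n - m for d, m in zip(perf_diffs, means)
        )
        if boot_stat > stat:
            exceed += 1
    return exceed / n_boot


# Simulated illustration: 20 models with no real edge, then one with a true edge.
rng = random.Random(1)
noise_models = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(20)]
edge_model = [rng.gauss(1.0, 1.0) for _ in range(200)]
p_noise = reality_check_pvalue(noise_models)
p_edge = reality_check_pvalue(noise_models + [edge_model])
```

Taking the maximum over all models inside each bootstrap draw is what accounts for the search: the best of many noise models is compared against the distribution of "best of many" under the null, rather than against a single-model distribution.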

Data snooping · Model selection · Predictive ability test · Benchmark model