The Accuracy of Cross‐Validation Results in Forecasting

DECISION SCIENCES · 1995
Cited by 5
Renmin University AABS 3

Summary

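Using customer data from a large insurance company, the study examines how accurate cross-validation is for regression forecasting models. It finds that increasing sample size improves reliability, but that when population model differences are small the validation results are unreliable and a larger sample has little effect on reliability.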

Abstract

The widespread use of regression analysis as a business forecasting tool and renewed interest in the use of cross‐validation to aid in regression model selection make it essential that decision makers fully understand methods of cross‐validation in forecasting, along with the advantages and limitations of such analysis. Only by fully understanding the process can managers accurately interpret the important implications of statistical cross‐validation results in their determination of the robustness of regression forecasting models. Through a multiple regression analysis of a large insurance company's customer database, the Herzberg equation for determining the criterion of validity [11] and analysis of samples of different size from the two regions covered by the database, we illustrate the use of statistical cross‐validation and test a set of factors hypothesized to be related to the statistical accuracy of validation. We find that increasing sample size will increase reliability. When the magnitude of population model differences is small, validation results are found to be unreliable, and increasing sample size has little or no effect on reliability. In addition, the relative fit of the model for the derivative sample and the validation sample has an impact on validation accuracy, and should be used as an indicator of when further analysis should be undertaken. Furthermore, we find that the probability distribution of the population independent variables has no effect on validation accuracy.
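
The split-sample idea in the abstract (fit a regression on a derivation sample, then check its fit on a validation sample) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names `fit_ols`, `r_squared`, and `cross_validate` are hypothetical, the fit measure is plain R², and the Herzberg equation cited in the abstract is not reproduced here.

```python
import numpy as np


def fit_ols(X, y):
    """Return least-squares coefficients, intercept first (hypothetical helper)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def r_squared(beta, X, y):
    """R^2 of fixed coefficients `beta` on the sample (X, y).

    Can be negative when the coefficients fit worse than the sample mean.
    """
    A = np.column_stack([np.ones(len(X)), X])
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def cross_validate(X_deriv, y_deriv, X_valid, y_valid):
    """Fit on the derivation sample, then score both samples.

    Returns (R^2 on derivation, R^2 on validation, shrinkage), where
    shrinkage is the drop in R^2 when the fitted model is carried over
    unchanged to the validation sample.
    """
    beta = fit_ols(X_deriv, y_deriv)
    r2_deriv = r_squared(beta, X_deriv, y_deriv)
    r2_valid = r_squared(beta, X_valid, y_valid)
    return r2_deriv, r2_valid, r2_deriv - r2_valid
```

In this sketch, a large gap between the two R² values corresponds to the "relative fit" signal the abstract recommends watching; with two regional samples, one region could serve as the derivation sample and the other as the validation sample.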

Keywords: cross-validation, regression analysis, forecasting, sample size