Ensembles of Overfit and Overconfident Forecasts

Management Science · 2016
Cited by 74
Journal lists: Renmin University of China (RUC) A+ · FT50 · UTD24 · ABS 4*

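Overview

Studies how averaging overfit and overconfident forecasts affects the calibration of the average forecast, proposes a trimmed average to improve calibration, and shows on random forests that this method can significantly improve predictive accuracy.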

Abstract

Firms today average forecasts collected from multiple experts and models. Because of cognitive biases, strategic incentives, or the structure of machine-learning algorithms, these forecasts are often overfit to sample data and are overconfident. Little is known about the challenges associated with aggregating such forecasts. We introduce a theoretical model to examine the combined effect of overfitting and overconfidence on the average forecast. Their combined effect is that the mean and median probability forecasts are poorly calibrated, with the hit rates of their prediction intervals too high and too low, respectively. Consequently, we prescribe the use of a trimmed average, or trimmed opinion pool, to achieve better calibration. We identify the random forest, a leading machine-learning algorithm that pools hundreds of overfit and overconfident regression trees, as an ideal environment for trimming probabilities. Using several known data sets, we demonstrate that trimmed ensembles can significantly improve the random forest's predictive accuracy. This paper was accepted by James Smith, decision analysis.
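A trimmed opinion pool can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: every forecast is a normal CDF, trimming is "exterior" (at each point x, drop the `trim` lowest and `trim` highest CDF values, then average the rest), and the helper names `trimmed_pool_cdf`, `pooled_quantile` and `central_interval` are hypothetical. The paper's exact trimming rule and its random-forest implementation may differ. With `trim=0` the pool is the ordinary mean (linear) opinion pool; at the largest allowed trim it becomes the median of the CDF values.

```python
from statistics import NormalDist
from typing import Callable, Sequence

CDF = Callable[[float], float]


def trimmed_pool_cdf(cdfs: Sequence[CDF], x: float, trim: int) -> float:
    """Trimmed opinion pool at x (exterior trimming, an assumed variant).

    Sorts the forecasters' CDF values at x, drops the `trim` lowest and the
    `trim` highest, and averages the rest. trim=0 is the mean (linear) pool;
    the largest allowed trim leaves the median CDF value.
    """
    values = sorted(F(x) for F in cdfs)
    n = len(values)
    if trim < 0 or 2 * trim >= n:
        raise ValueError("trim must satisfy 0 <= 2*trim < number of forecasts")
    kept = values[trim:n - trim]
    return sum(kept) / len(kept)


def pooled_quantile(cdfs: Sequence[CDF], p: float, trim: int,
                    lo: float = -1e6, hi: float = 1e6,
                    iterations: int = 200) -> float:
    """Invert the pooled CDF by bisection.

    Order statistics of nondecreasing functions are nondecreasing, so the
    trimmed pool is a valid CDF. Assumes the p-quantile lies in [lo, hi].
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if trimmed_pool_cdf(cdfs, mid, trim) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def central_interval(cdfs: Sequence[CDF], coverage: float, trim: int):
    """Central prediction interval with the given nominal coverage."""
    tail = (1.0 - coverage) / 2.0
    return (pooled_quantile(cdfs, tail, trim),
            pooled_quantile(cdfs, 1.0 - tail, trim))


if __name__ == "__main__":
    # Hypothetical overconfident forecasters: narrow normals with scattered means.
    forecasts = [NormalDist(mu, 0.5).cdf for mu in (-1.0, -0.5, 0.0, 0.5, 1.0)]
    for trim in (0, 1, 2):  # mean pool, trimmed pool, median pool
        low, high = central_interval(forecasts, 0.90, trim)
        print(f"trim={trim}: 90% interval [{low:.3f}, {high:.3f}], "
              f"width {high - low:.3f}")
```

In this toy setup the mean pool yields the widest 90% interval, the median pool the narrowest, and the trimmed pool lands between the two. That mirrors the abstract's point: if the mean's intervals hit too often and the median's too rarely, a trimmed average offers an intermediate, potentially better-calibrated choice.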

Keywords: overfitting · overconfidence · trimmed average · random forest