Are missing values important for earnings forecasts? A machine learning perspective
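Applies machine learning methods to impute missing values in analysts' forecasts, finds that imputation reduces forecast error by 41% relative to the mean forecast, and proposes a new matrix factorization method that reduces the error by a further 19%.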
Analysts' forecasts are among the most common and important estimators of firms' future earnings. However, they are difficult to exploit fully because of missing values. This study applies machine learning techniques to impute missing values in individual analysts' forecasts and then predicts firms' future earnings from both imputed and observed forecasts. After imputing missing values, the forecast error is reduced by 41% compared with the mean forecast, suggesting that imputed missing values are indeed useful for earnings forecasting. We compare several imputation methods and show that the outperformance of matrix factorization (MF) is consistent across different evaluation measures and across firms. Finally, we propose a coupled matrix factorization (CMF) method based on stochastic gradient descent (SGD) that improves the imputation of missing values by drawing on multiple datasets. CMF reduces the earnings forecast error by a further 19% compared with MF on a single dataset.
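The abstract names MF imputation and an SGD-based coupled variant but gives no details. The following is a minimal sketch of one plausible reading, under these assumptions (none of them taken from the paper): forecasts are arranged as an analysts × firms matrix with NaN for missing entries; the auxiliary dataset shares the firm dimension, so both matrices share the firm factors `V`; the loss is squared error on observed entries with L2 regularization; and the earnings prediction is the column mean of the completed matrix. The names `coupled_mf_sgd` and `consensus_forecast` and all hyperparameter values are hypothetical, not the authors' implementation.

```python
import numpy as np


def coupled_mf_sgd(R, S=None, rank=3, lr=0.01, reg=0.05, alpha=1.0,
                   epochs=200, seed=0):
    """Impute NaNs in R (analysts x firms) by SGD matrix factorization.

    Plain MF: R ~ U @ V.T, fitted on observed entries only.
    Coupled MF (assumed form): if S (aux rows x firms, NaN = missing) is
    given, it is fitted as S ~ W @ V.T with the SAME firm factors V, and
    its squared errors are weighted by alpha.
    Returns R with missing entries replaced by the model's predictions.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_firms = R.shape
    U = 0.1 * rng.standard_normal((n_rows, rank))
    V = 0.1 * rng.standard_normal((n_firms, rank))
    W = 0.1 * rng.standard_normal((S.shape[0], rank)) if S is not None else None

    # Observed entries, tagged by source matrix: 0 = R, 1 = S
    entries = [(0, i, j) for i, j in np.argwhere(~np.isnan(R))]
    if S is not None:
        entries += [(1, i, j) for i, j in np.argwhere(~np.isnan(S))]

    for _ in range(epochs):
        # Visit observed entries in a fresh random order each epoch
        for k in rng.permutation(len(entries)):
            src, i, j = entries[k]
            rows, M, w = (U, R, 1.0) if src == 0 else (W, S, alpha)
            err = M[i, j] - rows[i] @ V[j]
            row_old = rows[i].copy()
            # Gradient step on regularized squared error; V is shared
            rows[i] += lr * (w * err * V[j] - reg * rows[i])
            V[j] += lr * (w * err * row_old - reg * V[j])

    # Keep observed forecasts, fill only the missing ones
    return np.where(np.isnan(R), U @ V.T, R)


def consensus_forecast(R_completed):
    """Hypothetical aggregation: mean over analysts for each firm."""
    return R_completed.mean(axis=0)
```

Passing `S=None` reduces the routine to single-dataset MF. With `S`, each SGD step on an auxiliary entry also moves the shared firm factors, which is one way a second dataset could inform the imputation of the forecast matrix.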