Forecasting bilateral asylum seeker flows with high-dimensional data and machine learning techniques
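The study uses machine learning and high-dimensional data such as Google Trends to build forecasting models of monthly asylum seeker flows from 157 origin countries to the EU27. It finds that an ensemble forecast combining Random Forest and Extreme Gradient Boosting outperforms the random walk model over forecast horizons of 3 to 12 months.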
Abstract. We develop models that forecast monthly asylum seeker flows from 157 origin countries to the EU27, using machine learning and high-dimensional data, including digital trace data from Google Trends. Comparing different models and forecasting horizons, and validating out-of-sample, we find that an ensemble forecast combining the Random Forest and Extreme Gradient Boosting algorithms outperforms the random walk over horizons between 3 and 12 months. For large corridors, this also holds in a parsimonious model based exclusively on Google Trends variables, which has the advantage of near real-time availability. We provide practical recommendations on how our approach can enable ahead-of-period asylum seeker flow forecasting applications.
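To make the benchmark comparison concrete, the following is a minimal sketch of how an ensemble forecast could be scored against a random walk at a given horizon. It is an illustration under stated assumptions, not the paper's implementation: the equal-weight average, the RMSE ratio, all function names, and the toy numbers are hypothetical, and the two model forecast lists merely stand in for the output of fitted Random Forest and Extreme Gradient Boosting models.

```python
from math import sqrt
from statistics import mean


def random_walk_forecast(series, horizon):
    """Random walk benchmark: the forecast for month t + horizon is the value at month t."""
    return [series[t] for t in range(len(series) - horizon)]


def targets(series, horizon):
    """Observed values that the forecasts issued at month t are scored against."""
    return [series[t + horizon] for t in range(len(series) - horizon)]


def ensemble(forecasts_a, forecasts_b):
    """Equal-weight average of two model forecasts (the weighting is an assumption)."""
    return [(a + b) / 2 for a, b in zip(forecasts_a, forecasts_b)]


def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    return sqrt(mean((p - a) ** 2 for p, a in zip(predicted, actual)))


def relative_rmse(model_pred, benchmark_pred, actual):
    """Ratio of model RMSE to benchmark RMSE; values below 1 favour the model."""
    return rmse(model_pred, actual) / rmse(benchmark_pred, actual)


# Toy monthly flow for a single hypothetical corridor (made-up numbers).
flows = [100, 120, 150, 130, 170, 160, 200, 190]
horizon = 3

actual = targets(flows, horizon)
benchmark = random_walk_forecast(flows, horizon)

# Placeholder forecasts standing in for two fitted models (made-up numbers).
model_a = [125, 160, 165, 185, 195]
model_b = [135, 165, 150, 205, 180]

combined = ensemble(model_a, model_b)
ratio = relative_rmse(combined, benchmark, actual)
print(f"relative RMSE at horizon {horizon}: {ratio:.2f}")
```

A ratio below 1 would indicate that the combined forecast has a lower error than the random walk at that horizon. In practice the comparison would be repeated for each corridor and each horizon on out-of-sample data.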