Model Averaging for Prediction With Fragmentary Data
针对多源数据中许多样本协变量缺失的碎片数据,提出基于频率学派模型平均的预测方法,通过留一交叉验证选择权重,并用中国财富管理社区的个人收入数据验证效果。
One main challenge for statistical prediction with data from multiple sources is that not all the associated covariate data are available for many sampled subjects. Consequently, we need new statistical methodology to handle this type of “fragmentary data” that has become more and more popular in recent years. In this article, we propose a novel method based on the frequentist model averaging that fits some candidate models using all available covariate data. The weights in model averaging are selected by delete-one cross-validation based on the data from complete cases. The optimality of the selected weights is rigorously proved under some conditions. The finite sample performance of the proposed method is confirmed by simulation studies. An example for personal income prediction based on real data from a leading e-community of wealth management in China is also presented for illustration.