Distributed estimation and inference for semiparametric binary response models
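For the maximum score estimator of semiparametric binary choice models on large-scale data, the paper proposes a one-shot divide-and-conquer estimator applied after smoothing the objective function and a multiround estimator based on iterative smoothing, which lift the restriction on the number of machines and achieve a superlinear improvement in the optimization error.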
The development of modern technology has enabled the collection of data at an unprecedented scale, which poses new challenges for many statistical estimation and inference problems. This paper studies the maximum score estimator of a semiparametric binary choice model in a distributed computing environment, without prespecifying the noise distribution. Because the objective function is highly nonsmooth, an intuitive divide-and-conquer estimator is computationally expensive and is subject to a nonregular constraint on the number of machines. We propose (1) a one-shot divide-and-conquer estimator that smooths the objective to relax this constraint, and (2) a multiround estimator that removes the constraint completely through iterative smoothing. We specify an adaptive choice of kernel smoother with a sequentially shrinking bandwidth that achieves a superlinear improvement of the optimization error over multiple iterations. We derive the improvement in statistical accuracy at each iteration and establish quadratic convergence up to the optimal statistical error rate. We further provide two generalizations: one that handles heterogeneity across data sets, and one for high-dimensional problems in which the parameter of interest is sparse.
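The two estimators described above can be illustrated with a small, self-contained sketch. It is not the paper's algorithm. It assumes the normalization beta_1 = 1 (a common identification choice for maximum score), uses the logistic CDF as the smooth surrogate for the indicator 1{x'beta >= 0}, fits each machine by plain gradient ascent for the one-shot divide-and-conquer step, and stands in for the multiround procedure with pooled Newton steps under a hypothetical bandwidth schedule h_t = max(N^(-1/5), h_0 * 0.7^t). All function names (`smoothed_score_derivs`, `local_estimate`, `one_shot`, `multiround`) and all tuning constants are illustrative assumptions.

```python
import numpy as np


def logistic_cdf(t):
    # Smooth surrogate (assumed kernel) for the indicator 1{t >= 0}; tanh form avoids overflow.
    return 0.5 * (1.0 + np.tanh(0.5 * t))


def smoothed_score_derivs(theta, X, y, h):
    """Gradient and Hessian w.r.t. theta of the smoothed score
    Q_h(theta) = mean((2y - 1) * K(X @ beta / h)), with beta = (1, theta)."""
    beta = np.concatenate(([1.0], theta))
    K = logistic_cdf(X @ beta / h)
    k = K * (1.0 - K)            # logistic density, K'
    dk = k * (1.0 - 2.0 * K)     # its derivative, K''
    s = 2.0 * y - 1.0
    Z = X[:, 1:]                 # covariates whose coefficients are estimated
    n = len(y)
    grad = Z.T @ (s * k) / (n * h)
    hess = (Z * (s * dk)[:, None]).T @ Z / (n * h ** 2)
    return grad, hess


def local_estimate(X, y, h, lr=2.0, iters=300):
    # Plain gradient ascent on one machine's smoothed objective (hypothetical solver choice).
    theta = np.zeros(X.shape[1] - 1)
    for _ in range(iters):
        grad, _ = smoothed_score_derivs(theta, X, y, h)
        theta = theta + lr * grad
    return theta


def one_shot(shards, h):
    # One-shot divide and conquer: each machine solves locally, the center averages.
    return np.mean([local_estimate(X, y, h) for X, y in shards], axis=0)


def multiround(shards, theta, h0, h_min, shrink=0.7, rounds=5):
    # Each round shrinks the bandwidth, pools sample-size-weighted local
    # gradients and Hessians, and takes one Newton ascent step at the center.
    N = sum(len(y) for _, y in shards)
    for t in range(1, rounds + 1):
        h = max(h_min, h0 * shrink ** t)
        grad = np.zeros_like(theta)
        hess = np.zeros((theta.size, theta.size))
        for X, y in shards:
            g, H = smoothed_score_derivs(theta, X, y, h)
            grad += len(y) / N * g
            hess += len(y) / N * H
        if np.all(np.linalg.eigvalsh(hess) < 0):
            theta = theta - np.linalg.solve(hess, grad)  # Newton step for a maximum
        else:
            theta = theta + grad                         # fallback: gradient step
    return theta


# Simulated example: 10 machines, 2000 observations each, true theta = 0.5.
rng = np.random.default_rng(0)
m, n_local, theta_true = 10, 2000, np.array([0.5])
beta_true = np.concatenate(([1.0], theta_true))
shards = []
for _ in range(m):
    X = rng.standard_normal((n_local, 2))
    eps = rng.standard_normal(n_local)  # median-zero noise; its law is not used by the estimator
    y = (X @ beta_true + eps >= 0).astype(float)
    shards.append((X, y))

h_local = n_local ** (-1 / 5)
theta_os = one_shot(shards, h_local)
theta_mr = multiround(shards, theta_os, h0=h_local, h_min=(m * n_local) ** (-1 / 5))
print(theta_os, theta_mr)
```

In this sketch the one-shot estimator needs a single round of communication, but every local fit uses a bandwidth tied to the local sample size. The multiround loop pools gradients and Hessians across machines, so its bandwidth can shrink toward the rate implied by the full sample size N. The simulated design (standard normal covariates and noise) is for demonstration only.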