An Improved Divide-and-Conquer Approach to Estimating Mean Functional, with Application to Average Treatment Effect Estimation
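To address the computational complexity of estimating mean functionals with massive data, a divide-and-conquer method that uses a globally optimal bandwidth is proposed. It reduces the computational cost while attaining the same efficiency as the full-sample estimator, and its effectiveness is verified through average treatment effect estimation.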
Mean estimation is an important issue in statistical inference and machine learning. We are concerned with estimating a mean functional that is a function of several nonparametric functions when there is a large number of observations. Directly estimating such a mean functional through nonparametric smoothing has a computational complexity that is at least quadratic in the sample size, which is prohibitive for massive data. The divide-and-conquer approach is thus readily used to alleviate the computational burden, but it imposes a stringent condition on the sample size in each local machine if a locally optimal bandwidth is used. To address this issue, we suggest using a globally optimal bandwidth in each local machine, which substantially relaxes the restriction on the local sample sizes. We show that the divide-and-conquer approach with a globally optimal bandwidth achieves the estimation efficiency bound as if all observations were pooled together. In terms of computational efficiency, our proposal dramatically outperforms the pooled algorithm. We demonstrate these properties through average treatment effect estimation from both the asymptotic and the non-asymptotic perspectives.
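
To make the idea concrete, the following is a minimal sketch of divide-and-conquer average treatment effect estimation, not the estimator studied in the paper. It assumes a single covariate, Nadaraya-Watson regression with a Gaussian kernel, and an illustrative bandwidth rate of n^(-1/5); the function names (`bandwidth`, `nw_ate`, `dc_ate`, `simulate`) and the toy data design are hypothetical. The only point it illustrates is that every block uses a bandwidth tuned to the full sample size N rather than to the block size N/K.

```python
import numpy as np


def bandwidth(n, scale=1.0):
    # Illustrative rate only (an assumption, not the paper's rate): h = scale * n^(-1/5).
    return scale * n ** (-0.2)


def nw_ate(x, t, y, h):
    """ATE on one block: mean of m1_hat(X_i) - m0_hat(X_i) from Nadaraya-Watson fits."""
    # Pairwise Gaussian kernel weights; this is the O(n^2) step in the block size n.
    k = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    treated = (t == 1).astype(float)
    control = 1.0 - treated
    m1 = (k @ (treated * y)) / (k @ treated)  # estimate of E[Y | X, T = 1]
    m0 = (k @ (control * y)) / (k @ control)  # estimate of E[Y | X, T = 0]
    return float(np.mean(m1 - m0))


def dc_ate(x, t, y, n_blocks, h):
    """Average the block-level estimates, all computed with the same bandwidth h."""
    # Contiguous split; assumes the rows are already in random order.
    blocks = np.array_split(np.arange(len(x)), n_blocks)
    return float(np.mean([nw_ate(x[b], t[b], y[b], h) for b in blocks]))


def simulate(n, seed=0):
    """Toy data with a constant treatment effect of 2 (hypothetical design)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n)
    t = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))
    y = np.sin(x) + 2.0 * t + rng.normal(0.0, 0.5, n)
    return x, t, y


x, t, y = simulate(2000)
n_blocks = 10
h_global = bandwidth(len(x))             # tuned to the full sample size N
h_local = bandwidth(len(x) // n_blocks)  # tuned to the block size N / K
ate_global = dc_ate(x, t, y, n_blocks, h_global)
ate_local = dc_ate(x, t, y, n_blocks, h_local)
ate_pooled = nw_ate(x, t, y, h_global)   # pooled benchmark, O(N^2) kernel evaluations
```

In this sketch the pooled estimator evaluates N^2 kernel weights, whereas the K-block version evaluates K (N/K)^2 = N^2/K, and the blocks can be processed independently on separate machines. Comparing `ate_global`, `ate_local`, and `ate_pooled` across seeds and block counts is one informal way to see how the bandwidth choice interacts with the local sample size.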