Sequential One-step Estimator by Sub-sampling for Customer Churn Analysis with Massive Data sets

Journal of the Royal Statistical Society. Series C: Applied Statistics · 2022
Cited by 5
ABS 3

Summary

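Proposes a sequential one-step estimation method that repeatedly sub-samples the data, updates the estimate step by step, and averages the resulting estimates, which reduces the computational burden. Its interpretability and predictive power are verified in a customer churn analysis of real large-scale data from a securities company.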

Abstract

Customer churn is one of the most important concerns for large companies. Massive data are now often encountered in customer churn analysis, which brings new challenges for model computation. To cope with these challenges, sub-sampling methods are often used to accomplish large-scale data analysis tasks. To cover more informative samples in one sampling round, classic sub-sampling methods need to compute sampling probabilities for all data points. However, this creates a huge computational burden for large-scale data sets and is therefore not applicable in practice. In this study, we propose a sequential one-step (SOS) estimation method based on repeated sub-sampling of the data set. In the SOS method, data points need to be sampled only with probabilities, and the sampling step is conducted repeatedly. In each sampling step, a new estimate is computed via one-step updating based on the newly sampled data points. This leads to a sequence of estimates, and the final SOS estimate is their average. We show theoretically that both the bias and the standard error of the SOS estimator can decrease as the sub-sample size or the number of sub-sampling rounds increases. The finite-sample performance of the SOS estimator is assessed through simulations. Finally, we apply the SOS method to analyse a real large-scale customer churn data set from a securities company. The results show that the SOS method has good interpretability and predictive power in this real application.
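
The procedure described in the abstract (repeated sub-sampling, a one-step update on each new sub-sample, then averaging the sequence of estimates) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the model is a logistic regression, each sub-sample is drawn uniformly without replacement, the one-step update is a single Newton-Raphson step started from the previous estimate, and the pilot estimate comes from a few Newton iterations on a first sub-sample. The names `newton_step` and `sos_estimate` and the synthetic data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def newton_step(theta, X, y):
    """One Newton-Raphson update of the logistic log-likelihood on (X, y)."""
    p = sigmoid(X @ theta)
    grad = X.T @ (y - p) / len(y)                        # average score
    hess = (X * (p * (1.0 - p))[:, None]).T @ X / len(y)  # average information
    return theta + np.linalg.solve(hess, grad)


def sos_estimate(X, y, n_sub, n_rounds, seed=0, pilot_iters=25):
    """Sketch of a sequential one-step estimate (assumed details, see lead-in)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape

    # Assumption: pilot estimate from Newton iterations on one uniform sub-sample.
    idx = rng.choice(N, size=n_sub, replace=False)
    theta = np.zeros(d)
    for _ in range(pilot_iters):
        theta = newton_step(theta, X[idx], y[idx])

    estimates = []
    for _ in range(n_rounds):
        # Draw a fresh sub-sample; no sampling probabilities are computed
        # over the full data set (uniform sampling is assumed here).
        idx = rng.choice(N, size=n_sub, replace=False)
        # One-step update of the previous estimate on the new sub-sample.
        theta = newton_step(theta, X[idx], y[idx])
        estimates.append(theta)

    # The final estimate is the average of the sequence of one-step estimates.
    return np.mean(estimates, axis=0)


# Synthetic "churn" data, purely illustrative.
data_rng = np.random.default_rng(42)
N = 20_000
theta_true = np.array([-0.5, 1.0, -1.0])
X = np.column_stack([np.ones(N), data_rng.normal(size=(N, 2))])
y = data_rng.binomial(1, sigmoid(X @ theta_true))

theta_hat = sos_estimate(X, y, n_sub=1_000, n_rounds=20)
print(theta_hat)
```

Each round touches only `n_sub` rows, so the cost per round does not grow with the full sample size, while averaging over rounds reduces the variability of the individual one-step estimates.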

Customer churn analysis · Sub-sampling methods · Large-scale data · Sequential estimation · Machine learning