Renewable Quantile Regression with Heterogeneous Streaming Datasets
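To address the problem that data batches in a stream may follow different distributions, this work proposes an online renewable quantile regression method that requires only the current data and summary statistics of historical data, automatically detects the unknown homogeneous structure, is robust to heavy-tailed noise and outliers, and is computationally efficient.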
Renewable statistical inference has received much attention since the advent of streaming data collection techniques. However, most existing online updating methods rely on a homogeneity assumption and on gradients: all data batches must be either independent and identically distributed or share the same regression parameters, and the objective functions must be smooth with respect to the parameters. To the best of our knowledge, the only existing approach that allows some regression parameters to differ across data batches was proposed by Luo and Song, who required the homogeneous structure to be known, which is difficult to guarantee in practice. In this article, we develop an online renewable quantile regression method that relies only on the current data and summary statistics of historical data, for both homogeneous and heterogeneous streaming data. The proposed methods are computationally efficient, can automatically detect the unknown potential homogeneous structure, and are robust to heavy-tailed noise and outliers. Asymptotic properties show that the proposed renewable estimators can achieve the same statistical efficiency as the oracle estimators based on individual-level data. A numerical simulation and a real data analysis illustrate that the proposed methods perform well. Supplementary materials for this article are available online.
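To make the smoothness and robustness remarks concrete, the following minimal sketch shows the standard quantile (check) loss that underlies quantile regression. It is not the renewable estimator proposed in the article; the function names `check_loss`, `empirical_risk`, and `best_constant` are hypothetical and introduced only for illustration. The loss is piecewise linear with a kink at zero, which is why gradient-based online updates that assume a smooth objective do not apply directly, and its minimizer is a sample quantile, which is why a single extreme observation has little effect.

```python
from typing import Sequence


def check_loss(residual: float, tau: float) -> float:
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def empirical_risk(theta: float, ys: Sequence[float], tau: float) -> float:
    """Average check loss of the constant fit theta on the sample ys."""
    return sum(check_loss(y - theta, tau) for y in ys) / len(ys)


def best_constant(ys: Sequence[float], tau: float) -> float:
    """Minimize the empirical risk over constants.

    The risk is convex and piecewise linear in theta, so a minimizer is
    attained at one of the observed values; searching them is enough
    for this illustration.
    """
    return min(sorted(ys), key=lambda c: empirical_risk(c, ys, tau))


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # illustrative data with one outlier
    print("median fit:", best_constant(sample, 0.5))
    print("sample mean:", sum(sample) / len(sample))
```

With the illustrative sample above, the median fit stays at a central observation while the sample mean is pulled toward the outlier, which is the sense in which quantile-based estimators tolerate heavy tails.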