Scalable Estimation for Structured Additive Distributional Regression
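Proposes a backfitting algorithm based on stochastic gradient descent that handles massive datasets on an ordinary laptop, automatically selects variables and smoothing parameters, and performs better than or on par with gradient boosting for structured additive distributional regression while requiring less computation time.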
Obtaining probabilistic models is highly relevant in many recent applications. However, estimating such distributional models on very large datasets remains a difficult task. In particular, fitting rather complex models can easily lead to memory-related efficiency problems and thereby make estimation infeasible even on high-performance computers. We address these challenges and propose a novel backfitting algorithm, which is based on the ideas of stochastic gradient descent and can deal with virtually any amount of data on a conventional laptop. The algorithm performs automatic selection of variables and determination of smoothing parameters. Its performance is superior or at least equivalent to other implementations for structured additive distributional regression, such as gradient boosting, while requiring less computation time. Performance is evaluated using an extensive simulation study and an exceptionally challenging example of lightning count prediction across Austria with over 9 million observations and 80 covariates.
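To illustrate the general idea of combining stochastic gradient steps with a backfitting-style cycle over model components, the following is a minimal sketch under strong simplifying assumptions; it is not the authors' algorithm. It assumes a Gaussian response whose mean and log standard deviation each have a purely linear predictor, a fixed learning rate, and data held in memory (in a large-data setting the mini-batches would instead be streamed). It omits smooth terms, variable selection, and smoothing-parameter determination. The function names `gaussian_loglik_grads` and `sgd_backfit` are hypothetical.

```python
import numpy as np


def gaussian_loglik_grads(y, X, beta_mu, beta_sigma):
    """Gradients of the Gaussian log-likelihood w.r.t. both coefficient blocks.

    Hypothetical helper: mu = X @ beta_mu, log(sigma) = X @ beta_sigma.
    """
    mu = X @ beta_mu
    sigma2 = np.exp(2.0 * (X @ beta_sigma))
    r = y - mu
    # d loglik / d mu = r / sigma^2
    g_mu = X.T @ (r / sigma2)
    # loglik = -log(sigma) - r^2 / (2 sigma^2), so d/d log(sigma) = r^2 / sigma^2 - 1
    g_sigma = X.T @ (r**2 / sigma2 - 1.0)
    return g_mu, g_sigma


def sgd_backfit(y, X, n_epochs=30, batch_size=200, lr=0.02, seed=0):
    """Hypothetical sketch: mini-batch gradient ascent that cycles through the
    predictor of each distribution parameter in turn (backfitting-style)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_mu = np.zeros(p)
    beta_sigma = np.zeros(p)
    for _ in range(n_epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            yb, Xb = y[b], X[b]
            # Block 1: update the location predictor with the scale held fixed.
            g_mu, _ = gaussian_loglik_grads(yb, Xb, beta_mu, beta_sigma)
            beta_mu += lr * g_mu / len(b)
            # Block 2: update the scale predictor using the refreshed location.
            _, g_sigma = gaussian_loglik_grads(yb, Xb, beta_mu, beta_sigma)
            beta_sigma += lr * g_sigma / len(b)
    return beta_mu, beta_sigma


# Usage on simulated data with known (illustrative) coefficients.
rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + np.exp(X @ np.array([-0.5, 0.3])) * rng.normal(size=n)
bm, bs = sgd_backfit(y, X)
print(np.round(bm, 2), np.round(bs, 2))
```

Each mini-batch touches only `batch_size` rows, which is what keeps memory use bounded regardless of the total number of observations; a practical implementation would also need step-size control and a stopping rule.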