Sparse regression for large data sets with outliers

European Journal of Operational Research · 2021

Cited by 34

ABS 4

Lea Bottmer
Christophe Croux
Ines Wilms (corresponding author)

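Summary

The paper proposes an algorithm called “sparse shooting S” for large data sets that have more predictors than observations and contain outliers. It selects the important variables automatically by producing sparse regression coefficients. A simulation study and an application to car fuel consumption data demonstrate its robustness and effectiveness.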

Abstract

The linear regression model remains an important workhorse for data scientists. However, many data sets contain many more predictors than observations. Besides, outliers, or anomalies, frequently occur. This paper proposes an algorithm for regression analysis that addresses these features typical for big data sets, which we call “sparse shooting S”. The resulting regression coefficients are sparse, meaning that many of them are set to zero, hereby selecting the most relevant predictors. A distinct feature of the method is its robustness with respect to outliers in the cells of the data matrix. The excellent performance of this robust variable selection and prediction method is shown in a simulation study. A real data application on car fuel consumption demonstrates its usefulness.
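
To illustrate what "sparse" coefficients mean when predictors outnumber observations, here is a minimal sketch of ordinary lasso regression fitted by cyclic coordinate descent, a scheme often referred to as the "shooting" algorithm. This is not the authors' sparse shooting S method: it uses a plain squared-error loss and has no protection against outliers. The function names, the penalty value `lam`, and the toy data are all assumptions made up for this illustration.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; anything within [-gamma, gamma] becomes exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_shooting(X, y, lam, n_sweeps=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Illustrative, non-robust lasso only (hypothetical helper, not from the paper).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of column j with the partial residual
            # (the residual with predictor j's own contribution added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Toy data (invented): 20 observations, 30 predictors, only the first two matter.
rng = random.Random(0)
n_obs, n_pred = 20, 30
X = [[rng.gauss(0, 1) for _ in range(n_pred)] for _ in range(n_obs)]
y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 0.1) for row in X]

beta_hat = lasso_shooting(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print("indices of non-zero coefficients:", selected)
```

The soft-thresholding step is what sets coefficients exactly to zero, so the fitted model doubles as a variable selector. A robust variant in the spirit of the paper would have to replace the squared-error loss with something that limits the influence of outlying cells, which this sketch does not attempt.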

Regression analysis · Variable selection · Outlier handling · Big data
