Outlier Detection in Regression: Conic Quadratic Formulations
针对线性回归中异常值检测问题,提出不使用大M约束的更强二阶锥松弛,计算实验表明比现有方法快几个数量级,且精确求解能获得比启发式更好的解。
In many applications, when building linear regression models, it is important to account for the presence of outliers, that is, corrupted input data points. Such problems can be formulated as mixed-integer optimization problems involving cubic terms, each given by the product of a binary variable and a quadratic term of the continuous variables. Existing approaches in the literature, typically relying on the linearization of the cubic terms using big-M constraints, suffer from weak relaxation and poor performance in practice. In this work we derive stronger second-order conic relaxations that do not involve big-M constraints. Our computational experiments indicate that the proposed formulations are several orders of magnitude faster than existing big-M formulations in the literature for this problem. In addition, we verify that exact solution of the mixed-integer optimization problems can lead to substantially better-quality solutions than simpler relaxations or heuristics commonly used in practice. History: Accepted by Andrea Lodi, Area Editor for Design & Analysis of Algorithms—Discrete. Funding: This work was supported by a public grant as part of the Investissement d’avenir project [Grant ANR-11-LABX-0056-LMH, LabEx LMH]. A. Gómez is supported by the US Air Force Office of Scientific Research [Grant FA9550-22-1-0369]. Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information ( https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2025.1215 ) as well as from the IJOC GitHub software repository ( https://github.com/INFORMSJoC/2025.1215 ). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/ .