🌙

比较带Stieltjes矩阵的稀疏二次最小化的解路径

Comparing solution paths of sparse quadratic minimization with a Stieltjes matrix

Mathematical Programming · 2023
被引 6
ABS 4

中文导读

研究了稀疏二次最小化问题中三种解路径(ℓ0、ℓ1和截断ℓ1)的性质与权衡,发现截断ℓ1路径在计算效率和估计质量上优于其他两者,对优化和统计学习研究者有参考价值。

Abstract

Abstract This paper studies several solution paths of sparse quadratic minimization problems as a function of the weighing parameter of the bi-objective of estimation loss versus solution sparsity. Three such paths are considered: the “ $$\ell _0$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:msub><mml:mi>ℓ</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:math> -path” where the discontinuous $$\ell _0$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:msub><mml:mi>ℓ</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:math> -function provides the exact sparsity count; the “ $$\ell _1$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:msub><mml:mi>ℓ</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math> -path” where the $$\ell _1$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:msub><mml:mi>ℓ</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math> -function provides a convex surrogate of sparsity count; and the “capped $$\ell _1$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:msub><mml:mi>ℓ</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math> -path” where the nonconvex nondifferentiable capped $$\ell _1$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:msub><mml:mi>ℓ</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math> -function aims to enhance the $$\ell _1$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:msub><mml:mi>ℓ</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math> -approximation. Serving different purposes, each of these three formulations is different from each other, both analytically and computationally. Our results deepen the understanding of (old and new) properties of the associated paths, highlight the pros, cons, and tradeoffs of these sparse optimization models, and provide numerical evidence to support the practical superiority of the capped $$\ell _1$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:msub><mml:mi>ℓ</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math> -path. Our study of the capped $$\ell _1$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:msub><mml:mi>ℓ</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math> -path is interesting in its own right as the path pertains to computable directionally stationary (= strongly locally minimizing in this context, as opposed to globally optimal) solutions of a parametric nonconvex nondifferentiable optimization problem. Motivated by classical parametric quadratic programming theory and reinforced by modern statistical learning studies, both casting an exponential perspective in fully describing such solution paths, we also aim to address the question of whether some of them can be fully traced in strongly polynomial time in the problem dimensions. A major conclusion of this paper is that a path of directional stationary solutions of the capped $$\ell _1$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:msub><mml:mi>ℓ</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math> -regularized problem offers interesting theoretical properties and practical compromise between the $$\ell _0$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:msub><mml:mi>ℓ</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:math> -path and the $$\ell _1$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:msub><mml:mi>ℓ</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math> -path. Indeed, while the $$\ell _0$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:msub><mml:mi>ℓ</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:math> -path is computationally prohibitive and greatly handicapped by the repeated solution of mixed-integer nonlinear programs, the quality of $$\ell _1$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:msub><mml:mi>ℓ</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math> -path, in terms of the two criteria—loss and sparsity—in the estimation objective, is inferior to the capped $$\ell _1$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:msub><mml:mi>ℓ</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math> -path; the latter can be obtained efficiently by a combination of a parametric pivoting-like scheme supplemented by an algorithm that takes advantage of the Z-matrix structure of the loss function.

稀疏优化二次规划统计学习算法