连续设定下Sinkhorn算法更快的指数收敛速度

Sharper exponential convergence rates for Sinkhorn’s algorithm in continuous settings

Mathematical Programming · 2025

被引 3 · 同刊同年前 8%

ABS 4

Lénaïc Chizat
Alex Delalande 通讯
Tomas Vaškevičius

中文导读

研究了当至少一个概率测度有密度时，Sinkhorn算法求解熵正则化最优传输问题的收敛速度，得到了比希尔伯特投影度量更快的指数收敛率，且结果非渐近、显式。

Abstract

Abstract We study the convergence rate of Sinkhorn’s algorithm for solving entropy-regularized optimal transport problems when at least one of the probability measures, $$\mu $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mi>μ</mml:mi> </mml:math> , admits a density over $$\mathbb {R}^d$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:msup> <mml:mrow> <mml:mi>R</mml:mi> </mml:mrow> <mml:mi>d</mml:mi> </mml:msup> </mml:math> . For a semi-concave cost function bounded by $$c_{\infty }$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:msub> <mml:mi>c</mml:mi> <mml:mi>∞</mml:mi> </mml:msub> </mml:math> and a regularization parameter $$\lambda > 0$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mi>λ</mml:mi> <mml:mo>></mml:mo> <mml:mn>0</mml:mn> </mml:mrow> </mml:math> , we obtain exponential convergence guarantees on the dual sub-optimality gap with contraction rates that are polynomial in $$\lambda /c_{\infty }$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mi>λ</mml:mi> <mml:mo>/</mml:mo> <mml:msub> <mml:mi>c</mml:mi> <mml:mi>∞</mml:mi> </mml:msub> </mml:mrow> </mml:math> . This represents an exponential improvement over the known contraction rate $$1 - \Theta (\exp (-c_{\infty }/\lambda ))$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mn>1</mml:mn> <mml:mo>-</mml:mo> <mml:mi>Θ</mml:mi> <mml:mo>(</mml:mo> <mml:mo>exp</mml:mo> <mml:mrow> <mml:mo>(</mml:mo> <mml:mo>-</mml:mo> <mml:msub> <mml:mi>c</mml:mi> <mml:mi>∞</mml:mi> </mml:msub> <mml:mo>/</mml:mo> <mml:mi>λ</mml:mi> <mml:mo>)</mml:mo> </mml:mrow> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> achievable via Hilbert’s projective metric. Specifically, we prove a contraction rate value of $$1-\Theta (\lambda ^2/c_\infty ^2)$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mn>1</mml:mn> <mml:mo>-</mml:mo> <mml:mi>Θ</mml:mi> <mml:mo>(</mml:mo> <mml:msup> <mml:mi>λ</mml:mi> <mml:mn>2</mml:mn> </mml:msup> <mml:mo>/</mml:mo> <mml:msubsup> <mml:mi>c</mml:mi> <mml:mi>∞</mml:mi> <mml:mn>2</mml:mn> </mml:msubsup> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> when $$\mu $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mi>μ</mml:mi> </mml:math> has a bounded log-density. In some cases, such as when $$\mu $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mi>μ</mml:mi> </mml:math> is log-concave and the cost function is $$c(x,y)=-\langle x, y\rangle $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mi>c</mml:mi> <mml:mo>(</mml:mo> <mml:mi>x</mml:mi> <mml:mo>,</mml:mo> <mml:mi>y</mml:mi> <mml:mo>)</mml:mo> <mml:mo>=</mml:mo> <mml:mo>-</mml:mo> <mml:mo>⟨</mml:mo> <mml:mi>x</mml:mi> <mml:mo>,</mml:mo> <mml:mi>y</mml:mi> <mml:mo>⟩</mml:mo> </mml:mrow> </mml:math> , this rate improves to $$1-\Theta (\lambda /c_\infty )$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mn>1</mml:mn> <mml:mo>-</mml:mo> <mml:mi>Θ</mml:mi> <mml:mo>(</mml:mo> <mml:mi>λ</mml:mi> <mml:mo>/</mml:mo> <mml:msub> <mml:mi>c</mml:mi> <mml:mi>∞</mml:mi> </mml:msub> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> . The latter rate matches the one that we derive for the transport between isotropic Gaussian measures, indicating tightness in the dependency in $$\lambda /c_\infty $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mi>λ</mml:mi> <mml:mo>/</mml:mo> <mml:msub> <mml:mi>c</mml:mi> <mml:mi>∞</mml:mi> </mml:msub> </mml:mrow> </mml:math> . Our results are fully non-asymptotic and explicit in all the parameters of the problem.

最优传输熵正则化Sinkhorn算法收敛速度

阅读原文 ↗