通过离线贪心算法实现在线学习:在市场设计与优化中的应用

Online Learning via Offline Greedy Algorithms: Applications in Market Design and Optimization

Management Science · 2022
被引 10
人大 A+FT50UTD24ABS 4*

中文导读

提出一个通用框架,将离线贪心算法转化为在线算法,应用于产品排名优化、拍卖保留价优化和子模最大化等问题,在完全信息和赌博机设置下均取得理论保证,数值实验表现优于理论界。

Abstract

Motivated by online decision making in time-varying combinatorial environments, we study the problem of transforming offline algorithms to their online counterparts. We focus on offline combinatorial problems that are amenable to a constant factor approximation using a greedy algorithm that is robust to local errors. For such problems, we provide a general framework that efficiently transforms offline robust greedy algorithms to online ones using Blackwell approachability. We show that the resulting online algorithms have [Formula: see text] (approximate) regret under the full information setting. We further introduce a bandit extension of Blackwell approachability that we call Bandit Blackwell approachability. We leverage this notion to transform greedy robust offline algorithms into a [Formula: see text] (approximate) regret in the bandit setting. Demonstrating the flexibility of our framework, we apply our offline-to-online transformation to several problems at the intersection of revenue management, market design, and online optimization, including product ranking optimization in online platforms, reserve price optimization in auctions, and submodular maximization. We also extend our reduction to greedy-like first-order methods used in continuous optimization, such as those used for maximizing continuous strong DR monotone submodular functions subject to convex constraints. We show that our transformation, when applied to these applications, leads to new regret bounds or improves the current known bounds. We complement our theoretical studies by conducting numerical simulations for two of our applications, in both of which we observe that the numerical performance of our transformations outperforms the theoretical guarantees in practical instances. This paper was accepted by George Shanthikumar, data science. Funding: R. Niazadeh was supported by research funding from University of Chicago Booth School of Business. N. Golrezaei was supported in part by the Young Investigator Program Award from the Office of Naval Research [N00014-21-1-2776] and the MIT Research Support Award. Supplemental Material: The Online Appendix is available at https://doi.org/10.1287/mnsc.2022.4558

在线学习离线贪婪算法Blackwell可接近性遗憾界