Optimal No-Regret Learning in Repeated First-Price Auctions
研究投标人在重复第一价格拍卖中,仅观察获胜出价的情况下,如何自适应出价以最大化累计收益,并提出了首个达到接近最优遗憾界的学习算法。
In “Optimal No-Regret Learning in Repeated First-Price Auctions,” Y. Han, W. Tsachy, and Z. Zhou study online learning in repeated first-price auctions where a bidder, only observing the winning bid at the end of each auction, learns to adaptively bid to maximize her cumulative payoff. To achieve this goal, the bidder faces censored feedback: If she wins the bid, then she is not able to observe the highest bid of the other bidders, which we assume is i.i.d. drawn from an unknown distribution. In this paper, they develop the first learning algorithm that achieves a near-optimal regret bound, by exploiting two structural properties of first-price auctions, that is, the specific feedback structure and payoff function.