No-Regret Bayesian Recommendation to Homogeneous Users
研究推荐平台在不知用户偏好时,如何利用产品状态信息在线设计推荐策略,逐步学习用户偏好并实现无Stackelberg遗憾,提出了对数遗憾和多项式遗憾的两种策略。
No-Regret Bayesian Recommendation to Homogeneous Users We introduce and study the online Bayesian recommendation problem for a recommender system platform. The platform has the privilege to privately observe a utility-relevant state of a product at each round and uses this information to make online recommendations to a stream of myopic users. This paradigm is common in a wide range of scenarios in the current internet economy. The platform commits to an online recommendation policy that utilizes its information advantage on the product state to persuade self-interested users to follow the recommendation. Because the platform does not know users’ preferences or beliefs in advance, we study the platform’s online learning problem of designing an adaptive recommendation policy to persuade users while gradually learning users’ preferences and beliefs en route. Specifically, we aim to design online learning policies with no Stackelberg regret for the platform, that is, against the optimal benchmark policy in hindsight under the assumption that users will correspondingly adapt their responses to the benchmark policy. Our first result is an online policy that achieves double logarithmic regret dependence on the number of rounds. We also present an information-theoretic lower bound showing that no adaptive online policy can achieve regret with better dependency on the number of rounds. Finally, by formulating the platform’s problem as optimizing a linear program with membership oracle access, we present our second online recommendation policy that achieves regret with polynomial dependence on the number of states but logarithmic dependence on the number of rounds.