Deep Pareto Reinforcement Learning for Multi-Objective Recommender Systems

MIS Quarterly · 2025

Cited by: 0

Journal rankings: Renmin University of China A+ · FT50 · UTD24 · ABS 4*

Pan Li · Georgia Institute of Technology (corresponding author)
Alexander Tuzhilin · New York University

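Summary

The paper proposes a Deep Pareto Reinforcement Learning method that models the dynamic relationships among recommendation objectives, captures personalized preferences, and optimizes both short-term and long-term performance. On Alibaba's video streaming platform, it significantly improved three conflicting business objectives.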

Abstract

Optimizing multiple objectives simultaneously is an important task for recommendation platforms to improve their performance. However, this task is particularly challenging since the relationships between different objectives are heterogeneous across different consumers and dynamically fluctuate according to different contexts, resulting in a Pareto-frontier in the result of recommendations, where the improvement of any objective comes at the cost of others. Existing multi-objective recommender systems do not systematically consider such dynamic relationships; instead, they balance between these objectives in a static and uniform manner, resulting in only suboptimal recommendation performance. In this paper, we propose a Deep Pareto Reinforcement Learning (DeepPRL) method, where we (1) comprehensively model the complex relationships between multiple recommendation objectives; (2) effectively capture personalized and contextual consumer preferences for each objective; (3) optimize both the short-term and the long-term recommendation performance. As a result, our method achieves significant Pareto-dominance over the state-of-the-art baselines across four offline experiments. Furthermore, we conducted a controlled experiment on Alibaba's video streaming platform, where our method simultaneously improved three conflicting business objectives significantly over the latest production system, demonstrating its tangible economic impact in practice.
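
To make the notions of Pareto dominance and the Pareto frontier used in the abstract concrete, the following is a minimal Python sketch. It is not the DeepPRL method itself; the function names `dominates` and `pareto_front` and the three-objective scores are hypothetical, and every objective is assumed to be "higher is better".

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (higher is better):
    `a` is at least as good on every objective and strictly better on one."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical scores of four recommendation policies on three objectives.
candidates = [
    (0.30, 0.10, 0.50),
    (0.25, 0.20, 0.40),
    (0.20, 0.05, 0.30),  # worse than the first policy on every objective
    (0.10, 0.30, 0.20),
]

front = pareto_front(candidates)
for point in front:
    print(point)
```

The third policy is removed because the first one beats it on all three objectives. The remaining policies each win on at least one objective, so moving between them trades one objective against another, which is the situation the abstract describes.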

Keywords: Recommender systems · Reinforcement learning · Multi-objective optimization · Pareto frontier
