Ordinal regression meets online learning: Interactive preference learning for multiple criteria choice and ranking with provable guarantees
将序数回归重新表述为序列预测任务,研究一类为决策者偏好序列分配概率的算法,提供理论遗憾保证,并在线性模型和Bradley-Terry模型下建立贝叶斯方法和正则化最大似然算法的理论界限,实验表明优于现有方法。
We propose a theoretical and practical bridge between ordinal regression for multiple criteria choice and ranking problems and the framework of sequential prediction, also known as online learning. By reframing the ordinal regression as a sequential prediction task, we study a general class of algorithms that assign probabilities to a sequence of the preferences expressed by the Decision Maker (DM). This approach allows us to evaluate various statistical algorithms on a common basis, providing theoretical guarantees on their regret. To model the likelihood, we employ an additive value function that scores pairwise comparisons given by the DM. We explore two likelihood models: (1) a linear model, which we demonstrate is analogous to sequential investment, and (2) the Bradley–Terry model, widely used in statistics and preference learning. For both models, we establish theoretical bounds for the Bayesian method and the Regularized Maximum Likelihood algorithm (also known as Follow the Regularized Leader). We design Monte Carlo Markov Chain methods based on Metropolis–Hastings and Nested Sampling for efficient approximation of the posterior in Bayesian methods. Extensive empirical testing on synthetic and real-world data shows that our methods outperform the best existing approaches in the literature.