在线最短路径问题：使用多臂老虎机框架学习行驶时间

The Online Shortest Path Problem: Learning Travel Times Using a Multiarmed Bandit Framework

Transportation Science · 2024

被引 6

ABS 3

Tomás Lagos
Ramón Auad
Felipe Lagos

中文导读

针对物流公司缺乏准确行驶时间信息的问题，将在线最短路径建模为多臂老虎机问题，提出三种方法平衡探索与利用，在人工和北京出租车真实数据上验证了有效性。

Abstract

In the age of e-commerce, logistics companies often operate within extensive road networks without accurate knowledge of travel times for their specific fleet of vehicles. Moreover, millions of dollars are spent on routing services that fail to accurately capture the unique characteristics of the drivers and vehicles of the companies. In this work, we address the challenge faced by a logistic operator with limited travel time information, aiming to find the optimal expected shortest path between origin-destination pairs. We model this problem as an online shortest path problem, common to many last-mile routing settings; given a graph whose arcs’ travel times are stochastic and follow an unknown distribution, the objective is to find a vehicle route of minimum travel time from an origin to a destination. The planner progressively collects travel condition data as drivers complete their routes. Inspired by the combinatorial multiarmed bandit and kriging literature, we propose three methods with distinct features to effectively learn the optimal shortest path, highlighting the practical advantages of incorporating spatial correlation in the learning process. Our approach balances exploration (improving estimates for unexplored arcs) and exploitation (executing the minimum expected time path) using the Thompson sampling algorithm. In each iteration, our algorithm executes the path that minimizes the expected travel time based on data from a posterior distribution of the speeds of the arcs. We conduct a computational study comprising two settings: a set of four artificial instances and a real-life case study. The case study uses empirical data of taxis in the 17-km-radius area of the center of Beijing, encompassing Beijing’s “5th Ring Road.” In both settings, our algorithms demonstrate efficient and effective balancing of the exploration-exploitation trade-off.

物流与供应链运筹学机器学习路径规划

阅读原文 ↗