Primal-Dual Value Function Approximation for Stochastic Dynamic Intermodal Transportation with Eco-Labels

Transportation Science · 2022
Cited by 7
ABS 3

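Summary

The paper studies how a reinforcement learning method, value function approximation, can minimize costs while satisfying eco-label constraints in a dynamic intermodal transportation network, and proposes a primal-dual enhancement strategy to handle multiple objectives and delayed feedback.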

Abstract

Eco-labels are a way to benchmark transportation shipments with respect to their environmental impact. In contrast to an eco-labeling of consumer products, emissions in transportation depend on several operational factors like the mode of transportation (e.g., train or truck) or a vehicle’s current and potential future capacity utilization when new orders are added for consolidation. Thus, satisfying eco-labels and doing this cost efficiently is a challenging task when dynamically routing orders in an intermodal network. In this paper, we model the problem as a multiobjective sequential decision process and propose a reinforcement learning method: value function approximation (VFA). VFAs frequently simulate trajectories of the problem and store observed values (violated eco-labels and costs) for states aggregated to a set of features. The observations are used for improved decision making in the next trajectory. For our problem, we face two additional challenges when applying a VFA, the multiple objectives and the “delayed” realization of eco-label satisfaction due to future consolidation. For the first, we propose different feature sets dependent on the objective function’s focus: costs or eco-labels. For the latter, we propose enhancing the suboptimal decision making and observed pessimistic primal values within the VFA trajectories with optimistic dual decision making when all information of a trajectory is known ex post. This enhancement is a general methodological contribution to the literature of approximate dynamic programming and will likely improve learning for other problems as well. We show the advantages of both components in a comprehensive study for intermodal transport via trains and trucks in Europe.

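Intermodal transportation · Reinforcement learning · Dynamic programming · Eco-labels · Operations research and optimization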