Multi-Attribute Utility Deep Reinforcement Learning method for Sequential Multi-Criteria Decision problems: Application to human resource planning
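Proposes MAUDRL, a new algorithm that combines deep reinforcement learning with multi-attribute utility theory to solve sequential multi-criteria decision problems, and validates it on human resource planning for blueberry farms in Canada, accounting for both the decision-maker's risk preferences and multiple conflicting objectives.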
Problem-solving and decision-making can be complex. Criteria often conflict, and decisions must account for both immediate and long-term impacts; problems with these two features are Sequential Multi-Criteria Decision (SMCD) problems. Deep Reinforcement Learning (DRL), which integrates traditional Reinforcement Learning with Deep Learning, has emerged as a way to tackle intricate sequential decision-making problems. Although DRL has progressed significantly in recent years, little attention has been paid to DRL algorithms designed specifically for SMCD problems, which usually involve conflicting and non-commensurable attributes. To bridge this gap, we introduce a novel algorithm called Multi-Attribute Utility DRL (MAUDRL), which combines DRL with Multi-Criteria Decision Analysis (MCDA). The approach provides a transparent DRL model that addresses the intricacies of SMCD problems while integrating the risk attitudes and preferences of the decision-maker. We demonstrate MAUDRL on human resource planning for blueberry farming in British Columbia, Canada, where it supports sustainable decision-making. We compare MAUDRL with two benchmark algorithms, Oracle Discrete Multi-Attribute Utility Theory (MAUT) and the Single Reward Aggregation Approach, using three metrics: policy quality, goal achievement, and runtime. The numerical analysis and benchmarks indicate that MAUDRL offers practical solutions for SMCD problems by helping to explore diverse solution spaces efficiently. We discuss the theoretical implications and practical applications of these results, underscoring the capability of MAUDRL to tackle complex SMCD problem domains and to advance sustainable and socially responsible decision-making while considering the risk preferences of decision-makers.

• Presents a MAUDRL method for sequential multi-criteria decision problems.
• Integrates deep reinforcement learning with multi-attribute utility theory.
• Considers decision-makers’ risk preferences and multiple conflicting criteria.
• Reduces runtime by training separate DQNs and aggregating learned utilities.
• Demonstrates MAUDRL in sustainable human resource planning for agriculture.
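
The idea of aggregating per-attribute utilities under a decision-maker's risk attitude can be illustrated with a minimal Python sketch. It is not the MAUDRL implementation: the abstract does not state the utility form or the aggregation rule, so the exponential utility, the additive weighting, the function names (`exp_utility`, `aggregate`, `best_action`), and all numbers below are assumptions chosen for illustration. The per-action estimates stand in for values that separately trained networks might produce, already normalized to [0, 1].

```python
import math


def exp_utility(x, risk):
    """Exponential utility of a normalized attribute level x in [0, 1].

    Assumed form: risk > 0 is risk-averse (concave), risk < 0 is
    risk-seeking (convex), and risk == 0 falls back to risk-neutral.
    """
    if abs(risk) < 1e-12:
        return x
    return (1.0 - math.exp(-risk * x)) / (1.0 - math.exp(-risk))


def aggregate(values, weights, risks):
    """Additive multi-attribute utility: weighted sum of per-attribute utilities."""
    return sum(w * exp_utility(v, r) for v, w, r in zip(values, weights, risks))


def best_action(per_action_values, weights, risks):
    """Return the index of the action with the highest aggregated utility, plus all scores."""
    scores = [aggregate(v, weights, risks) for v in per_action_values]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical normalized estimates for three actions on two attributes
# (for example, an economic attribute and a social attribute).
estimates = [[0.9, 0.2], [0.5, 0.6], [0.1, 0.95]]
weights = [0.5, 0.5]   # assumed attribute weights, summing to 1
risks = [2.0, 2.0]     # assumed risk-averse attitude on both attributes

action, scores = best_action(estimates, weights, risks)
print(action, [round(s, 3) for s in scores])
```

Under these assumed numbers, a risk-averse decision-maker favors the balanced action over either extreme, because a concave utility rewards gains on a weak attribute more than further gains on an already strong one.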