Sustainable Reinforcement Learning for Autonomous Driving Under Postsuspension of Human Guidance
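Proposes a sustainable human-guided reinforcement learning framework that uses a historical-similarity compensation reward and a new learning paradigm to address the drop in learning performance after human guidance is suspended; in autonomous driving, it improves performance by 15% over existing state-of-the-art methods.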
This article introduces a sustainable human-guided reinforcement learning (RL) framework to address the degradation in learning performance that occurs when human guidance is suspended. First, a compensation reward based on the historical similarity between the RL agent's behavior and the recorded human guidance is designed so that human guidance continues to influence learning after it stops. Fitting a value function to the new reward, which includes the compensation reward, can accumulate approximation errors; to avoid this, a novel RL paradigm is proposed that bypasses value function fitting and directly optimizes the policy using historical similarity. This paradigm develops a new historical similarity-based learning objective for RL to leverage human guidance more efficiently and achieve alignment with human behavior. Furthermore, the proposed paradigm enables fine-tuning of the RL agent to address the long-tail problem. Experimental results demonstrate the advantages of the proposed method in terms of sustainable guidance and optimal performance in autonomous driving, achieving a 15% increase in optimal performance compared with existing state-of-the-art (SOTA) methods.
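To make the idea of a historical similarity-based compensation reward concrete, the following is a minimal Python sketch. It is not the article's formulation, which the abstract does not spell out. The Gaussian kernels, the max over stored demonstrations, the bandwidth parameters, and the names `compensation_reward` and `shaped_reward` are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

# A stored human guidance sample: (state, human action). Hypothetical layout.
Demo = Tuple[Sequence[float], Sequence[float]]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compensation_reward(
    state: Sequence[float],
    action: Sequence[float],
    human_history: List[Demo],
    state_bandwidth: float = 1.0,   # assumed kernel width for states
    action_bandwidth: float = 1.0,  # assumed kernel width for actions
) -> float:
    """Illustrative similarity bonus in [0, 1].

    The bonus is high when the agent is in a state close to one the human
    visited AND takes an action close to the one the human took there.
    It is 0 when no guidance has been recorded.
    """
    best = 0.0
    for h_state, h_action in human_history:
        state_sim = math.exp(-_sq_dist(state, h_state) / (2 * state_bandwidth ** 2))
        action_sim = math.exp(-_sq_dist(action, h_action) / (2 * action_bandwidth ** 2))
        best = max(best, state_sim * action_sim)
    return best


def shaped_reward(env_reward: float, bonus: float, beta: float = 0.5) -> float:
    """Environment reward plus a weighted similarity bonus (beta is assumed)."""
    return env_reward + beta * bonus


# Usage: guidance was recorded earlier and is now suspended.
history: List[Demo] = [([0.0, 0.0], [1.0])]
bonus = compensation_reward([0.0, 0.0], [1.0], history)
total = shaped_reward(env_reward=0.2, bonus=bonus)
```

Because the bonus depends only on the stored history, it can still be computed after the human stops intervening, which is the property the abstract calls sustainable guidance. Note that the article goes further and optimizes the policy directly on historical similarity rather than fitting a value function to a shaped reward like the one above.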