
Sustainable Reinforcement Learning for Autonomous Driving Under Postsuspension of Human Guidance

IEEE Transactions on Systems, Man, and Cybernetics: Systems · 2025

Cited by: 1

ABS 3

Lifei Dai
Changzhu Zhang
Hao Zhang
Yuxiong Ji
Huaicheng Yan

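Summary

Proposes a sustainable human-guided reinforcement learning framework that uses a compensation reward based on historical similarity, together with a new learning paradigm, to counter the drop in learning performance after human guidance is suspended. In autonomous driving, it improves performance by 15% over existing state-of-the-art methods.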

Abstract

This article introduces a sustainable human-guided reinforcement learning (RL) framework to address the degradation in learning performance that occurs when human guidance is suspended. First, a compensation reward based on the historical similarity between the RL agent and the human guidance history is designed to ensure the continued influence of human guidance. To avoid cumulative errors in value function approximation caused by fitting the new reward (which includes the compensation reward), a novel RL paradigm is proposed that bypasses value function fitting and directly optimizes the policy using historical similarity. This paradigm develops a new historical similarity-based learning objective for RL to leverage human guidance more efficiently and achieve alignment with human behavior. Furthermore, the proposed paradigm enables fine-tuning of the RL agent to address the long-tail problem. Experimental results demonstrate the advantages of the proposed method in terms of sustainable guidance and optimal performance in autonomous driving, achieving a 15% increase in optimal performance compared with existing state-of-the-art (SOTA) methods.
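As a rough illustration of the compensation-reward idea mentioned in the abstract, the sketch below adds a bonus to the environment reward when the agent's state-action pair resembles pairs recorded while a human was still providing guidance. The abstract does not give the paper's actual formulation, so everything here is an assumption made for illustration: the Gaussian-kernel similarity, the function names `historical_similarity` and `compensated_reward`, and the parameters `bandwidth` and `weight`.

```python
import math


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def historical_similarity(state, action, human_history, bandwidth=1.0):
    """Hypothetical similarity in [0, 1] to stored human (state, action) pairs.

    Assumption: similarity is the largest Gaussian-kernel value over the
    history, computed on the joint state-action distance.
    """
    best = 0.0
    for h_state, h_action in human_history:
        d2 = _sq_dist(state, h_state) + _sq_dist(action, h_action)
        best = max(best, math.exp(-d2 / (2.0 * bandwidth ** 2)))
    return best


def compensated_reward(env_reward, state, action, human_history,
                       weight=0.5, bandwidth=1.0):
    """Environment reward plus a weighted similarity bonus (illustrative only)."""
    bonus = weight * historical_similarity(state, action, human_history, bandwidth)
    return env_reward + bonus


# Toy guidance history: (state, action) pairs recorded before guidance stopped.
human_history = [((0.0, 1.0), (0.2,)), ((1.0, 1.0), (-0.1,))]

# An agent step that exactly matches a recorded human step gets the full bonus.
r_match = compensated_reward(1.0, (0.0, 1.0), (0.2,), human_history)

# A step far from anything in the history gets almost no bonus.
r_far = compensated_reward(1.0, (5.0, 5.0), (1.0,), human_history)
```

This sketch covers only the reward-shaping step. The abstract states that the proposed paradigm goes further by bypassing value function fitting and optimizing the policy directly against historical similarity, which is not shown here.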

Keywords: reinforcement learning, autonomous driving, human-computer interaction, artificial intelligence
