Addressing Hindsight Bias in Multigoal Reinforcement Learning

IEEE Transactions on Cybernetics · 2021

Cited by 18

ABS 3

Chenjia Bai
Rui Zhao
Chenyao Bai
Peng Liu
Lingxiao Wang
Zhaoran Wang

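Summary

The paper analyzes the learning bias that hindsight experience replay (HER) incurs in multigoal reinforcement learning through its use of hindsight goals, and proposes a bias-corrected algorithm, BHER, which outperforms existing methods on robotics tasks.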

Abstract

Multigoal reinforcement learning (RL) extends typical RL with goal-conditional value functions and policies. One efficient multigoal RL algorithm is hindsight experience replay (HER). By treating a hindsight goal from failed experiences as the original goal, HER enables the agent to receive rewards frequently. However, a key assumption of HER is that the hindsight goals do not change the likelihood of the sampled transitions and trajectories used in training, which our analysis shows does not hold. More specifically, we show that using hindsight goals changes this likelihood and results in a biased learning objective for multigoal RL. We analyze the hindsight bias that arises from this use of hindsight goals and propose bias-corrected HER (BHER), an efficient algorithm that corrects the hindsight bias in training. We further show that BHER outperforms several state-of-the-art multigoal RL approaches in challenging robotics tasks.
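
To make the relabeling step described in the abstract concrete, the following is a minimal sketch of hindsight goal relabeling, not the paper's implementation. It assumes a one-dimensional goal space, a sparse reward of 0 on success and -1 otherwise, and a "future" sampling strategy; the names `sparse_reward` and `relabel_with_hindsight` and the transition layout are hypothetical and chosen only for illustration. The bias correction that BHER applies is not shown, since the abstract does not specify it.

```python
import random


def sparse_reward(achieved_goal, goal, tol=1e-6):
    """Assumed sparse reward: 0.0 when the goal is reached, -1.0 otherwise."""
    return 0.0 if abs(achieved_goal - goal) <= tol else -1.0


def relabel_with_hindsight(trajectory, rng):
    """Replace each transition's goal with one achieved at the same or a later step.

    trajectory: list of dicts with keys "state", "action", "achieved_goal",
    and "goal", where "achieved_goal" is what was reached after the action.
    Returns new transition dicts; the input trajectory is left unchanged.
    """
    relabeled = []
    for t, step in enumerate(trajectory):
        # Pick a step index in [t, len(trajectory) - 1] and use its outcome as the goal.
        future = rng.randrange(t, len(trajectory))
        hindsight_goal = trajectory[future]["achieved_goal"]
        relabeled.append({
            **step,
            "goal": hindsight_goal,
            "reward": sparse_reward(step["achieved_goal"], hindsight_goal),
        })
    return relabeled


# A failed episode: the original goal 5.0 is never reached.
trajectory = [
    {"state": 0.0, "action": 1.0, "achieved_goal": 1.0, "goal": 5.0},
    {"state": 1.0, "action": 1.0, "achieved_goal": 2.0, "goal": 5.0},
    {"state": 2.0, "action": 1.0, "achieved_goal": 3.0, "goal": 5.0},
]
relabeled = relabel_with_hindsight(trajectory, random.Random(0))
```

In this sketch the hindsight goal is drawn from outcomes of the same trajectory, so the goal used in training depends on the transitions that were sampled. That dependence is the kind of change in likelihood that the abstract identifies as the source of hindsight bias.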

Reinforcement learning · Machine learning · Artificial intelligence · Robotics
