TeViR: Text-to-Video Reward With Diffusion Models for Efficient Reinforcement Learning

IEEE Transactions on Systems, Man, and Cybernetics: Systems · 2025

Yuhui Chen
Haoran Li
Ziyuan Jiang
Haowei Wen
Dongbin Zhao

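Summary

This work proposes TeViR, a method that uses a pretrained text-to-video diffusion model to generate dense rewards. By comparing the predicted image sequence with current observations, it improves the sample efficiency of reinforcement learning and outperforms traditional methods across 13 simulated and real-world robotic tasks.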

Abstract

Developing scalable and generalizable reward engineering for reinforcement learning (RL) is crucial for creating general-purpose agents, especially in the challenging domain of robotic manipulation. While recent advances in reward engineering with vision–language models (VLMs) have shown promise, their sparse reward nature significantly limits sample efficiency. This article introduces text-to-video reward (TeViR), a novel method that leverages a pretrained text-to-video diffusion model to generate dense rewards by comparing the predicted image sequence with current observations. Experimental results across 13 simulation and real-world robotic tasks demonstrate that TeViR outperforms traditional methods leveraging sparse rewards and other state-of-the-art (SOTA) methods, achieving better sample efficiency and performance without ground truth environmental rewards. TeViR’s ability to efficiently guide agents in complex environments highlights its potential to advance RL applications in robotic manipulation.
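The abstract describes the reward only at a high level: a predicted image sequence is compared with the current observation to yield a dense signal. The sketch below illustrates that general idea and is not TeViR's actual reward formulation, which the abstract does not specify. It assumes frames and observations have already been mapped to feature vectors by some encoder, uses cosine similarity as the comparison, and scales the score by how far along the predicted sequence the best-matching frame lies. The names `cosine_similarity` and `dense_reward`, the similarity measure, and the progress weighting are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dense_reward(predicted_frames: Sequence[Vector], observation: Vector) -> float:
    """Hypothetical dense reward: match the observation to the closest predicted frame.

    Assumption: predicted_frames holds feature vectors for a video generated from
    the task's text description, ordered from the first frame to the goal frame.
    """
    if not predicted_frames:
        raise ValueError("predicted_frames must not be empty")
    sims = [cosine_similarity(frame, observation) for frame in predicted_frames]
    # Index of the predicted frame most similar to the current observation.
    best = max(range(len(sims)), key=sims.__getitem__)
    # Fraction of the predicted sequence already "reached" (1.0 at the last frame).
    progress = (best + 1) / len(predicted_frames)
    # Negative similarities are clipped so the reward stays in [0, 1].
    return progress * max(sims[best], 0.0)


# Toy features: the predicted sequence moves from [1, 0] toward [0, 1].
predicted = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
early = dense_reward(predicted, [1.0, 0.1])  # observation near the first frame
late = dense_reward(predicted, [0.1, 1.0])   # observation near the final frame
```

In this toy setup an observation close to the final predicted frame earns a higher reward than one close to the first frame, so the agent receives a graded signal at every step rather than a single success flag at the end of an episode.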

Keywords: reinforcement learning, robotic manipulation, reward engineering, diffusion models, vision–language models
