Off-Policy Risk-Sensitive Reinforcement Learning-Based Constrained Robust Optimal Control

IEEE Transactions on Systems, Man, and Cybernetics: Systems · 2022
Cited by 14
ABS 3

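Summary

The article proposes an off-policy risk-sensitive reinforcement learning framework that optimizes task performance while satisfying input and state constraints in a disturbed environment. A risk-aware value function converts the constrained robust stabilization problem into an unconstrained optimal control problem, and both weight convergence and system stability are guaranteed.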

Abstract

This article proposes an off-policy risk-sensitive reinforcement learning (RL)-based control framework that jointly optimizes task performance and constraint satisfaction in a disturbed environment. A risk-aware value function, constructed from the pseudo control and risk-sensitive input and state penalty terms, is introduced to convert the original constrained robust stabilization problem into an equivalent unconstrained optimal control problem. An off-policy RL algorithm is then developed to learn an approximate solution for the risk-aware value function. During learning, the associated approximate optimal control policy satisfies both input and state constraints under disturbances. Replaying experience data in the off-policy weight update law of the critic neural network guarantees weight convergence. Moreover, online and offline algorithms are developed as principled ways to record informative experience data and thereby achieve the sufficient excitation required for weight convergence. Proofs of system stability and weight convergence are provided. Simulation results demonstrate the validity of the proposed control framework.
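The role of replayed experience data in critic weight convergence can be illustrated with a minimal Python sketch. This is not the article's update law, which the abstract does not state. The quadratic basis, the minimum-eigenvalue recording rule, the normalized gradient step, and every name below (`features`, `min_eig`, `ReplayBuffer`, `replay_update`) are assumptions chosen for illustration. The sketch assumes a critic that is linear in its weights and a buffer of regressor/target pairs; it keeps a new sample only if the sample improves a simple excitation measure.

```python
import numpy as np

def features(x):
    """Hypothetical quadratic critic basis for a two-state system (assumption)."""
    x1, x2 = x
    return np.array([x1 * x1, x1 * x2, x2 * x2])

def min_eig(rows):
    """Smallest eigenvalue of sum(phi phi^T); zero means the data lack excitation."""
    if not rows:
        return 0.0
    A = np.array(rows)
    return float(np.linalg.eigvalsh(A.T @ A)[0])

class ReplayBuffer:
    """Stores (regressor, target) pairs and keeps only informative samples."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.phi, self.y = [], []

    def add(self, phi, y):
        # Fill the buffer first; afterwards swap a sample in only if the swap
        # raises the minimum eigenvalue of the stored regressor Gram matrix.
        if len(self.phi) < self.capacity:
            self.phi.append(phi)
            self.y.append(y)
            return True
        best_i, best_val = None, min_eig(self.phi)
        for i in range(self.capacity):
            trial = self.phi[:i] + [phi] + self.phi[i + 1:]
            val = min_eig(trial)
            if val > best_val:
                best_i, best_val = i, val
        if best_i is None:
            return False
        self.phi[best_i], self.y[best_i] = phi, y
        return True

    def is_exciting(self, tol=1e-6):
        # Illustrative stand-in for a sufficient-excitation (rank) condition.
        return min_eig(self.phi) > tol

def replay_update(W, buffer, lr=0.5):
    """One normalized gradient step on the squared residual over all stored data."""
    grad = np.zeros_like(W)
    for phi, y in zip(buffer.phi, buffer.y):
        err = phi @ W - y
        grad += phi * err / (1.0 + phi @ phi) ** 2
    return W - lr * grad
```

In this sketch, once the stored regressors span the weight space (`is_exciting()` returns `True`), repeated calls to `replay_update` drive the weight estimate toward the values that reproduce the stored targets, without requiring the live trajectory itself to remain persistently exciting.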

Keywords: reinforcement learning; optimal control; robust control; constrained optimization; risk-sensitive control