Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks

IEEE Transactions on Systems, Man, and Cybernetics: Systems · 2024

Cited by 10

ABS 3

Liao Zhu
Qinglai Wei
Ping Guo

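Summary

Proposes an online off-policy reinforcement learning method that uses neural networks to learn, in real time, the optimal control policy of an unknown nonlinear system. The method needs no system model, only state data, and the weight error is proven to converge exponentially.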

Abstract

In this article, a real-time online off-policy reinforcement learning (RL) method is developed for the optimal control problem of unknown continuous-time nonlinear systems. First, by applying the temporal difference technique to the iterative procedure of off-policy RL, the iterative value function and the iterative policy input can be learned online in real time. It is proven that the fitting error of the neural network (NN) weights converges exponentially in each iteration. Second, a model-free Hamilton–Jacobi–Bellman equation (MF-HJBE) is derived by taking the limit of the iterative procedure of off-policy RL. The resulting equation not only eliminates the system dynamics that appear in the classical HJBE but also removes the iteration index. By applying temporal difference to the MF-HJBE, a real-time online tuning rule is designed to learn the optimal value function and the optimal policy input. It is proven that the fitting error of the NN weights under this real-time online tuning rule converges exponentially. Note that the two online tuning rules, the iterative one and the real-time one, use only current and previous state data extracted from system trajectories. Moreover, it is proven using Lyapunov’s direct method that the system solution is uniformly ultimately bounded. Finally, simulation results demonstrate the validity of the proposed method.
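
For orientation, the quantities named in the abstract can be written out in the setting commonly assumed for this problem class. This is only a sketch: the input-affine dynamics, the state cost Q(x), the symmetric positive-definite input weight R, the behavior input u, and the interval length T are assumptions introduced here, not details taken from the abstract, and the paper's own notation and its exact MF-HJBE may differ.

```latex
% Assumed setting (not stated in the abstract): input-affine dynamics with
% unknown f, g and an infinite-horizon cost with R symmetric positive definite.
\dot{x} = f(x) + g(x)\,u, \qquad
V(x_0) = \int_0^{\infty} \bigl( Q(x) + u^{\top} R\, u \bigr)\, d\tau

% Classical HJBE and optimal input: both depend on the dynamics f and g.
0 = \min_{u} \Bigl[ Q(x) + u^{\top} R\, u
      + \nabla V^{*}(x)^{\top} \bigl( f(x) + g(x)\,u \bigr) \Bigr],
\qquad
u^{*}(x) = -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V^{*}(x)

% Off-policy iteration: with u^{(i+1)} = -\tfrac{1}{2} R^{-1} g^{\top} \nabla V^{(i)},
% integrating \dot{V}^{(i)} along a trajectory driven by any behavior input u
% over [t, t+T] removes f and g from the relation.
V^{(i)}\bigl(x(t+T)\bigr) - V^{(i)}\bigl(x(t)\bigr)
  = -\int_t^{t+T} \Bigl[ Q(x) + u^{(i)\top} R\, u^{(i)}
      + 2\, u^{(i+1)\top} R\, \bigl( u - u^{(i)} \bigr) \Bigr] d\tau

% Letting i -> infinity (V^{(i)} -> V^{*}, u^{(i)} -> u^{*}) drops the
% iteration index; one plausible form of a model-free HJBE is then
V^{*}\bigl(x(t+T)\bigr) - V^{*}\bigl(x(t)\bigr)
  = -\int_t^{t+T} \Bigl[ Q(x) + u^{*\top} R\, u^{*}
      + 2\, u^{*\top} R\, \bigl( u - u^{*} \bigr) \Bigr] d\tau
```

In the last two relations only state samples and the applied input appear, which is consistent with the abstract's remark that the tuning rules use only current and previous state data from system trajectories.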

Reinforcement learning · Nonlinear systems · Optimal control · Neural networks · Adaptive control
