Accelerated Value Iteration-Based Safe Q-Learning for Data-Driven Optimal Tracking Control
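This work proposes an accelerated value iteration-based safe Q-learning algorithm that introduces a control barrier function to guarantee the safety and optimality of tracking control and uses Nesterov momentum to accelerate learning; it is suited to data-driven tracking control of unknown nonlinear systems.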
In this article, an accelerated value iteration-based safe Q-learning (SQL) algorithm is developed to design the tracking controller for unknown nonlinear systems. First, an augmented Q-function, consisting of a quadratic utility function and an adjustable positive-definite control barrier function (CBF), is devised to ensure both the optimality and safety of the tracking controller. The quadratic utility function, associated with optimality, guarantees that the tracking controller can eliminate the ultimate tracking error, regardless of the reference trajectory. The adjustable positive-definite CBF, pertaining to safety, ensures that the tracking error converges faster toward zero while remaining within the safe set at all times. Second, an accelerated iterative learning mechanism, comprising policy evaluation (PE) and policy improvement (PI), is employed to discover the safe optimal tracking control policy. Integrating the difference between two iterative Q-functions into the current PE process expedites the convergence of the SQL algorithm. A policy optimization technique based on the Nesterov momentum method is utilized to accelerate the PI process of the SQL algorithm. When faced with a large amount of offline data, the two-stage accelerated learning effectively reduces the computational burden. Furthermore, the convergence of the Q-function sequence and the safety of the optimal tracking policy are theoretically analyzed. Finally, by using neural networks and the actor-critic structure, two simulation examples are performed to verify the effectiveness of the accelerated SQL method.
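To make the Nesterov-momentum PI step concrete, the following is a minimal sketch, not the article's implementation. It assumes a hypothetical quadratic Q-function in the control input at a fixed tracking error, Q(u) = 0.5 uᵀHu + gᵀu, with illustrative values for `H`, `g`, the step size `lr`, and the momentum coefficient. The article's augmented Q-function with the CBF term and its neural-network actor-critic approximation are not reproduced here.

```python
import numpy as np

def nesterov_minimize(grad, u0, lr=0.1, momentum=0.9, steps=200):
    """Minimize a function of the control u with Nesterov momentum.

    grad: callable returning the gradient of the (assumed) Q-function w.r.t. u.
    lr, momentum, steps: illustrative hyperparameters, not taken from the article.
    """
    u = np.asarray(u0, dtype=float).copy()
    v = np.zeros_like(u)  # velocity (momentum) term
    for _ in range(steps):
        lookahead = u + momentum * v              # evaluate gradient at the look-ahead point
        v = momentum * v - lr * grad(lookahead)   # update velocity
        u = u + v                                 # update the control
    return u

# Hypothetical quadratic Q(u) = 0.5 u^T H u + g^T u (H positive definite).
H = np.array([[2.0, 0.3], [0.3, 1.0]])
g = np.array([-1.0, 0.5])

def grad_q(u):
    return H @ u + g

u_star = nesterov_minimize(grad_q, np.zeros(2))
u_exact = -np.linalg.solve(H, g)  # closed-form minimizer, for comparison
```

For this toy quadratic the iterate `u_star` can be checked against the closed-form minimizer `u_exact`; in the data-driven setting no closed form is available, which is where an iterative PI step of this kind would be used.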