Integrated Fault Detection and Fault-Tolerant Optimal Tracking Control for Unknown Nonlinear Systems With State and Input Constraints Using Safe Reinforcement Learning
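For unknown nonlinear systems with actuator and process faults, an integrated fault detection and fault-tolerant optimal tracking control scheme based on safe reinforcement learning is proposed. It requires no prior knowledge of the system dynamics, solves the Hamilton–Jacobi–Bellman equation online through an identifier-critic framework, and uses experience replay and a variable forgetting factor to improve convergence speed and robustness.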
This article addresses fault detection (FD) and fault-tolerant optimal control in unknown affine nonlinear systems that are subject to actuator and process faults and to state and input constraints. To this end, a novel integrated fault-tolerant optimal tracking control (FTOTC) scheme based on safe reinforcement learning (RL) is proposed; it requires no prior knowledge of the system dynamics. First, by introducing a discounted cost function that incorporates a control barrier function (CBF) term, the constrained FTOTC problem is transformed into an unconstrained optimal regulation problem for an augmented system formed from the dynamics of the tracking error and the reference trajectory. Subsequently, we adopt an identifier-critic framework to solve the associated Hamilton–Jacobi–Bellman (HJB) equation online, training both neural networks (NNs) simultaneously. We propose an experience replay (ER)-enhanced update law for the identifier NN, in which the forgetting factor (FF) is variable and depends on the state estimation error and an estimate of the measurement noise. Compared with a constant FF without ER enhancement, the proposed identifier update law is shown to improve the convergence rate and reduce the NN weight estimation error, while also increasing robustness against measurement noise. Unlike typical RL algorithms, the scheme does not require an initial admissible control: this condition is eliminated by a modified variable-gain, gradient-descent-based update law for the critic NN that includes a stabilizing term. The learning rate of the critic update law is a function of the instantaneous HJB error, which yields a tighter residual error bound for the NN weights. Using Lyapunov stability theory, we prove that all system states and the identifier and critic NN weight errors are uniformly ultimately bounded (UUB).
Simulation results on a single-link robotic manipulator demonstrate the effectiveness of the presented integrated scheme in different fault scenarios.
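The abstract says that a discounted cost with a CBF term turns the constrained tracking problem into an unconstrained regulation problem on the augmented system, but it does not give the cost itself. What follows is a hedged sketch of one common form from the constrained optimal-control literature, not the paper's exact cost. It assumes the following: a tracking error e = x - x_d; an augmented state z built from e and the reference x_d, with augmented drift F(z) and input matrix G(z); a discount rate γ > 0; a diagonal positive-definite R; input bounds |u_i| ≤ λ; and state bounds |x_i| < k_i. It also uses a logarithmic barrier B(x) as a stand-in for the paper's CBF term. Because x = e + x_d, B is a function of z.

```latex
% Hypothetical form; symbols other than the HJB structure are assumptions.
\[
V\bigl(z(t)\bigr) = \int_{t}^{\infty} e^{-\gamma(\tau - t)}
  \Bigl[ e^{\top} Q\, e + \mathcal{U}(u) + B(x) \Bigr] \, d\tau,
\qquad
\mathcal{U}(u) = 2 \int_{0}^{u} \lambda \tanh^{-1}\!\left(\frac{v}{\lambda}\right)^{\!\top} R \, dv,
\]
\[
B(x) = \sum_{i=1}^{n} \ln \frac{k_i^{2}}{k_i^{2} - x_i^{2}}, \qquad |x_i| < k_i,
\]
\[
0 = \min_{u} \Bigl[ e^{\top} Q\, e + \mathcal{U}(u) + B(x) - \gamma V
  + \nabla V^{\top}\bigl(F(z) + G(z)\,u\bigr) \Bigr],
\qquad
u^{*} = -\lambda \tanh\!\left( \frac{1}{2\lambda} R^{-1} G(z)^{\top} \nabla V^{*} \right).
\]
```

The barrier term is zero at the origin and grows without bound as any |x_i| approaches k_i, so a finite cost keeps the state inside the constraint set. The tanh form of u* follows from setting the derivative of the bracket with respect to u to zero, which gives 2λR tanh^{-1}(u/λ) + G^T ∇V = 0. Because of this form, the input constraint |u_i| ≤ λ is met by construction.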
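The abstract describes an identifier update law that combines experience replay with a forgetting factor that varies with the state estimation error and a measurement-noise estimate, but it does not state the law. The Python sketch below swaps in a simpler, discrete-time analogue. It runs recursive least squares (RLS) on a hypothetical linear-in-parameters model y = theta^T phi, with a forgetting factor that shrinks when the prediction error is large relative to a running noise-variance estimate. Each RLS step is followed by a normalized gradient step over a buffer of recorded samples. The class `ReplayRLSIdentifier`, the rule in `variable_ff`, and every gain are illustrative assumptions, not the paper's update law.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


def variable_ff(err, noise_var, lam_min=0.95, lam_max=0.999):
    """Hypothetical rule: move the forgetting factor toward lam_min when the
    squared prediction error is large compared with the noise variance."""
    ratio = err * err / (err * err + noise_var + 1e-12)
    return lam_max - (lam_max - lam_min) * ratio


class ReplayRLSIdentifier:
    """Illustrative RLS identifier with a variable forgetting factor and a
    replay step over recorded data (a stand-in, not the paper's NN law)."""

    def __init__(self, n, buffer_size=20, replay_gain=0.05, p0=100.0, alpha=0.05):
        self.theta = [0.0] * n
        self.P = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]
        self.buffer = []          # recorded (phi, y) pairs for experience replay
        self.buffer_size = buffer_size
        self.replay_gain = replay_gain
        self.noise_var = 1e-2     # running estimate of measurement-noise variance
        self.alpha = alpha        # smoothing factor of that running estimate

    def update(self, phi, y):
        n = len(phi)
        err = y - dot(self.theta, phi)
        self.noise_var = (1 - self.alpha) * self.noise_var + self.alpha * err * err
        lam = variable_ff(err, self.noise_var)
        # Standard RLS step with forgetting factor lam (P stays symmetric).
        P_phi = mat_vec(self.P, phi)
        k = [p / (lam + dot(phi, P_phi)) for p in P_phi]
        self.theta = [t + ki * err for t, ki in zip(self.theta, k)]
        self.P = [[(self.P[i][j] - k[i] * P_phi[j]) / lam for j in range(n)]
                  for i in range(n)]
        # Experience replay: one normalized gradient step over stored samples.
        self.buffer.append((phi, y))
        if len(self.buffer) > self.buffer_size:
            self.buffer.pop(0)
        grad = [0.0] * n
        for phi_j, y_j in self.buffer:
            scale = (y_j - dot(self.theta, phi_j)) / (1.0 + dot(phi_j, phi_j))
            grad = [g + scale * p for g, p in zip(grad, phi_j)]
        self.theta = [t + self.replay_gain * g / len(self.buffer)
                      for t, g in zip(self.theta, grad)]
        return err


if __name__ == "__main__":
    random.seed(0)
    true_theta = [2.0, -1.0]
    ident = ReplayRLSIdentifier(n=2)
    for _ in range(500):
        phi = [random.uniform(-1, 1), random.uniform(-1, 1)]
        y = dot(true_theta, phi) + random.gauss(0.0, 0.01)
        ident.update(phi, y)
    print([round(t, 2) for t in ident.theta])
```

If `lam` is fixed and the replay loop is deleted, the sketch reduces to plain RLS with a constant forgetting factor, which mirrors the baseline the abstract compares against. The sketch does not reproduce that comparison.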
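The critic is trained by gradient descent on the instantaneous HJB error, with a learning rate that depends on that error. The Python sketch below illustrates only this idea. It uses a hypothetical scalar plant dx/dt = a*x + b*u with quadratic cost q*x**2 + r*u**2, discount rate gamma, and critic V(x) = w*x**2, for which the exact discounted Riccati weight is known. The bounded gain rule in `hjb_error_gain`, the (1 + psi**2)**2 normalization, and the random state sampling are all assumptions. The sketch also omits the stabilizing term that, according to the abstract, removes the need for an initial admissible control. Here the plant is open-loop stable, so w = 0 is already admissible.

```python
import math
import random


def hjb_error_gain(delta, eta0=1.0, eta1=1.0):
    """Hypothetical variable learning rate: grows with |delta| but stays in
    [eta0, eta0 + eta1)."""
    return eta0 + eta1 * abs(delta) / (1.0 + abs(delta))


def critic_step(w, x, a, b, q, r, gamma):
    """One normalized gradient step on E = delta**2 / 2 for V(x) = w * x**2."""
    u = -(b * w / r) * x                               # greedy policy from current critic
    psi = 2.0 * x * (a * x + b * u) - gamma * x * x    # d(delta)/dw with u held fixed
    delta = q * x * x + r * u * u + w * psi            # instantaneous HJB residual
    eta = hjb_error_gain(delta)
    return w - eta * psi * delta / (1.0 + psi * psi) ** 2


def train_critic(a=-1.0, b=1.0, q=1.0, r=1.0, gamma=0.1, steps=2000, seed=0):
    rng = random.Random(seed)
    w = 0.0                                            # initial critic weight
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)                     # sampled states stand in for exploration
        w = critic_step(w, x, a, b, q, r, gamma)
    return w


def riccati_weight(a, b, q, r, gamma):
    """Positive root p of q + (2a - gamma) p - (b**2 / r) p**2 = 0."""
    c = r * (a - gamma / 2.0) / b**2
    return c + math.sqrt(c * c + q * r / b**2)


if __name__ == "__main__":
    print(train_critic(), riccati_weight(-1.0, 1.0, 1.0, 1.0, 0.1))
```

For the default parameters, `train_critic()` should settle near `riccati_weight(-1.0, 1.0, 1.0, 1.0, 0.1)`, which equals 0.4 analytically. Tying the gain to |delta| makes the steps larger while the HJB residual is large and lets them shrink back toward `eta0` as the residual vanishes.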