Continuous-Time Stochastic Policy Iteration of Adaptive Dynamic Programming
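For continuous-time nonlinear systems with stochastic disturbances, a new stochastic adaptive dynamic programming method is proposed that approximates the value function and the control law simultaneously under the conditional expectation; the asymptotic stability of the closed-loop system and the convergence of the algorithm are proven.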
In this article, we study the optimal control problem for continuous-time (CT) time-invariant nonlinear systems subject to stochastic nonlinear disturbances. A new stochastic adaptive dynamic programming (ADP) method is developed to solve the Hamilton–Jacobi–Bellman equation (HJBE). Under the conditional expectation, the value function and the control law are approximated simultaneously at each iteration. The asymptotic stability in probability of the closed-loop stochastic system is analyzed using the stochastic Lyapunov direct method, and the convergence of the developed ADP method is proven. Finally, four simulation examples illustrate the effectiveness of the developed method.
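To make the structure of the problem concrete, the following is a minimal sketch of a stochastic HJBE and a policy-iteration step in one common textbook setting. Everything in it is an assumption for illustration and is not taken from this article: the Itô diffusion form of the dynamics, the input-affine structure, the quadratic control penalty, and the symbols f, g, h, Q, R, and w. The article's own system model and iteration may differ.

```latex
% Assumed dynamics (illustrative): x in R^n, u in R^p, w an m-dimensional Wiener process
\mathrm{d}x = \bigl(f(x) + g(x)\,u\bigr)\,\mathrm{d}t + h(x)\,\mathrm{d}w

% Assumed cost, with the expectation conditioned on the initial state
V(x_0) = \mathbb{E}\!\left[\int_0^{\infty} \bigl(Q(x) + u^{\top} R\,u\bigr)\,\mathrm{d}t \;\middle|\; x(0) = x_0\right]

% Stochastic HJBE under these assumptions
0 = \min_{u}\Bigl\{ Q(x) + u^{\top} R\,u + \nabla V^{\top}\bigl(f(x) + g(x)\,u\bigr)
      + \tfrac{1}{2}\operatorname{tr}\!\bigl(h(x)^{\top}\,\nabla^{2}V\,h(x)\bigr) \Bigr\}

% Policy evaluation for a given admissible control law u_i
0 = Q(x) + u_i^{\top} R\,u_i + \nabla V_i^{\top}\bigl(f(x) + g(x)\,u_i\bigr)
      + \tfrac{1}{2}\operatorname{tr}\!\bigl(h(x)^{\top}\,\nabla^{2}V_i\,h(x)\bigr)

% Policy improvement
u_{i+1}(x) = -\tfrac{1}{2}\,R^{-1} g(x)^{\top}\,\nabla V_i(x)
```

In this assumed setting, the trace term is the only difference from the deterministic HJBE; it comes from the second-order term of Itô's formula and vanishes when h(x) = 0.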