Reinforcement-Learning-Based Tracking Control of Waste Water Treatment Process Under Realistic System Conditions and Control Performance Requirements
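To address the highly nonlinear, strongly coupled, and hard-to-model wastewater treatment process, which is also subject to unknown disturbances, a reinforcement learning control method based on direct heuristic dynamic programming is proposed. It achieves tracking control of the dissolved oxygen and nitrate concentrations through online learning, and its effectiveness is verified on the BSM1 platform.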
The tracking control of a wastewater treatment process (WWTP) is considered. The process is highly nonlinear and strongly coupled, is difficult to model mathematically, and operates under unknown disturbances. We address this multivariable tracking control problem by applying direct heuristic dynamic programming (dHDP)-based reinforcement learning control. The control goal is to track desired references for the dissolved oxygen (DO) concentration of the 5th aerobic zone ($S_{O5}$) and the nitrate concentration of the 2nd anoxic zone ($S_{NO2}$) by manipulating the oxygen transfer coefficient of the 5th aerobic zone ($K_{L}a_{5}$) and the internal recycle flow rate ($Q_{a}$). The dHDP controller aims to minimize the accumulated WWTP tracking error while handling the strong coupling between $S_{O5}$ and $S_{NO2}$ and eliminating the effect of unknown disturbances in the process.
As an online learning control method, the proposed dHDP approach devises an optimal control strategy driven entirely by WWTP process data. We conducted extensive and systematic simulations of the dHDP-controlled WWTP on the well-known BSM1 platform and compared its performance with that of other methods.
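The abstract does not state the cost being minimized, so the following is only a sketch of how the "accumulated tracking error" could be written in the usual dHDP actor-critic form. The reference signals $S_{O5}^{\mathrm{ref}}$ and $S_{NO2}^{\mathrm{ref}}$, the weighting matrix $Q$, the discount factor $\alpha$, the critic output $\hat{J}$, and the ultimate objective $U_c$ are notation assumed here for illustration, not symbols taken from the text.

```latex
% Tracking errors of the two controlled variables (reference symbols assumed)
e(t) =
\begin{bmatrix}
  S_{O5}(t)  - S_{O5}^{\mathrm{ref}}(t) \\
  S_{NO2}(t) - S_{NO2}^{\mathrm{ref}}(t)
\end{bmatrix},
\qquad
u(t) =
\begin{bmatrix}
  K_{L}a_{5}(t) \\
  Q_{a}(t)
\end{bmatrix}

% Instantaneous cost: a quadratic form is assumed, with Q positive definite
r(t) = e(t)^{\top} Q \, e(t)

% Discounted accumulated cost that the critic approximates, 0 < \alpha < 1
J(t) = \sum_{k=t+1}^{\infty} \alpha^{\,k-t-1} \, r(k)
\quad\Longrightarrow\quad
J(t-1) = r(t) + \alpha J(t)

% Critic update target: temporal-difference error of the estimate \hat{J}
e_c(t) = \alpha \hat{J}(t) - \bigl[ \hat{J}(t-1) - r(t) \bigr],
\qquad
E_c(t) = \tfrac{1}{2} e_c(t)^{2}

% Actor update target: drive \hat{J} toward the ultimate objective U_c (often 0)
e_a(t) = \hat{J}(t) - U_c,
\qquad
E_a(t) = \tfrac{1}{2} e_a(t)^{2}
```

Under these assumptions the critic error vanishes exactly when $\hat{J}$ satisfies the recursion $J(t-1) = r(t) + \alpha J(t)$, and both networks are adjusted online from measured process data, without a model of the WWTP.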