Learning-Based Control Policy and Regret Analysis for Online Quadratic Optimization With Asymmetric Information Structure

IEEE Transactions on Cybernetics · 2021
Cited by 9
ABS 3

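Summary

For dynamic systems with an asymmetric information structure, the paper proposes an online learning-based control policy, uses regret analysis to measure the performance loss, and proves that the regret is sublinear and bounded by O(ln T).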

Abstract

In this article, we propose a learning approach to analyze dynamic systems with an asymmetric information structure. Instead of adopting a game-theoretic setting, we investigate an online quadratic optimization problem driven by system noises with unknown statistics. Due to information asymmetry, it is infeasible to use either the classic Kalman filter or optimal control strategies for such systems. It is necessary and beneficial to develop an admissible approach that learns the probability statistics as time goes forward. Motivated by online convex optimization (OCO) theory, we introduce the notion of regret, defined as the cumulative performance loss: the difference between the optimal cost when the statistics are known offline and the optimal cost achieved online when they are unknown. By utilizing dynamic programming and the linear minimum mean square unbiased estimate (LMMSUE), we propose a new type of online state-feedback control policy and characterize the behavior of the regret in a finite-time regime. The regret is shown to be sublinear and bounded by O(ln T). Moreover, we address an online optimization problem with an output-feedback control policy and propose a heuristic online control policy.
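To give a feel for why a logarithmic regret bound is plausible, here is a minimal toy sketch. It is not the paper's control policy or system model. It assumes a one-step quadratic loss (u_t - w_t)^2, where the noise w_t has unknown mean mu and variance sigma^2, and a learner that plays the sample mean of past noise. The function names `expected_regret` and `simulate_regret` are hypothetical, introduced only for this illustration. Under these assumptions the extra expected cost at step t is sigma^2/(t-1), so the cumulative regret is a harmonic sum that grows like ln T.

```python
import math
import random


def expected_regret(T, sigma2, mu=0.0):
    """Closed-form expected regret of the sample-mean policy in the toy problem."""
    # Step 1 has no data, so the policy plays u_1 = 0 and pays an extra mu^2.
    # Step t >= 2 plays the mean of t-1 samples and pays an extra sigma2/(t-1).
    if T < 1:
        return 0.0
    return mu ** 2 + sigma2 * sum(1.0 / k for k in range(1, T))


def simulate_regret(T, mu, sigma, seed=0):
    """Realized regret on one sample path, measured against the known-mean policy."""
    rng = random.Random(seed)
    total, running_sum = 0.0, 0.0
    for t in range(1, T + 1):
        # Online action: sample mean of the noise seen so far (0 before any data).
        u_online = running_sum / (t - 1) if t > 1 else 0.0
        w = rng.gauss(mu, sigma)
        # Offline benchmark knows mu and plays it directly.
        total += (u_online - w) ** 2 - (mu - w) ** 2
        running_sum += w
    return total


if __name__ == "__main__":
    for T in (10, 100, 1000, 10000):
        print(T, round(expected_regret(T, 1.0), 3), round(1.0 + math.log(T), 3))
```

The printed columns compare the toy expected regret with 1 + ln T for unit variance and zero mean. The regret per step shrinks as T grows, which is what sublinear regret means in this setting.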

Online optimization · Control theory · Machine learning · Dynamic systems