Risk-averse control of Markov systems with value function learning

Annals of Operations Research · 2025

Cited by: 0

ABS 3

Andrzej Ruszczyński (corresponding author)
Shangzhe Yang

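Summary

This paper studies risk-averse control of finite-state Markov systems. It approximates the risk of each state by a function of the state's features, proposes a robust learning algorithm based on mini-batch transition risk mappings, and validates the method on an underwater robot navigation problem.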

Abstract

We consider a control problem for a finite-state Markov system whose performance is evaluated by a coherent Markov risk measure. For each policy, the risk of a state is approximated by a function of its features, thus leading to a lower-dimensional policy evaluation problem, which involves non-differentiable stochastic operators. We introduce mini-batch transition risk mappings, which are particularly suited to our approach, and we use them to derive a robust learning algorithm for Markov policy evaluation. Finally, we discuss structured policy improvement in the feature-based risk-averse setting. The considerations are illustrated with an underwater robot navigation problem in which several waypoints must be visited and the observation results must be reported from selected transmission locations. We identify the relevant features, we test the simulation-based learning method, and we optimize a structured policy in a hyperspace containing all problems with the same number of relevant points.
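
The idea of evaluating a state through a risk mapping applied to a mini-batch of sampled transitions can be illustrated with a small sketch. It is not the algorithm from the paper, whose exact definitions are not given in the abstract. It assumes a linear feature-based value approximation and uses the mean-upper-semideviation of order 1, a standard coherent risk measure when its weight lies in [0, 1], as a stand-in transition risk mapping. The names `mean_semideviation`, `linear_value`, `minibatch_risk_target`, `kappa`, and `discount` are hypothetical.

```python
from typing import Sequence


def mean_semideviation(values: Sequence[float], kappa: float = 0.5) -> float:
    """Empirical mean-upper-semideviation of order 1 for a cost sample.

    Returns mean(Z) + kappa * mean(max(0, Z - mean(Z))).
    With 0 <= kappa <= 1 this is a coherent risk measure on the
    empirical distribution of the mini-batch (assumed stand-in).
    """
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [0, 1]")
    n = len(values)
    mean = sum(values) / n
    upper_dev = sum(max(0.0, z - mean) for z in values) / n
    return mean + kappa * upper_dev


def linear_value(theta: Sequence[float], features: Sequence[float]) -> float:
    """Feature-based approximation of a state's risk: theta . phi(state)."""
    return sum(t * f for t, f in zip(theta, features))


def minibatch_risk_target(
    cost: float,
    next_features: Sequence[Sequence[float]],
    theta: Sequence[float],
    kappa: float = 0.5,
    discount: float = 1.0,  # hypothetical parameter, not taken from the paper
) -> float:
    """One-step risk-averse evaluation target for a single state.

    next_features holds the feature vectors of a mini-batch of successor
    states sampled from the current state under the evaluated policy.
    """
    next_values = [linear_value(theta, phi) for phi in next_features]
    return cost + discount * mean_semideviation(next_values, kappa)


# Example: two sampled successors with approximate values 1.0 and 3.0.
theta = [1.0, 2.0]
batch = [[1.0, 0.0], [1.0, 1.0]]
target = minibatch_risk_target(cost=1.0, next_features=batch, theta=theta)
```

Because the semideviation term penalizes only outcomes above the mean cost, the target is never below its risk-neutral counterpart (the plain sample average), and the `max` inside it is what makes such an operator non-differentiable in `theta`.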

Keywords: Markov decision processes · risk measures · reinforcement learning · control theory · machine learning
