Reinforcement learning with dynamic convex risk measures

Mathematical Finance · 2023

Cited by 23

Renmin University BABS 3

Anthony Coache · University of Toronto
Sebastian Jaimungal · University of Toronto (corresponding author)

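Summary

Proposes a model-free reinforcement learning method for risk-sensitive stochastic optimization using dynamic convex risk measures. Optimal policies are obtained through policy gradient updates and an actor-critic algorithm, and the method is applied to statistical arbitrage, financial hedging, and robot obstacle avoidance.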

Abstract

We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems using model-free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor–critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control.
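To make the idea of a time-consistent dynamic programming principle concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a small Markov chain with known transition probabilities and a fixed policy, and it uses the entropic risk measure as one example of a convex one-step risk measure; the abstract does not say which risk measures the paper uses. The names `entropic_risk`, `dynamic_risk`, `costs`, `transition`, and `gamma` are hypothetical. The paper's method is model-free and uses neural networks, whereas this sketch is tabular and model-based.

```python
import math


def entropic_risk(values, probs, gamma):
    """One-step entropic risk (1/gamma) * log E[exp(gamma * Z)] of a cost Z.

    Assumption: the entropic risk measure stands in for a generic convex
    risk measure; gamma > 0 is a hypothetical risk-aversion parameter.
    """
    shift = max(values)  # subtract the maximum for numerical stability
    expectation = sum(p * math.exp(gamma * (v - shift)) for v, p in zip(values, probs))
    return shift + math.log(expectation) / gamma


def dynamic_risk(costs, transition, horizon, gamma):
    """Evaluate a fixed policy by the nested (time-consistent) recursion

        V_T(s) = c(s),   V_t(s) = c(s) + rho(V_{t+1}(S') | S = s),

    where rho is the one-step entropic risk under the row transition[s].
    """
    n_states = len(costs)
    value = list(costs)  # terminal values V_T
    for _ in range(horizon):
        value = [
            costs[s] + entropic_risk(value, transition[s], gamma)
            for s in range(n_states)
        ]
    return value


# Two states with costs 0 and 1, uniform transitions, one step to go.
values = dynamic_risk(
    costs=[0.0, 1.0],
    transition=[[0.5, 0.5], [0.5, 0.5]],
    horizon=1,
    gamma=1.0,
)
```

In this toy setting the risk-adjusted value of state 0 exceeds the risk-neutral expected cost of 0.5, which reflects the extra weight a risk-averse agent places on the costly outcome. Replacing `entropic_risk` with another one-step convex risk measure leaves the recursion unchanged.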

Reinforcement learning · Risk management · Financial optimization · Robot control
