Exploratory mean-variance portfolio selection with Choquet regularizers
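This paper studies a continuous-time mean-variance portfolio problem in which Choquet regularizers measure the level of exploration, derives the optimal distributions, develops a reinforcement learning algorithm, and validates its performance through simulations and empirical analysis.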
In this paper, we study a continuous-time exploratory mean-variance (EMV) problem within the framework of reinforcement learning (RL), using Choquet regularizers to measure the level of exploration. Applying the classical Bellman principle of optimality, we derive the Hamilton-Jacobi-Bellman equation of the EMV problem and solve it explicitly by statically maximizing a mean-variance constrained Choquet regularizer. In particular, the optimal distributions form a location-scale family whose shape depends on the choice of Choquet regularizer. We further reformulate the continuous-time Choquet-regularized EMV problem using a variant of the Choquet regularizer. Several examples are given under specific Choquet regularizers that generate widely used exploratory samplers such as the exponential, uniform and Gaussian distributions. Finally, we develop an RL algorithm and assess its performance via simulations and empirical analysis, including comparisons with the plug-in policy and the entropy-regularized policy.
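To make the location-scale structure concrete, the following is a minimal sketch of how exploratory samplers with a common mean and standard deviation but different shapes might be drawn. It is an illustration only, not the paper's algorithm: the function name `sample_location_scale`, the shape labels, and the numerical values of `mean` and `std` are assumptions made here, whereas in the EMV problem the mean and variance would come from the solution of the Hamilton-Jacobi-Bellman equation.

```python
import numpy as np


def sample_location_scale(shape, mean, std, size, rng):
    """Draw samples of the form mean + std * Z, with E[Z] = 0 and Var[Z] = 1.

    `shape` selects the standardized base distribution Z (hypothetical labels).
    """
    if shape == "gaussian":
        z = rng.standard_normal(size)
    elif shape == "uniform":
        # Uniform on [-sqrt(3), sqrt(3)] has mean 0 and variance 1.
        half_width = np.sqrt(3.0)
        z = rng.uniform(-half_width, half_width, size)
    elif shape == "exponential":
        # Exp(1) has mean 1 and variance 1, so subtracting 1 centres it.
        z = rng.exponential(1.0, size) - 1.0
    else:
        raise ValueError(f"unknown shape: {shape!r}")
    return mean + std * z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder values; not taken from the paper.
    mean, std = 0.3, 0.5
    for shape in ("gaussian", "uniform", "exponential"):
        draws = sample_location_scale(shape, mean, std, 100_000, rng)
        print(shape, draws.mean(), draws.std())
```

All three samplers share the same first two moments by construction, so they differ only in shape, which is the sense in which the choice of regularizer determines the form of exploration.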