An end-to-end direct reinforcement learning approach for multi-factor based mean-variance portfolio optimization
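Proposes an end-to-end online portfolio decision model that integrates the multi-factor model and mean-variance optimization directly into a reinforcement learning framework, optimizes the neural network parameters with gradient methods, and outperforms several benchmark portfolios in tests on real market data.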
This paper introduces an end-to-end online portfolio decision model, built within the direct reinforcement learning framework, that integrates the multi-factor model with mean-variance (MV) portfolio optimization. The classical decision process separates estimation and portfolio optimization into a two-step scheme, which may accumulate estimation errors that jeopardize overall performance; our approach instead unifies the two steps into a single performance-oriented online decision process. The integration is achieved by tuning the neural network parameters directly with respect to a reward function designed as a combination of the prediction error and the realized MV utility. Specifically, a neural network estimates future returns and generates the factor loading matrix, from which the inputs to the MV portfolio optimization model are computed. The network parameters are then optimized with respect to the updated reward using the gradient method. We develop an online updating scheme for computing the gradient in backpropagation by providing explicit formulas for the derivatives of the MV portfolio through the portfolio optimization layer. Using real market data, we evaluate the proposed method against several benchmark portfolios in out-of-sample tests. The experiments show that our approach not only outperforms these benchmarks across various performance metrics but also remains transparent to factor analysis, a favorable trait for practitioners.
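To make the pipeline described above concrete, the following is a minimal NumPy sketch of how a factor loading matrix and predicted returns might feed an MV optimization step and a combined reward. It is an illustration under stated assumptions, not the paper's implementation: the function names (`factor_covariance`, `mv_weights`, `reward`), the budget-only constraint, the squared-error form of the prediction error, the weight `eta`, the risk-aversion parameter `gamma`, and all numbers are hypothetical, and the neural network itself is replaced by fixed arrays.

```python
import numpy as np

def factor_covariance(B, factor_cov, idio_var):
    """Asset covariance implied by a linear factor model: B F B^T + diag(d)."""
    return B @ factor_cov @ B.T + np.diag(idio_var)

def mv_weights(mu, sigma, gamma):
    """Maximise w'mu - (gamma/2) w'Sigma w subject to sum(w) = 1.

    Assumption: only the budget constraint is imposed (short sales allowed),
    so the solution has a closed form via the Lagrange multiplier lam.
    """
    ones = np.ones_like(mu)
    sinv_mu = np.linalg.solve(sigma, mu)    # Sigma^{-1} mu
    sinv_1 = np.linalg.solve(sigma, ones)   # Sigma^{-1} 1
    lam = (ones @ sinv_mu - gamma) / (ones @ sinv_1)
    return (sinv_mu - lam * sinv_1) / gamma

def reward(w, mu_hat, r_realized, sigma, gamma, eta):
    """Hypothetical reward: realized MV utility minus a weighted prediction error."""
    utility = w @ r_realized - 0.5 * gamma * (w @ sigma @ w)
    pred_err = np.mean((mu_hat - r_realized) ** 2)
    return utility - eta * pred_err

# Illustrative inputs standing in for the network outputs (made-up numbers).
B = np.array([[1.0, 0.2],
              [0.8, -0.1],
              [1.2, 0.5]])                    # factor loadings: 3 assets, 2 factors
factor_cov = np.array([[0.04, 0.01],
                       [0.01, 0.02]])         # factor covariance
idio_var = np.array([0.01, 0.02, 0.015])      # idiosyncratic variances
mu_hat = np.array([0.010, 0.020, 0.015])      # predicted returns
r_realized = np.array([0.012, 0.015, 0.020])  # returns observed one step later
gamma, eta = 5.0, 1.0

sigma = factor_covariance(B, factor_cov, idio_var)
w = mv_weights(mu_hat, sigma, gamma)
R = reward(w, mu_hat, r_realized, sigma, gamma, eta)
print("weights:", w, "reward:", R)
```

Under these assumptions the portfolio is a closed-form function of the predicted returns and the factor-implied covariance, so it is differentiable with respect to both, which is the property that allows gradients of the reward to be propagated back through an optimization layer. The constraints, reward weighting, and derivative formulas actually used in the paper are not given in the abstract and may differ.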