Dynamic Programming Principles for Mean-Field Controls with Learning
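We recast multiagent systems with mean-field approximation and learning as reinforcement learning problems in which a probability distribution replaces the state variable, laying the groundwork for efficient value-based and policy-based algorithms.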
Multiagent systems, such as recommendation systems, ride-sharing platforms, food-delivery systems, and data-routing centers, are an area of rapid technological development that requires constant improvement to address inefficiency and the curse of dimensionality. In the paper “Dynamic Programming Principles for Mean-Field Controls with Learning,” we show that multiagent systems with mean-field approximation and learning can be recast as general forms of reinforcement learning problems, where the state variable is replaced by a probability distribution over states. This reformulation paves the way for developing efficient value-based and policy-based algorithms for mean-field controls with learning. It is also a first step toward future theoretical development of learning problems with mean-field controls.
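To make the idea of a distribution-valued state concrete, here is a minimal Python sketch of one step of such "lifted" dynamics on a finite state space. It is an illustration only, not the paper's formulation: the finite state and action spaces, the function `lifted_step`, and the two-state crowd-aversion example (`kernel`, `reward`, `mu0`, `policy`) are all hypothetical choices made for this sketch.

```python
from typing import Callable, List, Tuple

Dist = List[float]          # probability distribution over a finite state space
Policy = List[List[float]]  # policy[s][a] = probability of action a in state s


def lifted_step(
    mu: Dist,
    policy: Policy,
    kernel: Callable[[int, int, Dist], Dist],
    reward: Callable[[int, int, Dist], float],
) -> Tuple[Dist, float]:
    """Advance the population distribution by one step.

    The "state" is the distribution mu itself. Both the transition kernel
    and the reward may depend on mu (the mean-field interaction).
    Returns the next distribution and the population-averaged reward.
    """
    n_states = len(mu)
    next_mu = [0.0] * n_states
    total_reward = 0.0
    for s in range(n_states):
        for a, p_a in enumerate(policy[s]):
            weight = mu[s] * p_a  # mass of agents in state s taking action a
            total_reward += weight * reward(s, a, mu)
            for s_next, p in enumerate(kernel(s, a, mu)):
                next_mu[s_next] += weight * p
    return next_mu, total_reward


# Hypothetical example: two locations, action 0 = stay, action 1 = switch.
def kernel(s: int, a: int, mu: Dist) -> Dist:
    out = [0.0, 0.0]
    if a == 0:
        out[s] = 1.0
        return out
    target = 1 - s
    p_move = 1.0 - 0.5 * mu[target]  # assumed: crowded targets are harder to reach
    out[target] = p_move
    out[s] = 1.0 - p_move
    return out


def reward(s: int, a: int, mu: Dist) -> float:
    # Assumed crowd-averse reward with a small cost for switching.
    return 1.0 - mu[s] - 0.1 * a


mu0 = [0.8, 0.2]
policy = [[0.5, 0.5], [1.0, 0.0]]
mu1, r = lifted_step(mu0, policy, kernel, reward)
print(mu1, r)
```

Because `lifted_step` maps a distribution and a policy to a next distribution and a scalar reward, it has the same shape as the step function of an ordinary reinforcement learning environment, only with the probability simplex as its state space. That is the sense in which value-based or policy-based methods could, in principle, be applied to the lifted problem.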