Dynamic Programming Principles for Mean-Field Controls with Learning

Operations Research · 2023

Cited by 11

Journal rankings: RUC (Renmin University of China) list A · FT50 · UTD24 · ABS 4*

Xiaoli Wei · Tsinghua University
Haotian Gu · University of California, Berkeley
Xin Guo · University of California, Berkeley
Renyuan Xu · University of Southern California

Summary

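Recasts multiagent systems with mean-field approximation and learning as reinforcement learning problems in which the state variable is replaced by a probability distribution, laying the groundwork for efficient value-based and policy-based algorithms.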

Abstract

Multiagent systems (such as recommendation systems, ride-sharing platforms, food-delivery systems, and data-routing centers) are areas of rapid technology development that require constant improvements to address the lack of efficiency and the curse of dimensionality. In the paper “Dynamic Programming Principles for Mean-Field Controls with Learning,” we show that multiagent systems with mean-field approximation and learning can be recast as general forms of reinforcement learning problems, where the state variable is replaced by the probability distribution. This reformulation paves the way for developing efficient value-based and policy-based algorithms for mean-field controls with learning. It is also the first step toward future theoretical development of learning problems with mean-field controls.
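To make the reformulation more concrete, here is a minimal Python sketch under stated assumptions. It uses a hypothetical two-state, two-action toy model: the dynamics, rewards, discount factor, and every name (transition, reward, lifted_reward, lifted_transition, DECISION_RULES, value_iteration) are illustrative and not taken from the paper. The population distribution μ plays the role of the state, and a decision rule h, mapping each individual state to an action, plays the role of the action. The lifted reward and transition average the individual ones over μ: r̃(μ, h) = Σ_x μ(x) r(x, h(x), μ) and Φ(μ, h)(x′) = Σ_x μ(x) P(x′ | x, h(x), μ). Value iteration then solves V(μ) = max_h [r̃(μ, h) + γ V(Φ(μ, h))] on a grid over the simplex. Restricting to deterministic decision rules and using a grid with linear interpolation are simplifications chosen for this sketch, not the paper's algorithm.

```python
import itertools

import numpy as np

# Hypothetical toy model (illustrative only, not from the paper):
# two individual states {0, 1}, two actions {0, 1}, discount factor GAMMA.
N_STATES, N_ACTIONS = 2, 2
GAMMA = 0.9


def transition(x, a, mu):
    """Hypothetical individual kernel P(. | x, a, mu).

    Action 1 tries to reach state 1 but succeeds less often when state 1
    is crowded (the mean-field interaction); action 0 moves to state 1
    only with probability 0.1.
    """
    p_to_1 = 0.9 - 0.5 * mu[1] if a == 1 else 0.1
    return np.array([1.0 - p_to_1, p_to_1])


def reward(x, a, mu):
    """Hypothetical reward r(x, a, mu): state 1 pays less when crowded; action 1 costs 0.1."""
    return (1.0 - mu[1]) * (x == 1) - 0.1 * a


# Deterministic decision rules h: S -> A, stored as tuples (h(0), h(1)).
DECISION_RULES = list(itertools.product(range(N_ACTIONS), repeat=N_STATES))


def lifted_reward(mu, h):
    """r~(mu, h) = sum_x mu(x) r(x, h(x), mu)."""
    return sum(mu[x] * reward(x, h[x], mu) for x in range(N_STATES))


def lifted_transition(mu, h):
    """Phi(mu, h)(x') = sum_x mu(x) P(x' | x, h(x), mu), the next population distribution."""
    return sum(mu[x] * transition(x, h[x], mu) for x in range(N_STATES))


def value_iteration(n_grid=51, tol=1e-8, max_iter=1000):
    """Value iteration for V(mu) = max_h [r~(mu, h) + GAMMA * V(Phi(mu, h))].

    With two states, mu is determined by p = mu(1), so V is stored on a grid
    over [0, 1]; off-grid next distributions use linear interpolation.
    """
    grid = np.linspace(0.0, 1.0, n_grid)
    V = np.zeros(n_grid)
    for _ in range(max_iter):
        V_new = np.empty_like(V)
        for i, p in enumerate(grid):
            mu = np.array([1.0 - p, p])
            V_new[i] = max(
                lifted_reward(mu, h)
                + GAMMA * np.interp(lifted_transition(mu, h)[1], grid, V)
                for h in DECISION_RULES
            )
        if np.max(np.abs(V_new - V)) < tol:
            return grid, V_new
        V = V_new
    return grid, V


grid, V = value_iteration()
print(f"V(delta_0) = {V[0]:.4f}, V(delta_1) = {V[-1]:.4f}")
```

Because linear interpolation is a non-expansion in the sup norm, the discretized Bellman operator remains a GAMMA-contraction, so the loop converges; replacing max with an argmax over DECISION_RULES would recover a greedy decision rule at each grid point.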

Keywords: Multiagent systems · Reinforcement learning · Mean-field theory · Dynamic programming · Machine learning
