Reinforcement learning in a prisoner's dilemma

Games and Economic Behavior · 2024

Cited by 21 · Top 2% among papers in the same journal and year

Renmin University AABS 3

Arthur Dolgopolov · Bielefeld University (corresponding author)

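Summary

The paper studies the limiting behavior of reinforcement learning algorithms such as stateless Q-learning in a prisoner's dilemma. It shows how the learning rate and the game payoffs together determine whether players learn to cooperate or to defect, with implications for algorithmic collusion.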

Abstract

I characterize the outcomes of a class of model-free reinforcement learning algorithms, such as stateless Q-learning, in a prisoner's dilemma. The behavior is studied in the limit as players stop experimenting after sufficiently exploring their options. A closed form relationship between the learning rate and game payoffs reveals whether the players will learn to cooperate or defect. The findings have implications for algorithmic collusion and also apply to asymmetric learners with different experimentation rules.
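
To make the setting in the abstract concrete, the following is a minimal sketch of two stateless Q-learners playing a repeated prisoner's dilemma with exploration that decays toward zero. It is not the paper's model or its closed-form result. The payoff values, the epsilon-greedy exploration rule, the decay schedule, the absence of a discounted continuation term, and the names `PAYOFF` and `simulate` are all assumptions chosen for illustration.

```python
import random

# Hypothetical payoffs (not taken from the paper), keyed by
# (own action, opponent action). Actions: 0 = cooperate, 1 = defect.
# They satisfy the usual ordering T=5 > R=3 > P=1 > S=0.
PAYOFF = {(0, 0): 3.0, (0, 1): 0.0, (1, 0): 5.0, (1, 1): 1.0}


def simulate(alpha=0.1, rounds=20000, eps0=1.0, decay=0.9995, seed=0):
    """Run two stateless Q-learners against each other.

    alpha  : learning rate (the parameter the abstract highlights)
    eps0   : initial exploration probability (assumed epsilon-greedy rule)
    decay  : per-round multiplicative decay, so exploration fades out
    Returns the final Q-values and each player's greedy action.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[player][action], one value per action
    eps = eps0
    for _ in range(rounds):
        actions = []
        for p in range(2):
            if rng.random() < eps:
                actions.append(rng.randrange(2))  # explore uniformly
            else:
                # Exploit; ties are broken in favor of defection (assumption).
                actions.append(0 if q[p][0] > q[p][1] else 1)
        for p in range(2):
            own, other = actions[p], actions[1 - p]
            reward = PAYOFF[(own, other)]
            # Stateless update with no continuation term (assumption):
            # only the value of the action just played moves toward the reward.
            q[p][own] += alpha * (reward - q[p][own])
        eps *= decay
    greedy = [0 if q[p][0] > q[p][1] else 1 for p in range(2)]
    return q, greedy


if __name__ == "__main__":
    for a in (0.05, 0.5):
        q_values, greedy_actions = simulate(alpha=a)
        print(a, greedy_actions, q_values)
```

Varying `alpha` and the entries of `PAYOFF` across runs and seeds gives a rough empirical feel for how the learned action depends on both, which is the dependence the abstract says is characterized in closed form.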

Keywords: reinforcement learning · prisoner's dilemma · algorithmic collusion · learning rate
