Zero-sum discounted stochastic games with random rewards

Annals of Operations Research · 2026

Cited by 0 · Top 10% among papers in the same journal and year

ABS 3

Lucas Osmani
Abdel Lisser (corresponding author)
Vikas Vikram Singh

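Summary

The paper studies the optimization of random rewards in two-player zero-sum discounted stochastic games. It models risk attitudes through chance constraints, proposes solution algorithms for the risk-seeking and risk-averse cases, and validates them with numerical experiments.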

Abstract

We consider a two-person zero-sum discounted stochastic game with random rewards and known transition probabilities. The players have opposite objectives, and each is interested in optimizing the expected discounted reward that it can obtain with a given confidence level when each player plays the worst possible move against the other. We model this game by defining a chance-constrained optimization problem for each player. In this framework, risk attitudes are determined by confidence levels: values of 0.5 or below correspond to risk-seeking behavior, and values above 0.5 correspond to risk-averse behavior. When the reward vector follows a multivariate elliptically symmetric distribution, the game is equivalent to a minimax formulation. We consider the game with risk-seeking and risk-averse players separately. We show that the risk-seeking problem is equivalent to a constrained optimization of a parameterized zero-sum stochastic game, and that the optimal payoff of player 1 and the optimal cost of player 2 can be computed using Riemann gradient sampling algorithms. We then use the solution of each player's constrained optimization problem to compute that player's optimal strategy by solving a linear programming problem. We reformulate the risk-averse problem as a discrete minimax problem, propose an algorithm based on a linearization method, and discuss its convergence properties. Alternatively, we reformulate the risk-averse problem as a second-order cone programming problem with bilinear constraints. Numerical experiments on randomly generated instances illustrate our theoretical results.
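The link between confidence level and risk attitude can be illustrated with a small sketch. It is not the paper's algorithm: it assumes the reward vector is Gaussian (one member of the elliptically symmetric family) and that the payoff is a fixed linear function of the rewards with a hypothetical weight vector `x`. Under those assumptions, the largest value `y` satisfying P(r·x ≥ y) ≥ α is the mean minus the standard normal quantile at α times the standard deviation. The function name and the example numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def guaranteed_payoff(x, mu, sigma, alpha):
    """Largest y with P(r . x >= y) >= alpha, assuming r ~ N(mu, sigma).

    x     : hypothetical weight vector (illustrative, not from the paper)
    mu    : mean reward vector
    sigma : covariance matrix as a list of rows
    alpha : confidence level in (0, 1)
    """
    n = len(x)
    mean = sum(mu[i] * x[i] for i in range(n))
    var = sum(x[i] * sigma[i][j] * x[j] for i in range(n) for j in range(n))
    # r . x is N(mean, var), so the chance constraint reduces to
    # y <= mean - z_alpha * sqrt(var), where z_alpha is the standard normal quantile.
    return mean - NormalDist().inv_cdf(alpha) * sqrt(var)


# Illustrative data: two rewards with unit variance and no correlation.
x = [0.5, 0.5]
mu = [1.0, 3.0]
sigma = [[1.0, 0.0], [0.0, 1.0]]

averse = guaranteed_payoff(x, mu, sigma, 0.9)   # alpha > 0.5: below the mean
neutral = guaranteed_payoff(x, mu, sigma, 0.5)  # alpha = 0.5: equals the mean
seeking = guaranteed_payoff(x, mu, sigma, 0.1)  # alpha < 0.5: above the mean
```

For α above 0.5 the quantile is positive, so the standard deviation is subtracted and the guaranteed payoff is a concave function of `x`, which is what makes a second-order cone formulation natural. For α below 0.5 the quantile is negative and the standard deviation is added instead, so the player is rewarded for variance.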

Keywords: stochastic games · zero-sum games · risk attitude · optimization algorithms · linear programming