Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees

Operations Research · 2024

Cited by 3

Journal rankings: Renmin University (RUC) A · FT50 · UTD24 · ABS 4*

Siliang Zeng · University of Minnesota
Mingyi Hong · University of Minnesota
Alfredo García · Texas A&M University

Summary

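Proposes a new algorithm that combines policy improvement with stochastic gradient steps to efficiently estimate structural models of human dynamic decision-making in high-dimensional state spaces. The method comes with finite-time convergence guarantees and is suited to modeling complex decisions.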
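For reference, one common way to write this estimation problem is sketched below. It assumes an entropy-regularized (logit) agent with discount factor $\gamma$, reward parameters $\theta$, features $\phi$, step size $\alpha$ and expert policy $\pi^{E}$; this notation is ours, not taken from this page. The outer problem maximizes the likelihood of observed trajectories, the inner problem defines the agent's optimal policy, and a single-loop scheme interleaves one policy-improvement step with one stochastic gradient step. The linear-reward gradient (expert minus current-policy discounted feature expectations) is the standard maximum-entropy IRL identity, assumed here rather than quoted from the paper.

```latex
\begin{aligned}
\max_{\theta}\;& L(\theta) = \mathbb{E}_{\tau \sim \pi^{E}}\Big[\sum_{t \ge 0} \gamma^{t} \log \pi_{\theta}(a_t \mid s_t)\Big] \\
\text{s.t.}\;& \pi_{\theta} = \arg\max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t \ge 0} \gamma^{t}\big(r(s_t,a_t;\theta) + \mathcal{H}(\pi(\cdot \mid s_t))\big)\Big] \\[6pt]
\text{policy step:}\;& \pi_{k+1}(a \mid s) \propto \exp\big(Q_{k}(s,a)\big), \quad Q_k \text{ a soft } Q\text{-estimate under } r(\cdot,\cdot;\theta_k) \\
\text{gradient step:}\;& \theta_{k+1} = \theta_k + \alpha\, \hat{g}_k, \qquad \mathbb{E}[\hat{g}_k] \approx \nabla_\theta L(\theta_k) \\[6pt]
\text{if } r = \theta^{\top}\phi:\;& \nabla_\theta L(\theta) = \mathbb{E}_{\tau \sim \pi^{E}}\Big[\sum_{t \ge 0} \gamma^{t}\phi(s_t,a_t)\Big] - \mathbb{E}_{\tau \sim \pi_{\theta}}\Big[\sum_{t \ge 0} \gamma^{t}\phi(s_t,a_t)\Big]
\end{aligned}
```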

Abstract

Researchers have introduced a new algorithm for estimating structural models of dynamic decisions made by human agents, a task that is computationally demanding because of its nested structure: an inner problem identifies an optimal policy for a given reward, and an outer problem maximizes a measure of fit. Previous methods either remain expensive when the state space is discrete with large cardinality or continuous and high-dimensional, or reduce the computational burden at the cost of reward estimation accuracy. The new approach follows each policy improvement step with a stochastic gradient step for likelihood maximization, preserving reward estimation accuracy without compromising computational efficiency. This single-loop algorithm, designed for high-dimensional state spaces, converges to a stationary solution with finite-time guarantees. When the reward is linearly parameterized, it approximates the maximum likelihood estimator at a sublinear rate, offering a practical approach to complex decision-modeling tasks.
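A toy, tabular sketch of the single-loop idea, under stated assumptions: a small discrete MDP with known transitions `P`, a linear reward `r(s,a) = theta . phi(s,a)`, an entropy-regularized (soft) agent, and the assumed maximum-entropy IRL identity that the likelihood gradient equals expert minus current-policy discounted feature expectations. All function names and parameters (`soft_q_step`, `estimate`, `alpha`, `n_traj`, ...) are hypothetical; this is not the authors' implementation and uses no function approximation for high-dimensional states.

```python
import numpy as np


def soft_q_step(Q, reward, P, gamma):
    """One soft Bellman backup: Q(s,a) <- r(s,a) + gamma * sum_s' P(s'|s,a) V(s')."""
    m = Q.max(axis=1)
    V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))  # V(s) = logsumexp_a Q(s,a)
    return reward + gamma * np.einsum("sap,p->sa", P, V)


def softmax_policy(Q):
    """Policy improvement for a soft agent: pi(a|s) proportional to exp(Q(s,a))."""
    E = np.exp(Q - Q.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)


def sample_rows(probs, rng):
    """Draw one index per row of an (n, k) probability matrix via inverse-CDF sampling."""
    u = rng.random((probs.shape[0], 1))
    idx = (probs.cumsum(axis=1) < u).sum(axis=1)
    return np.minimum(idx, probs.shape[1] - 1)  # guard against round-off at the top end


def discounted_features(pi, P, phi, gamma, rng, n_traj, horizon, s0):
    """Monte Carlo estimate of E[sum_t gamma^t phi(s_t, a_t)] under pi, starting at s0."""
    states = np.full(n_traj, s0)
    total = np.zeros(phi.shape[-1])
    for t in range(horizon):
        actions = sample_rows(pi[states], rng)
        total += gamma**t * phi[states, actions].sum(axis=0)
        states = sample_rows(P[states, actions], rng)
    return total / n_traj


def estimate(phi, P, expert_feat, gamma=0.9, alpha=0.05, iters=300,
             n_traj=20, horizon=30, s0=0, seed=0):
    """Single loop: one policy-improvement step, then one stochastic gradient step on theta."""
    rng = np.random.default_rng(seed)
    S, A, d = phi.shape
    theta = np.zeros(d)
    Q = np.zeros((S, A))
    for _ in range(iters):
        Q = soft_q_step(Q, phi @ theta, P, gamma)  # single backup, no inner solve to convergence
        pi = softmax_policy(Q)
        agent_feat = discounted_features(pi, P, phi, gamma, rng, n_traj, horizon, s0)
        theta = theta + alpha * (expert_feat - agent_feat)  # assumed linear-reward likelihood gradient
    return theta, softmax_policy(Q)
```

To try it, build a random MDP, compute `expert_feat` from a soft-optimal policy under a known `theta`, and call `estimate(phi, P, expert_feat)`; the returned policy should move from uniform toward the expert's. Each iteration performs one soft Bellman backup instead of solving the inner problem to convergence, which is where a single-loop design saves computation relative to a nested loop.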

Markov decision process · Structural estimation · High-dimensional state space · Machine learning · Economics
