Off-line Estimation of Controlled Markov Chains: Minimaxity and Sample Complexity

Operations Research · 2025

Journal rankings: Renmin University of China list A · FT50 · UTD24 · ABS 4*

Imon Banerjee · Northwestern University (USA)
Harsha Honnappa · Purdue University
Vinayak Rao · Purdue University

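Summary

The study proposes a nonparametric estimator for the transition probabilities of controlled Markov chains. The estimator remains robust in nonstationary, non-Markovian settings. The authors derive precise sample complexity bounds and prove that the commonly used estimator is minimax optimal.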

Abstract

New Insights into Off-line Estimation for Controlled Markov Chains Unveiled

A team of researchers from Purdue and Northwestern Universities has presented new findings on off-line estimation for controlled Markov chains. The work addresses the challenge of analyzing data generated under arbitrary dynamics. The study introduces a nonparametric estimator of the transition probabilities and shows that it remains robust even in nonstationary, non-Markovian environments. The team derives precise sample complexity bounds, which reveal a delicate interplay between the mixing properties of the logging policy and the size of the data set. Their analysis shows that achieving optimal statistical risk depends on this trade-off, and this broadens the range of conditions under which off-line estimation can be analyzed. Examples include ergodic and weakly ergodic chains, as well as controlled chains with episodic or greedy controls. Notably, the research confirms that the widely used estimator is minimax optimal, which supports its reliability in general settings. That estimator computes the ratio of state–action–next-state transition counts to state–action visit counts. This result paves the way for improved evaluation of stationary Markov control policies.
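The count-ratio estimator mentioned in the abstract can be sketched in Python as follows. This is a minimal illustration and not the authors' code. It assumes finite state and action sets and a single logged trajectory of (state, action, next_state) triples, and it computes P_hat(s' | s, a) = N(s, a, s') / N(s, a). The helper names `estimate_transitions`, `simulate`, and `eps_greedy_logging_policy`, the two-state chain, and the decaying epsilon-greedy logging policy are all hypothetical. They were chosen only to show that the estimator uses nothing but the logged triples, so the logging policy may depend on time (nonstationary logging). The paper's sample complexity bounds and mixing conditions are not reproduced here.

```python
import random
from collections import defaultdict


def estimate_transitions(trajectory):
    """Count-ratio estimate P_hat(s' | s, a) = N(s, a, s') / N(s, a).

    trajectory: iterable of (state, action, next_state) triples logged
    off-line under any (possibly nonstationary) logging policy.
    Pairs (s, a) that were never visited get no entry (a choice of this sketch).
    """
    pair_counts = defaultdict(int)    # N(s, a)
    triple_counts = defaultdict(int)  # N(s, a, s')
    for s, a, s_next in trajectory:
        pair_counts[(s, a)] += 1
        triple_counts[(s, a, s_next)] += 1
    return {
        (s, a, s_next): n / pair_counts[(s, a)]
        for (s, a, s_next), n in triple_counts.items()
    }


def simulate(P, logging_policy, s0, n_steps, rng):
    """Roll out a controlled chain where P[(s, a)] maps next_state -> prob.

    logging_policy(t, s, rng) receives the time index t, so it may be
    nonstationary.
    """
    trajectory = []
    s = s0
    for t in range(n_steps):
        a = logging_policy(t, s, rng)
        next_states = list(P[(s, a)].keys())
        weights = list(P[(s, a)].values())
        s_next = rng.choices(next_states, weights=weights)[0]
        trajectory.append((s, a, s_next))
        s = s_next
    return trajectory


def eps_greedy_logging_policy(t, s, rng):
    # Hypothetical nonstationary logging policy: the exploration rate decays
    # with t (floored at 0.1); otherwise it plays the arbitrary "greedy" action a = s.
    eps = max(0.1, 1.0 / (1.0 + t / 100.0))
    if rng.random() < eps:
        return rng.choice([0, 1])
    return s


if __name__ == "__main__":
    # Hypothetical two-state, two-action chain.
    P = {
        (0, 0): {0: 0.9, 1: 0.1},
        (0, 1): {0: 0.2, 1: 0.8},
        (1, 0): {0: 0.5, 1: 0.5},
        (1, 1): {0: 0.7, 1: 0.3},
    }
    rng = random.Random(0)
    traj = simulate(P, eps_greedy_logging_policy, s0=0, n_steps=20000, rng=rng)
    p_hat = estimate_transitions(traj)
    for key in sorted(p_hat):
        print(key, round(p_hat[key], 3))
```

In this sketch, pairs (s, a) that the logging policy rarely visits get noisier estimates. According to the abstract, the data set size needed for a given accuracy depends on the mixing properties of the logging policy. This sketch does not quantify that dependence.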

Keywords: Markov chains · nonparametric estimation · sample complexity · control policy evaluation · machine learning