Dynamic Learning and Decision Making via Basis Weight Vectors

Operations Research · 2022

Cited by 5

Renmin University of China list: A · FT50 · UTD24 · ABS 4*

Hao Zhang · University of British Columbia (corresponding author)

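Summary

The paper proposes a new method based on basis weight vectors that solves learning-and-decision problems with a continuous unknown parameter by pure backward induction. In simulations with a short learning horizon, an approximation algorithm built on the method outperforms some popular linear contextual bandit algorithms.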

Abstract

A New Method for Dynamic Learning and Doing

For a large class of learning-and-doing problems, two processes are intertwined in the analysis: a forward process that updates the decision maker’s belief or estimate of the unknown parameter, and a backward process that computes the expected future values. The mainstream literature focuses on the former process. In contrast, in “Dynamic Learning and Decision Making via Basis Weight Vectors,” Hao Zhang proposes a new method based on pure backward induction on the continuation values created by feasible continuation policies. When the unknown parameter is a continuous variable, the method represents each continuation-value function by a vector of weights placed on a set of basis functions. The weight vectors that are potentially useful for the optimal solution can be found backward in time exactly (for very small problems) or approximately (for larger problems). A simulation study demonstrates that an approximation algorithm based on this method outperforms some popular algorithms in the linear contextual bandit literature when the learning horizon is short.
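To make the idea of backward induction on weight vectors concrete, here is a minimal toy sketch. It is not the paper's algorithm. Every modelling choice in it is an assumption made for illustration: a two-armed Bernoulli bandit with one safe arm paying a known 0.5 and one risky arm with unknown success probability theta, the monomial basis 1, theta, theta^2, ..., and a uniform prior on theta. The function names are hypothetical. Each continuation policy's value is a polynomial in theta, stored as its coefficient (weight) vector, and the vectors are enumerated exactly backward in time, which is only feasible for very small horizons.

```python
import numpy as np
from numpy.polynomial import polynomial as P

KNOWN_REWARD = 0.5  # assumed payoff of the safe arm (illustrative choice)


def backward_weight_vectors(horizon):
    """Enumerate weight vectors of all continuation policies, backward in time.

    Each vector holds monomial coefficients in theta, so the vector w
    represents the continuation-value function sum_k w[k] * theta**k.
    """
    theta = np.array([0.0, 1.0])             # the polynomial "theta"
    one_minus_theta = np.array([1.0, -1.0])  # the polynomial "1 - theta"
    vectors = [np.array([0.0])]              # value after the last period is 0
    for _ in range(horizon):
        new_vectors = []
        # Safe arm: reveals nothing, so one continuation policy follows it.
        for v in vectors:
            new_vectors.append(P.polyadd([KNOWN_REWARD], v))
        # Risky arm: pick one continuation policy per observed outcome.
        for v_success in vectors:
            for v_failure in vectors:
                success = P.polymul(theta, P.polyadd([1.0], v_success))
                failure = P.polymul(one_minus_theta, v_failure)
                new_vectors.append(P.polyadd(success, failure))
        vectors = new_vectors
    return vectors


def best_value_uniform_prior(horizon):
    """Best expected total reward under a uniform prior on theta (assumed)."""
    # Under Uniform[0, 1], E[theta**k] = 1 / (k + 1).
    moments = 1.0 / (np.arange(horizon + 1) + 1.0)
    return max(
        float(np.dot(w, moments[: len(w)]))
        for w in backward_weight_vectors(horizon)
    )
```

In this toy the number of vectors grows as n + n^2 per period (2, 6, 42, ...), which illustrates why exact enumeration suits only very small problems and why pruning or approximation is needed for larger ones. No belief is updated forward in time here: the prior enters only at the end, through the moments used to score each weight vector.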

Dynamic programming · Learning and decision making · Reinforcement learning · Operations research
