Offline sequential learning via simulation

IISE Transactions · 2021

Cited by 4

ABS 3

Hui Xiao (corresponding author)
Haitao Liu
Haobin Li
Loo Hay Lee
Ek Peng Chew

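Summary

Proposes an Information Gradient policy that sequentially allocates an offline simulation budget to learn the relationship among scenarios, decisions, and binary outcomes, so that all feasible decisions can be identified. The approach is intended for online decision-making settings.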

Abstract

Simulation has been widely used for static system design, but it is rarely used for making online decisions because of the time delay incurred by running simulations. We consider a system with stochastic binary outcomes that can be predicted via a logistic model depending on scenarios and decisions. The goal is to identify all feasible decisions conditional on any online scenario. We propose to learn offline the relationship among scenarios, decisions, and binary outcomes. An Information Gradient (IG) policy is developed to sequentially allocate the offline simulation budget. We show that the maximum likelihood estimator produced via the IG policy is consistent and asymptotically normal. Numerical results on synthetic data and a case study demonstrate that the IG policy outperforms benchmark policies. Moreover, we find that the IG policy tends to sample locations near the boundaries of the design space, owing to their higher Fisher information, and that the time complexity of the IG policy is linear in the number of design points and the simulation budget.
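The abstract does not spell out the IG policy itself, so the following is only a rough illustration of the general idea of information-driven allocation for a logistic model, not the authors' method. It swaps in a plain greedy D-optimal rule: at each step, spend one simulation run on the design point that most increases the determinant of the accumulated Fisher information. The feature map `features(s, d) = (1, s, d)`, the fixed parameter guess `theta`, the small `ridge` term, and the 5 x 5 grid are all assumptions made for this sketch.

```python
import math
from itertools import product


def sigmoid(z):
    """Logistic function; adequate for the small |z| used in this sketch."""
    return 1.0 / (1.0 + math.exp(-z))


def features(s, d):
    """Hypothetical feature map: intercept, scenario s, decision d."""
    return (1.0, s, d)


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def add_info(info, x, theta):
    """Add one observation's Fisher information, p(1 - p) * x x^T, to info."""
    p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    w = p * (1.0 - p)
    return [[info[r][c] + w * x[r] * x[c] for c in range(3)] for r in range(3)]


def greedy_allocate(points, theta, budget, ridge=1e-3):
    """Greedy D-optimal allocation (a stand-in, not the paper's IG policy).

    Each of the `budget` steps scans every design point once, so the cost
    of this sketch is proportional to len(points) * budget.
    """
    # Small ridge keeps the information matrix invertible before any samples.
    info = [[ridge if r == c else 0.0 for c in range(3)] for r in range(3)]
    chosen = []
    for _ in range(budget):
        best = max(
            points,
            key=lambda sd: det3(add_info(info, features(*sd), theta)),
        )
        info = add_info(info, features(*best), theta)
        chosen.append(best)
    return chosen, info


# Assumed design space: scenario and decision each on a grid over [-1, 1].
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
points = list(product(grid, grid))
chosen, info = greedy_allocate(points, theta=(0.0, 0.0, 0.0), budget=6)
```

With `theta` set to zero, every point has the same weight p(1 - p) = 0.25, so the first pick is driven purely by the size of the feature vector and lands on a corner of the grid. That is in the same spirit as the boundary-sampling behaviour the abstract reports, but it is a property of this simplified rule and should not be read as a reproduction of the paper's results. A sequential version would re-estimate `theta` by maximum likelihood after each simulated outcome instead of keeping it fixed.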

Keywords: Simulation · Sequential decision-making · Logistic regression · Information gradient · Experimental design
