Offline sequential learning via simulation
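Proposes an Information Gradient policy that sequentially allocates the budget through offline simulation, learning the relationship among scenarios, decisions, and binary outcomes in order to identify all feasible decisions; the approach is suited to online decision-making settings.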
Simulation has been widely used for static system design, but it is rarely used for online decision making because of the time delay in executing simulations. We consider a system with stochastic binary outcomes that can be predicted via a logistic model depending on scenarios and decisions. The goal is to identify all feasible decisions conditional on any online scenario. We propose to learn offline the relationship among scenarios, decisions, and binary outcomes. An Information Gradient (IG) policy is developed to sequentially allocate the offline simulation budget. We show that the maximum likelihood estimator produced via the IG policy is consistent and asymptotically normal. Numerical results on synthetic data and a case study demonstrate the superior performance of the IG policy over benchmark policies. Moreover, we find that the IG policy tends to sample locations near the boundaries of the design space, because those locations carry higher Fisher information, and that the time complexity of the IG policy is linear in the number of design points and the simulation budget.
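To make the offline loop concrete, here is a minimal Python sketch under stated assumptions. It does not reproduce the IG policy itself, whose allocation rule is not given in this abstract; it swaps in a plainly named stand-in, greedy sequential D-optimal design, which also allocates simulation runs by Fisher information. Under the logistic model the per-observation Fisher information is p(1-p) x x^T with p = sigmoid(theta^T x). Everything else is hypothetical: the feature map x = (1, s, d) for scenario s and decision d on an 11 by 11 grid in [-1, 1]^2, the "true" parameters used to simulate outcomes, the warm-up length, a small ridge term added for numerical stability (so the fit is a lightly penalized MLE rather than a plain MLE), and a feasibility rule P(Y = 1) >= gamma.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log1pexp(z):
    # Stable log(1 + e^z).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def solve(A, b):
    """Solve A x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fisher(theta, X, ridge):
    # Sum of per-observation Fisher information p(1-p) x x^T, plus ridge * I.
    d = len(theta)
    F = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for x in X:
        p = sigmoid(dot(theta, x))
        for i in range(d):
            for j in range(d):
                F[i][j] += p * (1 - p) * x[i] * x[j]
    return F

def penalized_loglik(theta, X, y, ridge):
    # sum_i [y_i z_i - log(1 + e^{z_i})] - (ridge / 2) ||theta||^2, with z_i = theta^T x_i
    ll = sum(yi * dot(theta, x) - log1pexp(dot(theta, x)) for x, yi in zip(X, y))
    return ll - 0.5 * ridge * dot(theta, theta)

def fit_logistic(X, y, theta, ridge=1e-2, iters=50):
    """Damped Newton-Raphson; the ridge keeps estimates finite on separable early data."""
    for _ in range(iters):
        grad = [-ridge * t for t in theta]
        for x, yi in zip(X, y):
            r = yi - sigmoid(dot(theta, x))
            grad = [g + r * xi for g, xi in zip(grad, x)]
        step = solve(fisher(theta, X, ridge), grad)  # negative Hessian = Fisher info
        t, base = 1.0, penalized_loglik(theta, X, y, ridge)
        while t > 1e-8 and penalized_loglik(
                [a + t * s for a, s in zip(theta, step)], X, y, ridge) < base:
            t /= 2  # backtrack until the objective does not decrease
        theta = [a + t * s for a, s in zip(theta, step)]
        if max(abs(t * s) for s in step) < 1e-9:
            break
    return theta

def greedy_fisher_design(true_theta, grid, budget, warmup=6, seed=0):
    """Stand-in allocation rule (not the paper's IG policy): pick the point that most
    increases log det of accumulated Fisher information at the current estimate,
    using log det(F + w x x^T) - log det F = log(1 + w x^T F^{-1} x)."""
    rng = random.Random(seed)
    X, y, theta = [], [], [0.0] * len(grid[0])
    for n in range(budget):
        if n < warmup:
            x = grid[rng.randrange(len(grid))]  # random warm-up points
        else:
            F = fisher(theta, X, 1e-2)
            def gain(c):
                p = sigmoid(dot(theta, c))
                return p * (1 - p) * dot(c, solve(F, c))
            x = max(grid, key=gain)
        X.append(x)
        # Simulated binary outcome from the hypothetical true logistic model.
        y.append(1 if rng.random() < sigmoid(dot(true_theta, x)) else 0)
        theta = fit_logistic(X, y, theta)
    return theta, X

grid = [[1.0, s / 5, d / 5] for s in range(-5, 6) for d in range(-5, 6)]
theta_hat, X = greedy_fisher_design([0.5, 1.5, -2.0], grid, budget=150)
on_boundary = sum(1 for x in X if max(abs(x[1]), abs(x[2])) == 1.0) / len(X)
s0, gamma = 0.3, 0.8  # hypothetical online scenario and feasibility threshold
feasible = [d / 5 for d in range(-5, 6) if sigmoid(dot(theta_hat, [1.0, s0, d / 5])) >= gamma]
print(theta_hat, on_boundary, feasible)
```

All simulation work happens offline in `greedy_fisher_design`; the online step is only the cheap evaluation of the fitted model for a new scenario, which is what makes the approach usable despite simulation delay. The gain term p(1-p) x^T F^{-1} x grows with the size of x, so edge points tend to score well unless p(1-p) is tiny there; the printed `on_boundary` fraction lets you inspect that tendency for these hypothetical parameters rather than assuming it.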