A Nonparametric Learning Algorithm for a Stochastic Multi-echelon Inventory Problem
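For a multi-echelon inventory problem in which the demand distribution is unknown, we propose a nonparametric algorithm based on stochastic gradient descent and prove that its expected regret upper bound matches the lower bound. This gives decision-makers an adaptive ordering policy that requires no prior information about demand.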
We consider a periodic-review, single-product, multi-echelon inventory problem with instantaneous replenishment. In each period, the decision-maker makes ordering decisions for all echelons. Any unsatisfied demand is backordered, and any excess inventory is carried over to the next period. In contrast to the classic inventory literature, we assume that the demand distribution is not known a priori; instead, the decision-maker observes demand realizations over the planning horizon. We propose a nonparametric algorithm that generates a sequence of adaptive ordering decisions based on the stochastic gradient descent method. We compare the [Formula: see text]-period cost of our algorithm to that of a clairvoyant who knows the underlying demand distribution in advance, and we prove that the expected [Formula: see text]-period regret is at most [Formula: see text], matching a lower bound for this problem.
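To make the stochastic-gradient idea concrete, here is a minimal sketch for a simplified single-echelon case. It is not the authors' multi-echelon algorithm. It assumes linear per-unit holding cost `h` and backorder cost `b`, a step size `step_scale / sqrt(t)`, and projection of the target level onto `[0, y_max]`. The function name `sgd_base_stock` and all of its parameters are hypothetical. Each period, inventory is raised to the current target if it sits below it (stock cannot be returned, so excess is carried over). The observed demand then yields a subgradient of the newsvendor-type cost at the target, and the target takes a projected gradient step.

```python
import math
import random


def sgd_base_stock(demands, h, b, y_max, step_scale=1.0, y0=0.0):
    """Hypothetical single-echelon sketch: learn a base-stock level by projected SGD.

    Assumes per-period cost h*(level - D)^+ + b*(D - level)^+, instantaneous
    replenishment, backordering, and fully observed demand.
    Returns (targets, costs): the target before each period and that period's cost.
    """
    y = y0            # current target base-stock level
    inventory = 0.0   # net inventory carried in (negative means backlog)
    targets, costs = [], []
    for t, d in enumerate(demands, start=1):
        # Order up to the target if below it; excess stock cannot be sent back.
        level = max(y, inventory)
        cost = h * max(level - d, 0.0) + b * max(d - level, 0.0)
        targets.append(y)
        costs.append(cost)
        # Leftover stock is carried over; a shortfall is backordered.
        inventory = level - d
        # Subgradient of h*(y - D)^+ + b*(D - y)^+ at the target y, using observed demand.
        grad = h if d <= y else -b
        step = step_scale / math.sqrt(t)
        # Gradient step, then projection onto the feasible interval [0, y_max].
        y = min(max(y - step * grad, 0.0), y_max)
    return targets, costs


# Usage: Uniform(0, 10) demand with h = 1, b = 3. The critical fractile is
# b / (b + h) = 0.75, so the best base-stock level is 7.5.
rng = random.Random(0)
demands = [rng.uniform(0.0, 10.0) for _ in range(20000)]
targets, costs = sgd_base_stock(demands, h=1.0, b=3.0, y_max=20.0)
```

The learner never sees the demand distribution, only realized demands, which is the nonparametric setting described above. The `1/sqrt(t)` step size is a standard choice for stochastic subgradient methods on convex costs. The paper's actual step-size rule and multi-echelon update may differ.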