Sequential Bayesian Replacement With Unknown Transition Probabilities
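Studies how a decision‐maker, when the state transition probabilities are unknown, can learn those probabilities through information acquisition while deciding when to replace the system, so as to minimize the expected total discounted cost over a finite horizon.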
ABSTRACT We investigate a class of sequential replacement optimization problems in which the state transition probabilities are not known a priori and must be learned over time through information acquisition. The decision‐maker's goal is to determine the optimal time to replace a system before it fails while simultaneously learning the unknown transition probabilities, so as to minimize the expected total discounted cost over a finite time horizon. In each period, the decision‐maker chooses between a do‐nothing action and a replacement action. From the perspective of learning, the do‐nothing action is favorable because it allows a state transition to be observed, which yields information about the unknown probabilities. From the perspective of cost optimization, however, the decision‐maker may prefer to replace the system immediately to avoid a potential failure. We formulate the sequential replacement problem as a Bayesian dynamic program (BDP). Our analysis focuses on establishing structural properties that hold in the presence of model uncertainty. In particular, we prove that (i) the optimal policy is a threshold policy with respect to the system's health state, and (ii) the optimal value function is concave with respect to the posterior distribution. Lastly, we develop two heuristics based on approximations of the optimal value function and demonstrate that the proposed heuristics are near‐optimal.
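To make the learning versus cost trade-off concrete, the following is a minimal sketch of a finite-horizon Bayesian dynamic program of the kind described above. It is not the paper's formulation. Everything specific is an assumption for illustration: the three health states, the Dirichlet-style counts used as the posterior over each transition row, the cost values, the rule that failure forces a replacement, and the rule that replacement returns the system to the as-new state without producing an observation.

```python
from functools import lru_cache

# All numbers below are hypothetical illustration values, not from the paper.
N_STATES = 3        # 0 = as new, 1 = degraded, 2 = failed
HORIZON = 4         # number of decision periods
GAMMA = 0.95        # discount factor
C_REPLACE = 5.0     # cost of a replacement
C_FAIL = 20.0       # extra penalty when the system is found failed


def posterior_mean(counts_row):
    """Predictive next-state probabilities from one row of Dirichlet counts."""
    total = sum(counts_row)
    return [c / total for c in counts_row]


def update(counts, s, s_next):
    """Return new counts after observing a transition s -> s_next."""
    rows = [list(r) for r in counts]
    rows[s][s_next] += 1
    return tuple(tuple(r) for r in rows)


@lru_cache(maxsize=None)
def value(t, s, counts):
    """Return (minimum expected discounted cost-to-go, best action)."""
    if t == HORIZON:
        return 0.0, "stop"
    if s == N_STATES - 1:
        # Assumed rule: a failed system must be replaced and pays a penalty.
        cost = C_FAIL + C_REPLACE + GAMMA * value(t + 1, 0, counts)[0]
        return cost, "replace"
    # Replacement: pay the cost, restart as new, learn nothing this period.
    replace_cost = C_REPLACE + GAMMA * value(t + 1, 0, counts)[0]
    # Do nothing: observe the next state and update the posterior counts.
    wait_cost = 0.0
    for s_next, p in enumerate(posterior_mean(counts[s])):
        if p > 0:
            next_counts = update(counts, s, s_next)
            wait_cost += p * GAMMA * value(t + 1, s_next, next_counts)[0]
    if wait_cost <= replace_cost:
        return wait_cost, "wait"
    return replace_cost, "replace"


# Prior counts; zeros encode the assumption that the system never improves.
PRIOR = ((2, 1, 1), (0, 1, 1), (0, 0, 1))

if __name__ == "__main__":
    for state in range(N_STATES):
        print(state, value(0, state, PRIOR))
```

Because the posterior counts are part of the state, the recursion accounts for the fact that waiting changes what the decision-maker will know in later periods. Exact enumeration like this grows quickly with the horizon and the number of states, which is one reason approximations of the value function are of interest.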