Policy Optimization in Dynamic Bayesian Network Hybrid Models of Biomanufacturing Processes
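To address data scarcity and interdependent factors in biopharmaceutical processes, the paper proposes a model-based reinforcement learning framework built on a dynamic Bayesian network, which achieves human-level control in low-data environments and optimizes policies with a stochastic gradient method.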
Biopharmaceutical manufacturing is a rapidly growing industry with impact in virtually all branches of medicine. Biomanufacturing processes require close monitoring and control, in the presence of complex bioprocess dynamics with many interdependent factors, as well as extremely limited data due to the high cost of experiments and the novelty of personalized bio-drugs. We develop a new model-based reinforcement learning framework that can achieve human-level control in low-data environments. A dynamic Bayesian network is used to capture causal interdependencies between factors and predict how the effects of different inputs propagate through the pathways of the bioprocess mechanisms. This model is interpretable and enables the design of process control policies that are robust against model risk. We present a computationally efficient, provably convergent stochastic gradient method for optimizing such policies. Validation is conducted on a realistic application with a multidimensional, continuous state variable.
History: Accepted by Bruno Tuffin, Area Editor for Simulation.
Funding: This work was partially supported by National Institute of Standards and Technology [Grant 70NANB17H002].
Supplemental Material: The online appendix is available at https://doi.org/10.1287/ijoc.2022.1232.
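To make the abstract's idea of stochastic gradient policy optimization under model risk more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a toy one-dimensional linear-Gaussian transition standing in for the dynamic Bayesian network, a one-parameter linear feedback policy, and Gaussian posteriors over the two transition coefficients. All names (`rollout_with_gradient`, `optimize_policy`, `average_reward`) and all numerical values are hypothetical choices made for illustration.

```python
import random


def rollout_with_gradient(theta, a, b, rng, target=1.0, horizon=5,
                          noise_sd=0.05, input_cost=0.1):
    """Simulate one trajectory; return (reward, d reward / d theta)."""
    s, ds = 0.0, 0.0            # state and its derivative w.r.t. theta
    reward, dreward = 0.0, 0.0
    for _ in range(horizon):
        u = theta * (target - s)          # linear feedback policy (assumed form)
        du = (target - s) - theta * ds    # chain rule through the state
        reward -= input_cost * u * u      # penalize control effort
        dreward -= 2.0 * input_cost * u * du
        # Toy linear-Gaussian transition (an assumption, not the paper's model).
        s = a * s + b * u + rng.gauss(0.0, noise_sd)
        ds = a * ds + b * du
    reward -= (s - target) ** 2           # terminal tracking penalty
    dreward -= 2.0 * (s - target) * ds
    return reward, dreward


def optimize_policy(iterations=2000, seed=0,
                    a_post=(0.9, 0.05), b_post=(0.5, 0.05)):
    """Stochastic gradient ascent on the posterior-averaged expected reward."""
    rng = random.Random(seed)
    theta = 0.0
    for k in range(1, iterations + 1):
        # Draw model parameters from their (assumed Gaussian) posteriors so
        # the policy is tuned against model uncertainty, not a point estimate.
        a = rng.gauss(*a_post)
        b = rng.gauss(*b_post)
        _, grad = rollout_with_gradient(theta, a, b, rng)
        theta += (0.05 / k) * grad        # diminishing step size
    return theta


def average_reward(theta, n=2000, seed=1,
                   a_post=(0.9, 0.05), b_post=(0.5, 0.05)):
    """Monte Carlo estimate of the expected reward of a fixed policy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        a = rng.gauss(*a_post)
        b = rng.gauss(*b_post)
        total += rollout_with_gradient(theta, a, b, rng)[0]
    return total / n


if __name__ == "__main__":
    theta_hat = optimize_policy()
    print("learned gain:", theta_hat)
    print("avg reward, learned policy:", average_reward(theta_hat))
    print("avg reward, no control:   ", average_reward(0.0))
```

Each iteration uses a single simulated trajectory to form an unbiased pathwise gradient estimate, and the step size 0.05/k satisfies the usual diminishing-step conditions for stochastic approximation. Sampling the coefficients at every iteration is one simple way to account for model risk; a multidimensional state would replace the scalar recursion with matrix-valued derivatives.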