Data-Pooling Reinforcement Learning for Preventative Healthcare Intervention
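To address the need for personalized interventions across heterogeneous populations in preventative healthcare, this paper proposes a data-pooling reinforcement learning algorithm that adaptively pools historical data (requiring only aggregate statistics) to cope with small sample sizes, provides a theoretical guarantee of a reduced regret bound, and validates its performance in a postdischarge intervention case study.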
Motivated by the emerging need for tailored interventions for heterogeneous populations in preventative healthcare applications, we consider a multistage, dynamic decision-making problem in the online setting with unknown model parameters. To deal with the pervasive issue of small sample size in parameter estimation, we develop a novel data-pooling reinforcement learning (RL) algorithm that adaptively pools historical data, with three main innovations: (i) the weight of pooling is directly tied to the performance of the decision (measured by regret) as opposed to estimation accuracy in conventional methods; (ii) no parametric assumptions are needed between historical and current data; and (iii) data sharing is required only via aggregate statistics, as opposed to patient-level data. Our data-pooling algorithmic framework unifies a variety of RL algorithms, and we establish a theoretical performance guarantee that quantifies the regret bound reduction from our adaptive pooling design compared with benchmarks. We substantiate the theoretical development with extensive empirical experiments via a case study in the context of postdischarge intervention to prevent unplanned readmissions, generating practical insights into healthcare management. Our algorithm enables organizations to leverage public data or published studies for patient management and supports policymakers in promoting aggregate data sharing to enhance population health outcomes.

This paper was accepted by Carri Chan, healthcare management.

Funding: X. Chen was supported by the National Natural Science Foundation of China [Grants NSFC-72171205 and NSFC-72394361] and the Shenzhen Science and Technology Innovations Committee [Grant RCXY 20210609103124047].

Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2023.03880 .
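
As a rough illustration of what pooling through aggregate statistics can look like, the following Python sketch combines a small local sample with a historical summary using a single weight. It is not the algorithm developed in the paper: the names `AggregateStats`, `pooling_weight`, and `pooled_estimate`, the assumed variance, and the assumed `discrepancy` bound between the two populations are all hypothetical. The weight rule shown is the conventional one that minimizes estimation error (mean squared error), which is precisely the baseline the abstract contrasts with its regret-based weight.

```python
from dataclasses import dataclass


@dataclass
class AggregateStats:
    """Summary of a data set: only the mean and the sample count are shared."""
    mean: float
    count: int


def pooling_weight(local: AggregateStats, hist: AggregateStats,
                   discrepancy: float, variance: float = 0.25) -> float:
    """Weight placed on the historical mean (hypothetical, MSE-based rule).

    For the estimate (1 - w) * local.mean + w * hist.mean, with an assumed
    per-observation variance and an assumed bias bound `discrepancy` for the
    historical mean, the mean squared error is
        (1 - w)^2 * variance / n_local + w^2 * (variance / n_hist + discrepancy^2),
    which is minimized at the value returned below.
    """
    if hist.count == 0:
        return 0.0  # nothing to pool
    if local.count == 0:
        return 1.0  # no local data yet, rely on the historical summary
    local_err = variance / local.count
    hist_err = variance / hist.count + discrepancy ** 2
    return local_err / (local_err + hist_err)


def pooled_estimate(local: AggregateStats, hist: AggregateStats,
                    discrepancy: float, variance: float = 0.25) -> float:
    """Convex combination of the local and historical means."""
    w = pooling_weight(local, hist, discrepancy, variance)
    return (1.0 - w) * local.mean + w * hist.mean


if __name__ == "__main__":
    # Illustrative numbers only: 20 local patients, a published study of 2000.
    local = AggregateStats(mean=0.30, count=20)
    hist = AggregateStats(mean=0.20, count=2000)
    print(pooling_weight(local, hist, discrepancy=0.05))
    print(pooled_estimate(local, hist, discrepancy=0.05))
```

Under this rule the weight on historical data shrinks as local data accumulate, which captures the adaptive flavor of pooling; replacing the error-based objective with a regret-based one is the step the paper itself addresses.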