Bayesian safe policy learning with chance constrained optimization: application to military security assessment during the Vietnam War
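Examines whether the security assessment algorithm deployed during the Vietnam War could have been improved using short-term outcomes. Proposes the Average Conditional Risk (ACRisk) and a Bayesian policy learning framework that maximizes the posterior expected outcome while constraining the expected risk, suited to high-stakes algorithmic decision-making domains.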
Abstract: Algorithmic decisions are increasingly used in high-stakes domains such as criminal justice and medicine. We examine whether a security assessment algorithm deployed during the Vietnam War could have been improved using outcomes observed shortly after its rollout in late 1969. This empirical setting highlights methodological challenges common in real-world algorithmic decision-making. First, a new algorithm must be carefully evaluated to avoid worsening outcomes relative to an existing algorithm. Second, because the existing algorithm is deterministic, learning improvements requires extrapolation that is both transparent and credible. Third, the use of discrete decision tables complicates optimization. We introduce the Average Conditional Risk (ACRisk), which quantifies the risk that a new algorithm performs worse for subgroups of units and then averages this risk across the population. We also develop a Bayesian policy learning framework that maximizes the posterior expected outcome while constraining the expected ACRisk. This approach separates treatment effect estimation from policy optimization, allowing flexible modelling and tractable search over complex policy classes. We show that this leads to a constrained linear programming problem. Applying our method, we find that the learned algorithm rates most regions as more secure and places greater emphasis on economic and political factors than on military ones.
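To make the structure of the optimization concrete, the following is a minimal sketch of a chance-constrained policy search over posterior draws. It is a simplified illustration under stated assumptions, not the paper's estimator: the function names (`posterior_summaries`, `learn_policy`), the toy posterior draws, the subgroup weights, and the risk budget `epsilon` are all hypothetical. The sketch assumes a binary choice per subgroup (keep the existing decision or switch), takes the subgroup risk to be the posterior probability that switching lowers the outcome, and averages that risk with population weights. With a single risk constraint and decisions relaxed to the interval [0, 1], the linear program reduces to a fractional knapsack, which a greedy pass solves exactly.

```python
def posterior_summaries(draws):
    """draws[s][g] is posterior draw s of the effect of switching subgroup g
    from the existing decision to the new one (hypothetical input format)."""
    n_draws = len(draws)
    n_groups = len(draws[0])
    # Posterior mean gain from switching, per subgroup.
    mean_gain = [sum(d[g] for d in draws) / n_draws for g in range(n_groups)]
    # Posterior probability that switching makes the subgroup worse off.
    prob_harm = [sum(1 for d in draws if d[g] < 0) / n_draws
                 for g in range(n_groups)]
    return mean_gain, prob_harm


def learn_policy(draws, weights, epsilon):
    """Maximize sum_g w_g * delta_g * mean_gain_g
    subject to sum_g w_g * delta_g * prob_harm_g <= epsilon and 0 <= delta_g <= 1.
    Assumes every weight is strictly positive."""
    mean_gain, prob_harm = posterior_summaries(draws)
    delta = [0.0] * len(weights)
    # Only subgroups with a positive expected gain are worth switching.
    candidates = [g for g in range(len(weights)) if mean_gain[g] > 0]
    # Switching a subgroup with zero posterior risk uses none of the budget.
    for g in candidates:
        if prob_harm[g] == 0:
            delta[g] = 1.0
    # Spend the risk budget on the best gain-per-unit-risk subgroups first.
    risky = sorted((g for g in candidates if prob_harm[g] > 0),
                   key=lambda g: mean_gain[g] / prob_harm[g], reverse=True)
    budget = epsilon
    for g in risky:
        if budget <= 0:
            break
        cost = weights[g] * prob_harm[g]
        take = min(1.0, budget / cost)
        delta[g] = take
        budget -= take * cost
    return delta


# Toy example: 4 posterior draws for 3 subgroups (made-up numbers).
draws = [
    [1.0, 0.5, -0.2],
    [2.0, -0.5, -0.4],
    [1.5, 1.0, 0.1],
    [0.5, 1.0, -0.3],
]
weights = [0.5, 0.3, 0.2]  # population shares of the subgroups
policy = learn_policy(draws, weights, epsilon=0.03)
# Subgroup 0 is switched fully, subgroup 1 only partially, subgroup 2 not at all.
print(policy)
```

A fractional value in the result corresponds to switching only part of a subgroup, which is an artifact of the relaxation used in this sketch. Because the abstract describes discrete decision tables, a deterministic version would restrict each decision to 0 or 1, turning the problem into an integer program.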