Quantifying capability gaps via information relaxation and deep reinforcement learning in infinite-horizon Markov decision processes: A military air battle management application
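Proposes an information relaxation technique for quantifying upper bounds on solution quality in complex stochastic dynamic assignment problems, solved via deep neural network-based approximate policy iteration, to help assess capability gaps in military air battle management.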
This paper presents a novel application of information relaxation techniques to quantify upper bounds on solution quality in a complex, stochastic, and dynamic assignment problem in military air battle management. Information relaxation refers to relaxing the non-anticipativity constraints in a sequential decision-making problem that require a decision-maker to act only on currently available information. We introduce a temporal event horizon, an adjustable window into future stochastic outcomes, to explore the marginal value of information in shaping decision policies. Whereas previous work has investigated information relaxation with regard to problems that can be solved more easily under a deterministic relaxation, we demonstrate a methodology for applying the approach to a continuous-time, continuous-space problem that remains computationally challenging even after relaxation. We formulate the problem as a discounted, infinite-horizon Markov decision process and solve it by employing a deep neural network-based approximate policy iteration algorithm in concert with several designed computational experiments. We demonstrate how a multidimensional sensitivity analysis of the event horizon and other problem features helps quantify potential improvements to decision policy effectiveness resulting from either a change to tactics or a modification to capabilities. Our findings provide a methodology for objective, data-driven insights that can augment traditionally subjective capability gap analysis to guide decision-making and establish more effective requirements for acquisition programs.
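The idea of an event horizon and the resulting upper bound can be illustrated with a minimal sketch on a toy problem. This is not the paper's model: it assumes a short, finite sequence of threats with random values, a fixed budget of interceptors, and a simple look-ahead rule in place of the deep neural network-based approximate policy iteration. All function names and parameters (`lookahead_policy_value`, `perfect_information_value`, `horizon`, `budget`) are hypothetical. With `horizon=0` the decision-maker sees only the current threat; as the window grows to cover the whole sample path, the rule recovers the perfect-information value, which bounds every non-anticipative policy from above on each path.

```python
import random


def perfect_information_value(values, budget, gamma):
    """Upper bound: with the whole sample path revealed, engage the
    `budget` threats with the largest discounted values."""
    discounted = sorted(
        (gamma ** t * v for t, v in enumerate(values)), reverse=True
    )
    return sum(discounted[:budget])


def lookahead_policy_value(values, budget, gamma, horizon):
    """Toy look-ahead rule (an assumption, not the paper's policy).

    At time t the decision-maker sees threats t .. t + horizon and
    engages the current one only if fewer than `left` visible future
    threats have a strictly larger discounted value.
    """
    total, left = 0.0, budget
    for t in range(len(values)):
        if left == 0:
            break
        end = min(t + horizon + 1, len(values))
        window = [gamma ** s * values[s] for s in range(t, end)]
        current = window[0]
        better = sum(1 for d in window[1:] if d > current)
        if better < left:
            total += current
            left -= 1
    return total


def estimate_mean_values(num_paths, T, budget, gamma, horizons, seed=0):
    """Monte Carlo averages of the look-ahead rule for each horizon,
    together with the average perfect-information bound."""
    rng = random.Random(seed)
    sums = {h: 0.0 for h in horizons}
    bound_sum = 0.0
    for _ in range(num_paths):
        path = [rng.random() for _ in range(T)]  # threat values in [0, 1)
        bound_sum += perfect_information_value(path, budget, gamma)
        for h in horizons:
            sums[h] += lookahead_policy_value(path, budget, gamma, h)
    means = {h: s / num_paths for h, s in sums.items()}
    return means, bound_sum / num_paths
```

Calling `estimate_mean_values(1000, 12, 3, 0.95, [0, 1, 3, 11])` returns the mean value for each horizon alongside the bound; the difference between a given horizon's mean and the bound is a rough proxy for the marginal value of the information that horizon still lacks. The heuristic is not guaranteed to improve monotonically with the horizon, so intermediate values should be read as illustrative only.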