A Hierarchical Deep Reinforcement Learning Framework for 6-DOF UCAV Air-to-Air Combat

IEEE Transactions on Systems, Man, and Cybernetics: Systems · 2023

Cited by 74 · Top 2% among papers in the same journal and year

ABS 3

Jiajun Chai
Wenzhang Chen
Yuanheng Zhu
Zongxin Yao
Dongbin Zhao

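Summary

The paper proposes a hierarchical reinforcement learning framework that splits 6-DOF air combat decision-making into an outer loop for tactical decisions and an inner loop for control execution. Two-stage training and a fictitious self-play mechanism raise the winning rate. The framework targets within-visual-range air combat for unmanned combat air vehicles.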

Abstract

Unmanned combat air vehicle (UCAV) combat is a challenging scenario with high-dimensional continuous state and action spaces and highly nonlinear dynamics. In this article, we propose a general hierarchical framework to solve the within-visual-range (WVR) air-to-air combat problem under six-degree-of-freedom (6-DOF) dynamics. The core idea is to divide the whole decision-making process into two loops and use reinforcement learning (RL) to solve them separately. The outer loop uses a combat policy to decide the macro command according to the current combat situation. The inner loop then uses a control policy to execute the macro command by computing the actual input signals for the aircraft. We formulate the Markov decision process for the control policy and the Markov game between the two aircraft, and we present a two-stage training mechanism. For the control policy, we design a reward function that enables accurate tracking of various macro behaviors. For the combat policy, we present a fictitious self-play mechanism that improves combat performance by fighting against historical combat policies. Experimental results show that the control policy achieves better tracking performance than conventional methods, and that the fictitious self-play mechanism learns a competitive combat policy that achieves high winning rates against conventional methods.
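
The two-loop structure and the opponent pool described in the abstract can be illustrated with a minimal Python sketch. Nothing here comes from the paper itself: the names `hierarchical_step` and `OpponentPool`, the list-of-floats state, the fixed number of inner steps per macro command, and the uniform sampling of past policies are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical types; the abstract does not specify any interfaces.
State = List[float]
MacroCommand = List[float]   # assumed: a high-level target for the aircraft
ControlInput = List[float]   # assumed: low-level actuator signals

CombatPolicy = Callable[[State], MacroCommand]
ControlPolicy = Callable[[State, MacroCommand], ControlInput]
Dynamics = Callable[[State, ControlInput], State]


def hierarchical_step(
    state: State,
    combat_policy: CombatPolicy,
    control_policy: ControlPolicy,
    dynamics: Dynamics,
    inner_steps: int = 3,
) -> Tuple[MacroCommand, List[ControlInput], State]:
    """Outer loop picks one macro command; inner loop tracks it."""
    command = combat_policy(state)            # outer loop: tactical decision
    inputs: List[ControlInput] = []
    for _ in range(inner_steps):              # inner loop: control execution
        u = control_policy(state, command)
        inputs.append(u)
        state = dynamics(state, u)
    return command, inputs, state


class OpponentPool:
    """Keeps snapshots of past combat policies (assumed uniform sampling)."""

    def __init__(self, seed: Optional[int] = 0) -> None:
        self._policies: List[CombatPolicy] = []
        self._rng = random.Random(seed)

    def add(self, policy: CombatPolicy) -> None:
        self._policies.append(policy)

    def sample(self) -> CombatPolicy:
        if not self._policies:
            raise ValueError("opponent pool is empty")
        return self._rng.choice(self._policies)

    def __len__(self) -> int:
        return len(self._policies)


# Toy one-dimensional example: track a target value of 1.0.
def toy_combat(state: State) -> MacroCommand:
    return [1.0]

def toy_control(state: State, command: MacroCommand) -> ControlInput:
    return [0.5 * (command[0] - state[0])]    # proportional tracking

def toy_dynamics(state: State, u: ControlInput) -> State:
    return [state[0] + u[0]]

command, inputs, final_state = hierarchical_step(
    [0.0], toy_combat, toy_control, toy_dynamics, inner_steps=3
)
pool = OpponentPool()
pool.add(toy_combat)                          # store a snapshot as a future opponent
opponent = pool.sample()
```

In this toy run the inner loop moves the state halfway toward the commanded value at each step. In a real setting both policies would be learned networks, and the pool would be refreshed with policy snapshots during the second training stage.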

Reinforcement learning · Air combat · Unmanned combat air vehicle · Markov decision process · Artificial intelligence
