Multi-Stage Dynamic Weapon-Target Assignment Based on Improved Reinforcement Learning

邹尧; 刘天娇; 吕旭; 张艳玲; 郭文达

doi:10.13976/j.cnki.xk.2025.3302

Abstract

This paper addresses the multi-wave, highly dynamic weapon-target assignment (WTA) problem in modern combat environments. The formulation incorporates practical scheduling constraints, namely weapon availability times, weapon cooldown intervals, and target engagement time windows, and builds a multi-objective optimization function that jointly accounts for target threat elimination, base defense, and inter-stage resource coordination, with the aim of improving the foresight and overall effectiveness of the scheduling policy. On the algorithmic side, an improved Actor-Critic algorithm is proposed: a pointer network is integrated into the Actor module, and its attention mechanism and dynamic masking are used to select weapon-target pairs efficiently under variable-length inputs, while a dual-channel encoder-decoder mechanism improves solution accuracy. The experiments cover validation of the objective function, performance comparison between algorithms, generalization tests, and ablation studies of the attention mechanism and the regularization module. The results support the optimization capability and adaptability of the proposed model and algorithm in complex dynamic environments.
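The dynamic masking idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feasibility rule, the field names (`weapon_ready`, `window`), the scaled dot-product scoring, and the function names are all assumptions made for illustration. The sketch masks out weapon-target pairs that violate the time constraints and applies a softmax over the remaining attention scores, so the candidate list may have any length.

```python
import math

def feasible_mask(pairs, now):
    """Hypothetical feasibility rule: the weapon is available (cooldown over)
    and the current time lies inside the target's engagement window."""
    return [
        p["weapon_ready"] <= now and p["window"][0] <= now <= p["window"][1]
        for p in pairs
    ]

def masked_pointer_probs(query, keys, feasible):
    """Softmax over scaled dot-product scores; infeasible pairs get probability 0."""
    if not any(feasible):
        raise ValueError("no feasible weapon-target pair")
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale if ok else -math.inf
        for key, ok in zip(keys, feasible)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative data only: four candidate weapon-target pairs at time now = 2.0.
pairs = [
    {"weapon_ready": 0.0, "window": (0.0, 5.0)},
    {"weapon_ready": 4.0, "window": (0.0, 5.0)},  # weapon still cooling down
    {"weapon_ready": 0.0, "window": (3.0, 6.0)},  # engagement window not yet open
    {"weapon_ready": 1.0, "window": (1.0, 4.0)},
]
keys = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]  # assumed pair embeddings
query = [1.0, 0.0]                                       # assumed decoder state

mask = feasible_mask(pairs, now=2.0)
probs = masked_pointer_probs(query, keys, mask)
```

In a learned Actor the query and keys would come from the encoder and decoder networks rather than fixed vectors; only the masking and normalization step is shown here.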
