Autonomous Collision Avoidance Method of UAV Based on Distributed Multi-experience Pool

XU Jia; HU Chunhe

doi:10.13976/j.cnki.xk.2023.2188

Abstract: To meet the demand for efficient autonomous collision avoidance in cooperative tasks involving multiple unmanned aerial vehicles (multi-UAVs), we propose a distributed multi-experience pool deep deterministic policy gradient collision avoidance method (DMEP-DDPG). The method builds on data-driven reinforcement learning and enables each UAV in a multi-UAV environment to avoid collisions autonomously using only its own sensor data. First, we design a system reward mechanism based on a guiding reward function to address the sparse reward problem that arises in long-horizon reinforcement learning tasks. Second, because the low sample efficiency of a single experience pool makes policy convergence difficult, we construct a novel deterministic policy gradient framework that updates from distributed multiple experience pools. Finally, we test the collision avoidance performance of DMEP-DDPG in a variety of multi-UAV cooperative task environments and compare its performance metrics with those of other learning-based collision avoidance strategies. The results verify the feasibility and effectiveness of the DMEP-DDPG method.
