XU Jia, HU Chunhe. Autonomous Collision Avoidance Method of UAV Based on Distributed Multi-experience Pool[J]. INFORMATION AND CONTROL, 2023, 52(4): 432-443. DOI: 10.13976/j.cnki.xk.2023.2188

Autonomous Collision Avoidance Method of UAV Based on Distributed Multi-experience Pool

Abstract: To meet the demand for efficient autonomous collision avoidance in cooperative tasks of multiple unmanned aerial vehicles (multi-UAVs), we propose a distributed multi-experience pool deep deterministic policy gradient collision avoidance method (DMEP-DDPG). Building on data-driven reinforcement learning, the method enables a single UAV in a multi-UAV environment to avoid collisions autonomously using only its own sensor data. First, to address the sparse reward problem of long-horizon reinforcement learning tasks, we design a system reward mechanism based on a guiding reward function. Second, because the low sample efficiency of a single experience pool makes policy convergence difficult, we construct a novel deterministic policy gradient framework with distributed multi-experience pool updates. Finally, we test the collision avoidance performance of DMEP-DDPG in a variety of multi-UAV cooperative task environments and compare its performance metrics with those of other learning-based collision avoidance strategies. The results verify the feasibility and effectiveness of the DMEP-DDPG method.

     
