Citation: GUO Ziheng, CAI Chenxiao. Autonomous Navigation Algorithm of UAV Based on Improved Deep-reinforcement-learning[J]. INFORMATION AND CONTROL, 2023, 52(6): 736-746, 772. DOI: 10.13976/j.cnki.xk.2022.0447

Autonomous Navigation Algorithm of UAV Based on Improved Deep-reinforcement-learning

Abstract: Deep reinforcement learning (DRL) is increasingly used for unmanned aerial vehicle (UAV) navigation. However, when training incorporates a prior policy, the prior policy's share of the decisions decays linearly, which slows model training and lowers the navigation success rate. To address these problems, this paper proposes a UAV navigation algorithm. First, a virtual UAV environment model is built and the action space is constructed. Second, a non-sparse reward function is designed, and an adaptive decay factor based on the UAV's learning state adjusts the weight of the prior policy in different learning states while the network model is trained. Finally, the trained network model is used to make UAV navigation decisions. Simulation results show that the training time needed for the navigation success rate to stabilize at a high level is about 20% shorter than with the prototype algorithm, which substantially improves training efficiency and reduces time cost. Because the weight of the prior policy better matches the UAV's learning ability at each stage, navigation quality and success rate also improve somewhat. The approach offers a new idea for advancing the practical application of DRL to UAV navigation.
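The abstract contrasts a linearly decaying prior-policy weight with an adaptive one tied to the UAV's learning state, but it does not give the formula. The Python sketch below is therefore only an illustration under stated assumptions: it uses the recent navigation success rate as a stand-in for "learning state", and the names `linear_prior_weight`, `AdaptivePriorWeight`, and `choose_action` are hypothetical, not taken from the paper.

```python
import random
from collections import deque


def linear_prior_weight(step: int, total_steps: int) -> float:
    """Baseline schedule: the prior-policy weight falls linearly from 1 to 0."""
    return max(0.0, 1.0 - step / total_steps)


class AdaptivePriorWeight:
    """Hypothetical adaptive schedule (an assumption, not the paper's formula).

    The prior-policy weight shrinks as the recent success rate rises, so the
    agent leans on the prior only while it is still navigating poorly.
    """

    def __init__(self, window: int = 100, floor: float = 0.0):
        self.outcomes = deque(maxlen=window)  # most recent episode outcomes
        self.floor = floor                    # lower bound on the weight

    def record(self, success: bool) -> None:
        """Store the outcome of one finished episode."""
        self.outcomes.append(1.0 if success else 0.0)

    def weight(self) -> float:
        """Return the current prior-policy weight in [floor, 1]."""
        if not self.outcomes:
            return 1.0  # no experience yet: rely fully on the prior
        success_rate = sum(self.outcomes) / len(self.outcomes)
        return max(self.floor, 1.0 - success_rate)


def choose_action(prior_action, learned_action, prior_weight, rng=random):
    """Follow the prior policy with probability prior_weight, else the learned policy."""
    return prior_action if rng.random() < prior_weight else learned_action
```

In a training loop, `record` would be called once per finished episode and `weight` would be passed to `choose_action` at each step. Unlike the linear schedule, this weight depends on observed performance rather than on the step count alone.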

     
