Citation: ZHANG Lei, MU Yashuang, PAN Quan. Mobile Robot Path Planning Algorithm with Improved Deep Double Q Networks[J]. INFORMATION AND CONTROL, 2024, 53(3): 365-376. DOI: 10.13976/j.cnki.xk.2024.3090

Mobile Robot Path Planning Algorithm with Improved Deep Double Q Networks


Abstract: To address the incomplete search and slow convergence that conventional mobile robot path planning methods based on the deep double Q-network (DDQN) suffer from in complex unknown environments, we propose an improved DDQN (I-DDQN) learning algorithm. First, the proposed I-DDQN algorithm uses a dueling network structure to estimate the value function of the DDQN algorithm. Second, we propose a robot path exploration strategy based on a two-layer controller structure, in which the value function of the upper controller is used to explore locally optimal actions of the mobile robot and the value function of the lower controller is used to learn the global task policy. In addition, during learning the algorithm uses a prioritized experience replay mechanism for data collection and sampling and trains the network on mini-batches. Finally, we compare the proposed algorithm with the conventional DDQN algorithm and its improved variants in two different simulation environments, OpenAI Gym and Gazebo. The experimental results show that the proposed I-DDQN algorithm outperforms the conventional DDQN algorithm and its improved variants on various evaluation metrics in both simulation environments and effectively overcomes incomplete path search and slow convergence in the same complex environments.
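
The dueling structure mentioned in the abstract estimates the action value by combining a state-value head with an advantage head, recombined as Q(s,a) = V(s) + A(s,a) - mean over a' of A(s,a'). The following is a minimal PyTorch-style sketch of such a head; the class name, layer sizes, and hidden width are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class DuelingQNetwork(nn.Module):
        # Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').
        # state_dim, n_actions, and hidden width are assumed for illustration.
        def __init__(self, state_dim, n_actions, hidden=128):
            super().__init__()
            self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
            self.value = nn.Linear(hidden, 1)              # state value V(s)
            self.advantage = nn.Linear(hidden, n_actions)  # advantages A(s, .)

        def forward(self, state):
            h = self.feature(state)                        # shared features
            v = self.value(h)                              # shape (batch, 1)
            a = self.advantage(h)                          # shape (batch, n_actions)
            # Subtracting the mean advantage keeps V and A identifiable.
            return v + a - a.mean(dim=1, keepdim=True)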

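The two-layer controller in the abstract pairs an upper value function, used for exploring locally optimal actions, with a lower value function that learns the global task policy. How the two are combined is specified in the paper itself; purely as a generic illustration (the weighting rule, the epsilon-greedy step, and all names below are assumptions), a joint action selection could look like:

    def select_action(upper_q, lower_q, state, beta=0.5, epsilon=0.1):
        # upper_q, lower_q: DuelingQNetwork instances as sketched above.
        # state: tensor of shape (1, state_dim).
        # beta weights the local (upper) against the global (lower) estimate.
        n_actions = upper_q.advantage.out_features
        if torch.rand(1).item() < epsilon:                 # epsilon-greedy exploration
            return torch.randint(n_actions, (1,)).item()
        with torch.no_grad():
            q = beta * upper_q(state) + (1.0 - beta) * lower_q(state)
        return int(q.argmax(dim=1).item())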
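Prioritized experience replay, also mentioned in the abstract, samples transitions in proportion to their TD error rather than uniformly. Below is a simplified proportional-priority buffer sketch (O(n) sampling instead of the sum-tree used in practice; the capacity, alpha, and all names are assumptions, not the paper's implementation):

    import numpy as np

    class PrioritizedReplayBuffer:
        # P(i) proportional to (|td_error_i| + eps)^alpha, sampled per mini-batch.
        def __init__(self, capacity=10000, alpha=0.6):
            self.capacity, self.alpha = capacity, alpha
            self.data, self.priorities = [], []

        def push(self, transition, td_error=1.0):
            if len(self.data) >= self.capacity:            # drop the oldest entry
                self.data.pop(0)
                self.priorities.pop(0)
            self.data.append(transition)
            self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

        def sample(self, batch_size=32):
            probs = np.asarray(self.priorities)
            probs = probs / probs.sum()                    # normalize to a distribution
            idx = np.random.choice(len(self.data), batch_size, p=probs)
            return [self.data[i] for i in idx], idx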
