A Robotic Arm Path Planning Method Based on a Cross-Update Soft Actor-Critic Algorithm with Dual Actors and Triple Critics

Abstract: The Soft Actor-Critic (SAC) algorithm suffers from Q-value underestimation and insufficient exploration efficiency in complex robotic arm path planning. To address these problems, this paper proposes a cross-update SAC algorithm with dual actors and triple critics (DTC-SAC). The method uses a dual-actor structure to increase policy diversity and thereby improve exploration, a weighted triple-critic mechanism that combines the maximum and the average of the critic estimates to balance value estimation bias, and an actor-critic cross-update strategy that reduces coupling between networks and improves training stability. Robotic arm grasping experiments were carried out in the ROS-Gazebo simulation environment. In the obstacle-free environment, DTC-SAC achieves a task success rate of 92.6% and an average reward 17.7% higher than that of SAC. In the environment with obstacles, the success rate is 88.2% and the average reward is 20.7% higher. In terms of convergence speed, DTC-SAC stabilizes at episode 120, about 33.3% earlier than SAC (episode 180). The proposed method improves the success rate, average reward, and convergence efficiency of robotic arm path planning in complex environments, and thereby makes planning more robust.
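The abstract does not give the formula used by the weighted triple-critic mechanism. As a minimal sketch only, assuming the simplest reading in which the maximum and the average of the three critic estimates are mixed by a single weight, the combination could look like the following. The function names and the mixing weight `beta` are hypothetical and do not come from the paper. The second helper simply reproduces the convergence arithmetic quoted in the abstract.

```python
def blended_q(q_values, beta=0.5):
    """Blend the maximum and the mean of several critic estimates.

    Assumption: `beta` is a hypothetical mixing weight, not a value from
    the paper. beta=1 uses only the maximum (optimistic), beta=0 uses
    only the average.
    """
    if not q_values:
        raise ValueError("need at least one critic estimate")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    q_max = max(q_values)
    q_mean = sum(q_values) / len(q_values)
    return beta * q_max + (1.0 - beta) * q_mean


def relative_speedup(baseline_episode, new_episode):
    """Fraction of episodes saved relative to the baseline."""
    return (baseline_episode - new_episode) / baseline_episode


# Made-up critic outputs for one state-action pair (illustration only).
estimate = blended_q([1.0, 2.0, 3.0], beta=0.5)

# Convergence figures from the abstract: SAC at episode 180, DTC-SAC at 120.
speedup = relative_speedup(180, 120)
```

With the made-up inputs above, the blend lies between the average and the maximum of the three estimates, which is the intended effect of mixing an optimistic and a neutral estimator. The speedup evaluates to one third, consistent with the 33.3% reported in the abstract.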

     
