A Robotic Arm Path Planning Method Based on a Cross-update Soft Actor-Critic Algorithm with Dual Actors and Triple Critics

Yuan Shuai (袁帅); Zhang Hongyao (张宏尧); Fan Chengjie (樊成杰); Liu Zheyu (刘哲宇)

doi:10.13976/j.cnki.xk.2025.4073

Abstract: The Soft Actor-Critic (SAC) algorithm suffers from Q-value underestimation and insufficient exploration efficiency in complex robotic arm path planning. To address these issues, this paper proposes a cross-update SAC algorithm with dual actors and triple critics (DTC-SAC). The method uses a dual-actor structure to increase policy diversity and thereby improve exploration, designs a weighted triple-critic mechanism that fuses the maximum and the average of the critic estimates to balance value estimation bias, and introduces an actor-critic cross-update strategy to reduce network coupling and improve training stability. Robotic arm grasping experiments were carried out in the ROS-Gazebo simulation environment. In the obstacle-free environment, DTC-SAC achieves a task success rate of 92.6% and an average reward 17.7% higher than that of SAC. In the environment with obstacles, the success rate reaches 88.2% and the average reward is 20.7% higher. In terms of convergence speed, DTC-SAC stabilizes at episode 120, whereas SAC stabilizes at episode 180, which is approximately 33.3% earlier. The proposed method improves the success rate, average reward, and convergence efficiency of robotic arm path planning in complex environments, and thereby enhances planning robustness.
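The abstract states that the triple-critic mechanism fuses the maximum and the average of the critic estimates, but it does not give the weighting formula. As a minimal sketch only, the following assumes a convex combination controlled by a hypothetical mixing weight `beta`; the function name `fuse_critics`, the parameter `beta`, and the sample Q-values are illustrative assumptions and are not taken from the paper.

```python
from statistics import fmean


def fuse_critics(q_values, beta=0.5):
    """Blend the maximum and the mean of several critic estimates.

    ASSUMPTION: the paper's abstract only says that max and average are
    combined with weights; the convex form below and the name `beta`
    are hypothetical, chosen for illustration.
    """
    if not q_values:
        raise ValueError("q_values must contain at least one estimate")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    # beta = 1 uses only the optimistic max; beta = 0 uses only the mean.
    return beta * max(q_values) + (1.0 - beta) * fmean(q_values)


# Three made-up critic outputs for the same state-action pair.
q1, q2, q3 = 1.0, 2.0, 6.0

fused = fuse_critics([q1, q2, q3], beta=0.5)   # halfway between max and mean
pessimistic = min(q1, q2, q3)                  # min-style estimate, for comparison

print(f"fused={fused}, min={pessimistic}")
```

Under this assumed form, the fused value always lies between the mean and the maximum of the three estimates, so it is never lower than a minimum-based target. That is one plausible way a max/mean blend could offset underestimation, though the paper's actual weighting may differ.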
