Abstract:
To address Q-value underestimation and low exploration efficiency in the Soft Actor-Critic (SAC) algorithm for robotic arm path planning in complex environments, this paper proposes a dual-actor triple-critic cross-update SAC algorithm (DTC-SAC). The method has three components. A dual-actor structure increases policy diversity and improves exploration. A weighted triple-critic mechanism combines the maximum and the average of the three critics' Q-value estimates to balance value-estimation bias. An actor-critic cross-update strategy reduces coupling between the networks and improves training stability. In robotic arm grasping experiments in the ROS-Gazebo simulation environment, DTC-SAC achieves a task success rate of 92.6% in obstacle-free environments and raises the average reward by 17.7% over SAC. In environments with obstacles, the success rate reaches 88.2% and the average reward improves by 20.7% over SAC. DTC-SAC also converges faster, stabilizing at episode 120, about 33.3% earlier than SAC (episode 180). These results show that the proposed method improves the success rate, average reward, and convergence efficiency of robotic arm path planning in complex environments, and thereby makes planning more robust.
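Standard SAC forms its soft Bellman target from the minimum of two critics, which can bias the value estimate downward. The sketch below illustrates one plausible reading of the weighted triple-critic idea: the target blends the maximum and the mean of three critic estimates in place of that minimum. The mixing weight `beta` and the function names are hypothetical. The abstract does not state the exact weighting, so treat this as an illustration under that assumption, not as the DTC-SAC implementation.

```python
import numpy as np


def weighted_triple_critic_value(q_values, beta=0.5):
    """Blend the max and the mean of three critic estimates.

    q_values: array of shape (3,) or (3, batch), one row per critic.
    beta: hypothetical mixing weight (not given in the source).
    """
    q = np.asarray(q_values, dtype=float)
    if q.shape[0] != 3:
        raise ValueError("expected estimates from exactly three critics")
    # The max term pushes the estimate up (countering underestimation);
    # the mean term keeps it from drifting too far upward.
    return beta * q.max(axis=0) + (1.0 - beta) * q.mean(axis=0)


def soft_td_target(reward, done, q_next, log_prob_next,
                   gamma=0.99, alpha=0.2, beta=0.5):
    """SAC-style soft TD target using the weighted triple-critic value
    for the next state-action pair instead of min over two critics."""
    v_next = weighted_triple_critic_value(q_next, beta) - alpha * np.asarray(log_prob_next)
    return reward + gamma * (1.0 - done) * v_next


if __name__ == "__main__":
    # Three critics disagree on Q(s', a'); blend gives 0.5*3 + 0.5*2 = 2.5.
    print(weighted_triple_critic_value([1.0, 2.0, 3.0]))
    print(soft_td_target(1.0, 0.0, [1.0, 2.0, 3.0], 0.0, gamma=0.9))
```

With `beta = 0` the blend reduces to a plain average of the critics, and with `beta = 1` it becomes the maximum. Intermediate values trade off the two biases.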