A Randomized Task Offloading Strategy for Edge Computing Based on SAC with State Abstraction

Abstract: To improve the performance of mobile edge computing offloading algorithms in terms of computation delay, computation energy consumption, and algorithmic adaptability, a task model and a communication model are constructed under an end-edge collaborative architecture, and the offloading model is improved: a novel proportional-offloading action function is designed, and the reward function is optimized with a negative penalty mechanism. These changes enhance the model's ability to represent the real environment and provide a more reasonable objective for deep reinforcement learning algorithms that derive the optimal policy through cumulative rewards. To address the sparse-reward problem of the Soft Actor-Critic (SAC) algorithm in the offloading model, a proportional offloading strategy for randomized edge computing tasks, Soft Actor-Critic with State Abstraction (SACSA), is proposed, in which intrinsic rewards are computed through a state abstraction mechanism. Simulation experiments verify the effectiveness of the proportional-offloading action function and of the reward function with the negative penalty mechanism. A comparative performance analysis against SAC, Neural Episodic Control with State Abstraction (NECSA), Proximal Policy Optimization (PPO), and Twin Delayed Deep Deterministic Policy Gradient (TD3) shows that SACSA performs better: across different operating conditions, it achieves a 1.64%~85.35% reduction in task latency, a 0.55%~69.64% increase in task completion rate, and a 0.53%~75.8% increase in episode reward.
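The abstract names three ingredients of the proposed strategy: a proportional offloading action, a reward function with a negative penalty, and a state-abstraction mechanism that supplies intrinsic rewards to counter reward sparsity in SAC. The sketch below is a minimal illustration of how these pieces could fit together; all parameter values, cost formulas, and function names (offload_cost, extrinsic_reward, abstract_state, intrinsic_reward) are illustrative assumptions and do not reproduce the paper's actual task, communication, or SACSA models.

```python
# Hypothetical sketch of proportional offloading with a negative-penalty reward
# and a count-based intrinsic reward over abstract states. All constants below
# are illustrative placeholders, not values from the paper.
from collections import defaultdict
import math

F_LOCAL = 1.0e9      # local CPU frequency (cycles/s), assumed
F_EDGE = 8.0e9       # edge server CPU frequency (cycles/s), assumed
RATE_UP = 20.0e6     # uplink rate (bit/s), assumed
KAPPA = 1.0e-27      # effective switched-capacitance coefficient, assumed
P_TX = 0.5           # transmit power (W), assumed
PENALTY = -5.0       # negative penalty when the deadline is violated, assumed

def offload_cost(action, data_bits, cycles, deadline):
    """Proportional offloading: `action` in [0, 1] is the fraction of the
    task executed on the edge; the remainder runs locally in parallel."""
    a = min(max(action, 0.0), 1.0)
    t_local = (1.0 - a) * cycles / F_LOCAL
    t_up = a * data_bits / RATE_UP
    t_edge = t_up + a * cycles / F_EDGE
    delay = max(t_local, t_edge)                  # parallel local/edge execution
    energy = KAPPA * (1.0 - a) * cycles * F_LOCAL ** 2 + P_TX * t_up
    return delay, energy, delay <= deadline

def extrinsic_reward(delay, energy, done_in_time, w_t=1.0, w_e=1.0):
    """Weighted negative cost, plus a negative penalty for missed deadlines."""
    r = -(w_t * delay + w_e * energy)
    return r if done_in_time else r + PENALTY

# State abstraction: count-based intrinsic reward over coarse abstract states.
visit_counts = defaultdict(int)

def abstract_state(state, grid=10):
    """Map a continuous state vector (assumed normalized to [0, 1]) onto a
    coarse grid cell; the cell index serves as the abstract state."""
    return tuple(min(int(s * grid), grid - 1) for s in state)

def intrinsic_reward(state, beta=0.1):
    """Exploration bonus that decays with the visit count of the abstract
    state, mitigating sparse extrinsic rewards."""
    key = abstract_state(state)
    visit_counts[key] += 1
    return beta / math.sqrt(visit_counts[key])

# Usage: the SAC agent would be trained on the shaped (extrinsic + intrinsic) reward.
state = [0.42, 0.13, 0.77]            # e.g. normalized queue, channel, edge load
delay, energy, ok = offload_cost(action=0.6, data_bits=2e6,
                                 cycles=5e8, deadline=0.8)
reward = extrinsic_reward(delay, energy, ok) + intrinsic_reward(state)
print(f"delay={delay:.3f}s energy={energy:.3f}J reward={reward:.3f}")
```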

     
