Soft Actor Critic with State Abstraction for Random Task Offloading Strategy in Edge Computing

Abstract: To improve the computational delay, computational energy consumption, and adaptability of mobile edge computing (MEC) offloading algorithms, we build task and communication models under an end-edge collaborative architecture and improve the offloading model. We design a novel action function for proportional offloading and optimize the reward function with a negative penalty mechanism. These changes strengthen the model's ability to represent the real environment, and thus give deep reinforcement learning algorithms, which search for the optimal policy by maximizing cumulative reward, a more reasonable objective. To address the sparse reward problem that the soft actor-critic (SAC) algorithm faces in the offloading model, we propose SACSA, a state abstraction-enhanced SAC strategy for proportional offloading of random tasks in edge computing. Simulation experiments verify the effectiveness of the proportional offloading action function and of the reward function with the negative penalty mechanism. We further compare SACSA with SAC, neural episodic control with state abstraction (NECSA), proximal policy optimization (PPO), and twin delayed deep deterministic policy gradient (TD3). The results show that SACSA achieves superior performance: across different operating conditions, it reduces task latency by 1.64% to 85.35%, raises the task completion rate by 0.55% to 69.64%, and increases the episode reward by 0.53% to 75.8%.
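The abstract does not give the exact form of the proportional offloading action or the penalized reward, so the following is only a minimal sketch under common partial-offloading assumptions, not the authors' model. It assumes a tanh-squashed SAC action in [-1, 1] mapped to an offloading ratio, local and offloaded parts executing in parallel (delay is the larger of the two paths), a dynamic-power local energy model, and a reward that is the negative weighted sum of normalized delay and energy plus an extra negative penalty when the deadline is missed. All names (`Task`, `DeviceEdgeParams`, `action_to_ratio`, `delay_and_energy`, `reward`) and all parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    data_bits: float       # input size of the task (bits), hypothetical
    cycles_per_bit: float  # CPU cycles required per bit, hypothetical
    deadline_s: float      # latency budget (s), hypothetical


@dataclass
class DeviceEdgeParams:
    f_local_hz: float = 1e9    # local CPU frequency (assumed)
    f_edge_hz: float = 10e9    # edge CPU share allotted to the device (assumed)
    uplink_bps: float = 20e6   # achievable uplink rate (assumed)
    tx_power_w: float = 0.5    # device transmit power (assumed)
    kappa: float = 1e-27       # effective switched capacitance (assumed)


def action_to_ratio(a: float) -> float:
    """Map a tanh-squashed actor output in [-1, 1] to an offloading ratio in [0, 1]."""
    a = max(-1.0, min(1.0, a))
    return (a + 1.0) / 2.0


def delay_and_energy(task: Task, p: DeviceEdgeParams, rho: float) -> tuple[float, float]:
    """Delay and device energy when a fraction rho of the task is offloaded."""
    bits_edge = rho * task.data_bits
    bits_local = task.data_bits - bits_edge
    t_local = bits_local * task.cycles_per_bit / p.f_local_hz
    t_tx = bits_edge / p.uplink_bps
    t_edge = bits_edge * task.cycles_per_bit / p.f_edge_hz
    # Local and offloaded parts run in parallel, so the slower path dominates.
    delay = max(t_local, t_tx + t_edge)
    e_local = p.kappa * p.f_local_hz ** 2 * bits_local * task.cycles_per_bit
    e_tx = p.tx_power_w * t_tx
    return delay, e_local + e_tx


def reward(task: Task, p: DeviceEdgeParams, rho: float,
           w_delay: float = 0.5, penalty: float = 10.0, e_ref: float = 1.0) -> float:
    """Negative weighted cost, with an extra negative penalty on deadline violation."""
    delay, energy = delay_and_energy(task, p, rho)
    r = -(w_delay * delay / task.deadline_s + (1.0 - w_delay) * energy / e_ref)
    if delay > task.deadline_s:
        r -= penalty  # negative penalty: missing the deadline is strongly discouraged
    return r


if __name__ == "__main__":
    task = Task(data_bits=1e6, cycles_per_bit=1000, deadline_s=0.5)
    params = DeviceEdgeParams()
    for a in (-1.0, 0.0, 1.0):
        rho = action_to_ratio(a)
        print(f"action={a:+.1f} ratio={rho:.2f} reward={reward(task, params, rho):.4f}")
```

With these illustrative numbers, fully local execution misses the 0.5 s deadline and receives the penalty, so the reward gives the agent a clear signal to offload part of the task. The continuous ratio lets SAC trade delay against energy instead of choosing only between "all local" and "all edge".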

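The abstract says only that SACSA fuses a state abstraction mechanism into SAC to ease sparse rewards; it does not specify how. One common way to realize that idea, shown here purely as an assumption, is to discretize the continuous state into grid cells, keep the average discounted return observed from each cell, and use that estimate as the potential in potential-based reward shaping, r' = r + β(γΦ(s') − Φ(s)). This swaps in potential-based shaping as the densification step; SACSA's and NECSA's actual mechanisms may differ. The class name `GridStateAbstraction`, the bin count, and `beta` are hypothetical.

```python
from collections import defaultdict


class GridStateAbstraction:
    """Grid-based abstract states with a potential-based shaping bonus (illustrative)."""

    def __init__(self, low, high, bins=10, beta=0.1):
        self.low = list(low)     # per-dimension lower bound of the state (assumed known)
        self.high = list(high)   # per-dimension upper bound of the state (assumed known)
        self.bins = bins         # grid cells per dimension (hypothetical)
        self.beta = beta         # shaping strength (hypothetical)
        self.ret_sum = defaultdict(float)  # sum of returns observed from each abstract state
        self.visits = defaultdict(int)     # number of visits to each abstract state

    def abstract(self, state):
        """Map a continuous state to a tuple of grid indices (the abstract state)."""
        key = []
        for x, lo, hi in zip(state, self.low, self.high):
            frac = (x - lo) / (hi - lo)
            key.append(min(self.bins - 1, max(0, int(frac * self.bins))))
        return tuple(key)

    def update_episode(self, states, rewards, gamma=0.99):
        """Accumulate discounted returns of a finished episode per abstract state."""
        g = 0.0
        for s, r in zip(reversed(states), reversed(rewards)):
            g = r + gamma * g
            key = self.abstract(s)
            self.ret_sum[key] += g
            self.visits[key] += 1

    def value(self, state):
        """Average return from the abstract state; 0.0 if never visited."""
        key = self.abstract(state)
        n = self.visits.get(key, 0)
        return self.ret_sum.get(key, 0.0) / n if n else 0.0

    def shaped_reward(self, r, state, next_state, gamma=0.99):
        """Potential-based shaping: r + beta * (gamma * phi(s') - phi(s))."""
        return r + self.beta * (gamma * self.value(next_state) - self.value(state))


if __name__ == "__main__":
    sa = GridStateAbstraction(low=[0.0, 0.0], high=[1.0, 1.0], bins=10)
    sa.update_episode(states=[[0.05, 0.05], [0.95, 0.95]], rewards=[0.0, 1.0])
    print(sa.shaped_reward(0.0, [0.5, 0.5], [0.95, 0.95]))
```

Because the bonus has the potential-based form, it does not change which policy is optimal for the original reward; it only gives the critic denser feedback when the environment's own reward, such as task completion, arrives rarely. In an SAC loop, the shaped reward would replace the raw reward stored in the replay buffer.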