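Drone-assisted Mobile Edge Computing and Caching Optimization Based on an Uncertainty-aware Exploration Proximal Policy Optimization Algorithm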

谢键; 于思源; 张旭秀

doi:10.13976/j.cnki.xk.2024.1882


Abstract

Abstract: To address the shortcomings of conventional edge computing and caching in handling computation-intensive and latency-sensitive tasks, this paper proposes a proactive edge computing and caching optimization scheme centered on unmanned aerial vehicles (UAVs). The UAV proactively senses vehicle demand, and a binary classification model is combined with a Hawkes model to improve the accuracy of on-road vehicle demand prediction. The problem is formulated as a Markov decision process, and an uncertainty-aware exploration proximal policy optimization (UAE-PPO) algorithm, obtained by improving the proximal policy optimization (PPO) algorithm, is proposed to optimize edge caching and task offloading. UAE-PPO integrates uncertainty-aware exploration and dynamic adjustment of the exploration strategy into the actor network, and combines them with an adaptively decaying clip parameter and L2 regularization, which markedly improves model stability and generalization. Simulation results show that, compared with the conventional PPO algorithm, the proposed algorithm improves reward convergence speed by 28.6% and increases the reward value by 6.3%.
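As a rough illustration of two ingredients named in the abstract, the sketch below combines the standard PPO clipped surrogate with a decaying clip range and an L2 weight penalty. The abstract does not give the decay schedule or any coefficients, so the exponential decay, the floor `eps_min`, the value of `l2_coef`, and the function names `decayed_clip` and `ppo_loss` are all assumptions made for this example and should not be read as the UAE-PPO implementation.

```python
def decayed_clip(eps0, decay, step, eps_min=0.05):
    """Clip range after `step` updates.

    Assumed schedule: exponential decay from eps0, floored at eps_min.
    The paper's actual adaptive rule is not stated in the abstract.
    """
    return max(eps_min, eps0 * decay ** step)


def ppo_loss(ratios, advantages, weights, eps, l2_coef=1e-4):
    """Loss to minimise: negative clipped surrogate plus an L2 penalty.

    ratios     -- probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages -- advantage estimates, one per sample
    weights    -- flat list of network weights (hypothetical stand-in)
    eps        -- current clip range
    """
    terms = []
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps].
        r_clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        # Standard PPO takes the pessimistic (smaller) of the two terms.
        terms.append(min(r * a, r_clipped * a))
    surrogate = sum(terms) / len(terms)
    l2_penalty = l2_coef * sum(w * w for w in weights)
    return -surrogate + l2_penalty


# Usage: the clip range shrinks as training proceeds.
eps = decayed_clip(eps0=0.2, decay=0.99, step=100)
loss = ppo_loss([1.3, 0.7], [0.5, -0.2], [0.1, -0.4], eps)
```

A shrinking clip range allows larger policy updates early in training and more conservative ones later, while the L2 term discourages large weights. Both are generic techniques that the abstract credits with improving stability and generalization.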
