Citation: TANG Rui, HE Zuhan, ZHANG Ruizhi, YUE Shibo, PANG Chuanlin, HE Jinpu. Hybrid Offline-online Resource Allocation Mechanism for D2D-NOMA Systems[J]. INFORMATION AND CONTROL, 2023, 52(5): 574-587. DOI: 10.13976/j.cnki.xk.2023.2307

Hybrid Offline-online Resource Allocation Mechanism for D2D-NOMA Systems

Abstract: A device-to-device (D2D) communication-enhanced non-orthogonal multiple access (NOMA) system suffers from complex co-channel interference. In this study, we jointly optimize mode selection and power control to maximize the proportionally fair sum rate, thereby balancing spectral efficiency against user fairness. The resulting problem is a mixed-integer non-convex optimization problem, for which we propose a hybrid offline-online resource allocation mechanism. In the offline training stage, a variable transformation equivalently converts the remaining power control subproblem (for a given mode selection) into a convex optimization problem, whose global optimum can be obtained in milliseconds with mature convex optimization tools. Building on these optimization results, a deep Q-learning algorithm then learns the mapping from the current mode selection scheme and channel state information to the optimal mode adjustment policy. The trained mechanism requires only simple algebraic operations and the solution of a single convex optimization problem, making it suitable for online deployment. Simulation results show that the proposed hybrid offline-online mechanism effectively balances performance against computation time: compared with the global optimum obtained by exhaustive search, it reduces the average computation time by 94.54% at a performance loss of only about 10%.
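The paper's exact convexifying transformation is not spelled out in the abstract, so the sketch below only illustrates the general technique it names: a change of variables x = ln p that turns interference-coupled power control into a convex program solvable by off-the-shelf tools. The scenario, channel gains, noise power, SINR target, and the power-minimization objective are all illustrative assumptions, not the paper's formulation.

```python
import numpy as np
import cvxpy as cp

# Hypothetical 3-link D2D setup; every numeric value here is illustrative.
K = 3
G = np.array([[1.5, 0.05, 0.08],
              [0.06, 1.2, 0.04],
              [0.07, 0.05, 1.8]])  # G[i, j]: channel gain, transmitter j -> receiver i
sigma2 = 0.1       # noise power at each receiver
gamma_min = 1.0    # minimum SINR required on each link
p_max = 10.0       # per-link transmit power budget

# Change of variables x_i = ln(p_i): the non-convex SINR constraints
#   G_ii p_i / (sigma2 + sum_{j != i} G_ij p_j) >= gamma_min
# become convex log-sum-exp inequalities in x.
x = cp.Variable(K)
constraints = [x <= np.log(p_max)]
for i in range(K):
    terms = [np.log(gamma_min * G[i, j]) + x[j] for j in range(K) if j != i]
    terms.append(np.log(gamma_min * sigma2))  # constant noise term
    constraints.append(cp.log_sum_exp(cp.hstack(terms)) <= x[i] + np.log(G[i, i]))

# Minimize total transmit power, which is convex in x.
prob = cp.Problem(cp.Minimize(cp.sum(cp.exp(x))), constraints)
prob.solve()
print("status:", prob.status)
print("optimal powers:", np.exp(x.value))
```

Under the substitution, each SINR constraint is a log-sum-exp (hence convex) inequality in x, so a generic interior-point solver certifies the global optimum, typically in milliseconds at this scale, which is consistent with the abstract's claim for the power control subproblem.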

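The deep Q-learning component can likewise be pictured with a minimal PyTorch sketch. Everything below is a hypothetical stand-in rather than the paper's architecture: the state is assumed to stack the current binary mode-selection vector with flattened channel-state features, the action set is assumed to be "flip one pair's mode or keep the current scheme", and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

K = 4                    # number of D2D pairs (illustrative)
CSI_DIM = 16             # flattened channel-gain features (illustrative)
N_ACTIONS = K + 1        # flip the mode of one pair, or keep the current scheme

class QNet(nn.Module):
    """Maps (mode-selection vector, CSI) to Q-values over mode-adjustment actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(K + CSI_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, N_ACTIONS),
        )

    def forward(self, modes, csi):
        return self.net(torch.cat([modes, csi], dim=-1))

q_net = QNet()
target_net = QNet()
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def select_action(modes, csi, eps=0.1):
    # Epsilon-greedy over the Q-values; set eps=0 for pure online inference.
    if torch.rand(1).item() < eps:
        return torch.randint(N_ACTIONS, (1,)).item()
    with torch.no_grad():
        return q_net(modes, csi).argmax(dim=-1).item()

def td_step(batch, gamma=0.99):
    # One standard DQN temporal-difference update against the target network.
    modes, csi, a, r, modes2, csi2 = batch
    q = q_net(modes, csi).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(modes2, csi2).max(dim=1).values
    loss = nn.functional.mse_loss(q, r + gamma * q_next)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage: pick a mode adjustment for one placeholder state.
modes = torch.zeros(1, K)       # all pairs currently in, e.g., cellular mode
csi = torch.rand(1, CSI_DIM)    # placeholder channel features
print("chosen action:", select_action(modes, csi))
```

Only the greedy branch of select_action runs after training, so each mode adjustment costs a single forward pass; together with one convex solve as sketched above, this matches the abstract's point that the online stage reduces to simple algebraic operations plus a single convex optimization problem.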