Two-stage Reactive Voltage Control Method for Distribution Networks Based on Adversarial Deep Reinforcement Learning

Abstract: The large-scale integration of new energy sources into distribution systems has significantly increased node voltage fluctuations, while traditional reactive power and voltage control methods suffer from high computational complexity, reliance on precise model parameters, and difficulty in responding to load disturbances in real time. To address these issues, this paper proposes a two-stage deep reinforcement learning method based on Adversarial Proximal Policy Optimization (APPO) to achieve real-time reactive power and voltage control in distribution networks. First, the control problem is formulated as an Adversarial Markov Decision Process (AMDP) with a main agent and an adversarial agent: the main agent optimizes the control policy to cope with the uncertainty introduced by new energy integration, while the adversarial agent strengthens the main agent's disturbance-rejection robustness by simulating extreme load disturbances. Second, a two-stage training framework combining adversarial training and real-time adaptation is proposed: in the adversarial training stage, load disturbances improve the generalization ability of the main agent's policy; in the real-time adaptation stage, the main agent reuses the network parameters obtained from adversarial training to adapt quickly to the live environment. Finally, simulations on a modified IEEE 33-bus distribution network verify the effectiveness of the proposed method: it significantly reduces network losses and the voltage violation rate, and improves the system's robustness and adaptability under disturbed conditions.
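The two-stage adversarial scheme described in the abstract can be illustrated with a minimal sketch. Everything below (the linearized voltage-sensitivity model `V = V0 + SQ*q - SD*d`, the learning rate, and the disturbance bound) is an illustrative assumption, not a value or model from the paper; the actual method trains PPO policy networks, whereas this toy uses scalar setpoints updated by gradient descent-ascent.

```python
# Toy gradient descent-ascent sketch of the two-stage adversarial scheme.
# The linearized model V = V0 + SQ*q - SD*d and all constants are
# illustrative assumptions, not values from the paper.
V0, SQ, SD, D_MAX = 0.98, 0.05, 0.04, 1.0  # base voltage, sensitivities, bound

def voltage(q, d):
    """Node voltage (p.u.) given reactive setpoint q and load disturbance d."""
    return V0 + SQ * q - SD * d

def cost(q, d):
    """Squared deviation from the 1.0 p.u. voltage target."""
    return (voltage(q, d) - 1.0) ** 2

# Stage 1: adversarial training. The main agent descends the cost in q while
# the adversarial agent ascends it in d (a zero-sum game); d is projected
# onto its bound to model bounded "extreme" load disturbances.
q, d, lr = 0.0, 0.0, 50.0
for _ in range(200):
    dv = voltage(q, d) - 1.0
    q -= lr * 2 * dv * SQ               # descent step for the main agent
    d -= lr * 2 * dv * SD               # ascent step for the adversary
    d = max(-D_MAX, min(D_MAX, d))      # project onto [-D_MAX, D_MAX]

# Stage 2: real-time adaptation. Warm-start from the robust setpoint found in
# stage 1 and fine-tune against the disturbance actually observed online.
d_real = 0.6
for _ in range(50):
    dv = voltage(q, d_real) - 1.0
    q -= lr * 2 * dv * SQ

print(f"voltage after adaptation: {voltage(q, d_real):.4f}")  # -> 1.0000
```

In the paper's setting, the scalar updates above would be replaced by PPO updates of the two agents' policy networks, and the warm start in stage 2 corresponds to reusing the network parameters learned during adversarial training.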

     
