SHI Hongyan, FU Guocheng, PAN Duotao. Two-CSTR System Tracking Control Based on PPO-gSDE Algorithm[J]. INFORMATION AND CONTROL, 2023, 52(3): 343-351. DOI: 10.13976/j.cnki.xk.2023.2263

Two-CSTR System Tracking Control Based on PPO-gSDE Algorithm

Abstract: The continuous stirred tank reactor (CSTR) is a classic piece of chemical equipment widely used in chemical processes. Because the CSTR exhibits strong nonlinearity and time delay, traditional control methods cannot meet the precision requirements of its tracking control. This study therefore proposes a tracking control method for the CSTR based on proximal policy optimization (PPO) with generalized state-dependent exploration (gSDE). First, a mechanistic model simulates the real environment with which the PPO agent interacts. Second, gSDE makes the exploration in each episode more stable and lower in variance while preserving its effectiveness. Finally, a feedback reward is added to resolve the sparse-reward problem of the environment, so that the agent learns how to perform tracking control of the CSTR. The algorithm is tested on a two-CSTR system. Simulation results show that, for tracking control of complex nonlinear systems, the algorithm offers a stable training process, small control error, and a rapid response to disturbances.
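The abstract does not spell out how gSDE produces its exploration noise, so the following is a minimal sketch of the general gSDE idea, assuming a Gaussian policy whose actor network gives a mean action μ(s) and whose last hidden layer gives a feature vector z(s). An exploration matrix θ is drawn from N(0, σ²) and held fixed for `sample_freq` steps, so between resamples the noise ε(s) = z(s)ᵀθ depends only on the state rather than being redrawn independently at every step. The class `GSDENoise`, the helper `gaussian_log_prob`, the constant log standard deviation, and the resampling period are hypothetical choices for illustration, not details from the paper; in a real agent σ is learned together with the policy. Stable-Baselines3's `PPO` exposes this mechanism through its `use_sde` and `sde_sample_freq` arguments, although the abstract does not state which implementation the authors used.

```python
import math
import random
from typing import List, Sequence


class GSDENoise:
    """Sketch of generalized state-dependent exploration (gSDE) noise.

    Hypothetical helper: theta has shape (n_features, n_actions), each entry
    drawn from N(0, sigma_ij^2). The action noise is eps(s) = z(s)^T theta,
    where z(s) are the policy's latent features for state s.
    """

    def __init__(self, n_features: int, n_actions: int, log_std: float = -0.5,
                 sample_freq: int = 8, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        # Per-(feature, action) standard deviations; a fixed constant here,
        # whereas a real agent would learn log_std by gradient descent.
        self.sigma = [[math.exp(log_std)] * n_actions for _ in range(n_features)]
        self.sample_freq = sample_freq  # <= 0 means: resample only in reset()
        self.steps = 0
        self.theta = self._sample_theta()

    def _sample_theta(self) -> List[List[float]]:
        return [[self.rng.gauss(0.0, s) for s in row] for row in self.sigma]

    def reset(self) -> None:
        """Start of a new episode: draw a fresh exploration matrix."""
        self.theta = self._sample_theta()
        self.steps = 0

    def noise(self, z: Sequence[float]) -> List[float]:
        """Return eps(s) = z^T theta, resampling theta every sample_freq steps."""
        if self.sample_freq > 0 and self.steps > 0 and self.steps % self.sample_freq == 0:
            self.theta = self._sample_theta()
        self.steps += 1
        n_actions = len(self.theta[0])
        return [sum(z[i] * self.theta[i][j] for i in range(len(z)))
                for j in range(n_actions)]

    def action_std(self, z: Sequence[float]) -> List[float]:
        """Std of eps(s) over theta: sqrt(sum_i z_i^2 * sigma_ij^2)."""
        n_actions = len(self.sigma[0])
        return [math.sqrt(sum(z[i] ** 2 * self.sigma[i][j] ** 2 for i in range(len(z))))
                for j in range(n_actions)]


def gaussian_log_prob(a: Sequence[float], mean: Sequence[float],
                      std: Sequence[float]) -> float:
    """Log-density of a diagonal Gaussian, as used in the PPO probability ratio."""
    return sum(-0.5 * ((ai - mi) / si) ** 2 - math.log(si) - 0.5 * math.log(2 * math.pi)
               for ai, mi, si in zip(a, mean, std))


if __name__ == "__main__":
    g = GSDENoise(n_features=3, n_actions=1, sample_freq=4, seed=1)
    z = [0.2, -0.1, 0.5]  # hypothetical latent features of one state
    mu = [0.3]            # hypothetical policy mean action
    eps = g.noise(z)
    action = [m + e for m, e in zip(mu, eps)]
    print(action, gaussian_log_prob(action, mu, g.action_std(z)))
```

Within one resampling period, the same state features always yield the same perturbation, which is the property that makes gSDE exploration smoother and lower in variance than independent per-step Gaussian noise. PPO then evaluates the log-probability of the sampled action under N(μ(s), `action_std(z)`²) when forming its clipped probability ratio.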
