MA Junwei, XU Chen, TAO Hongfeng, YANG Huizhong. Batch Process Control Based on Twin-actor Deep Deterministic Policy Gradient Algorithm[J]. INFORMATION AND CONTROL, 2023, 52(6): 773-783, 810. DOI: 10.13976/j.cnki.xk.2023.2488

Batch Process Control Based on Twin-actor Deep Deterministic Policy Gradient Algorithm

    Abstract: Conventional model-based control methods struggle with batch processes because the complex nonlinear dynamics of these processes lead to inaccurate models, which in turn degrades control performance. To address this problem, we propose a model-free batch process control scheme based on reinforcement learning (RL). First, a twin-actor parallel training structure is used to address value function overestimation in deep RL algorithms and thereby improve learning efficiency. Second, each actor is given its own experience pool so that the twin actors remain independent. In addition, a novel reward function is designed for the RL controller to guide the process back to the predetermined trajectory, and a delayed policy update method is introduced to mitigate the accumulation of temporal difference (TD) error during parameter updates. Finally, a simulation of the penicillin fermentation process demonstrates the effectiveness of the controller based on the twin-actor deep deterministic policy gradient (TA-DDPG) algorithm for batch process control.
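
Two of the ideas in the abstract, per-actor experience pools and delayed policy updates, can be illustrated with a small bookkeeping sketch. This is not the authors' implementation: the names `ReplayBuffer` and `run_schedule`, the buffer capacity, the batch size, and the delay of 2 are all assumptions made for illustration, and the neural network updates are replaced by counters.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience pool; the oldest transitions are evicted first."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size, rng):
        # Sample without replacement, capped at the current buffer size.
        k = min(batch_size, len(self.storage))
        return rng.sample(list(self.storage), k)

    def __len__(self):
        return len(self.storage)


def run_schedule(total_steps, policy_delay=2, seed=0):
    """Count updates under a delayed-policy-update schedule (hypothetical helper).

    Each of the two actors writes to and samples from its own buffer.
    Value-function updates happen every step; actor updates happen only
    every `policy_delay` steps. Real gradient steps are omitted.
    """
    rng = random.Random(seed)
    buffers = [ReplayBuffer(capacity=1000), ReplayBuffer(capacity=1000)]
    critic_updates = 0
    actor_updates = [0, 0]

    for step in range(1, total_steps + 1):
        # Each actor stores its own (placeholder) transition.
        for actor_id, buffer in enumerate(buffers):
            buffer.add((actor_id, step, rng.random()))

        # Value-function update placeholder: runs on every step.
        critic_updates += 1

        # Delayed policy update: actors are refreshed less often.
        if step % policy_delay == 0:
            for actor_id, buffer in enumerate(buffers):
                batch = buffer.sample(4, rng)
                # Independence check: a batch only holds this actor's data.
                assert all(t[0] == actor_id for t in batch)
                actor_updates[actor_id] += 1

    return critic_updates, actor_updates, [len(b) for b in buffers]


if __name__ == "__main__":
    print(run_schedule(10, policy_delay=2))
```

Updating the actors less often than the value estimates gives those estimates time to settle before the policies move, which is the intuition behind using a delay to limit TD error accumulation. Keeping one buffer per actor means neither actor ever trains on transitions generated by the other.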

     
