Abstract:
A continuous stirred tank reactor (CSTR) is a classic piece of equipment widely used in chemical processes. Because the CSTR is strongly nonlinear and subject to time delay, traditional control methods fail to meet the precision requirements of tracking control. In this study, we therefore propose a tracking control method for CSTRs that combines proximal policy optimization (PPO) with generalized state-dependent exploration (gSDE). First, a mechanism model simulates the real environment and interacts with the PPO agent. Second, gSDE is used to make exploration in each round more stable and lower in variance, which keeps exploration effective. Finally, a feedback reward is added to resolve the sparse-reward problem in the environment, so that the agent can learn to track and control the CSTR. The algorithm is applied to a double-CSTR system to examine its effectiveness. Simulation results show that the algorithm exhibits a stable training process, small control error, and rapid response to disturbances.