Average Reward Reinforcement Learning Scheduling of Closed Reentrant Production Systems
-
Abstract
Scheduling closed reentrant queueing networks to maximize the mean system output is an NP-hard problem. In this paper, an average-reward reinforcement learning (RL) method is applied to automatically find an adaptive scheduling policy by directly optimizing the mean output. A numerical study demonstrates that the RL scheduler consistently outperforms all known priority policies.
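The abstract does not name the average-reward algorithm used. As an illustration only, the sketch below shows one well-known choice, tabular R-learning, in which a running estimate of the average reward per step (here called rho) plays the role of the mean output. The function names, the step hook, the learning rates, and the two-state toy problem are all assumptions made for this example; the toy problem is a stand-in and not a model of a reentrant line.

```python
import random

def r_learning(step, n_states, n_actions, n_steps=20000,
               alpha=0.1, beta=0.01, epsilon=0.1, seed=0):
    """Tabular R-learning sketch (an average-reward RL method).

    `step(state, action)` is a hypothetical simulator hook returning
    (reward, next_state); in a scheduling setting the reward could be 1
    whenever a job leaves the system, so rho estimates the mean output.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]  # relative action values
    rho = 0.0   # running estimate of the average reward per step
    state = 0
    for _ in range(n_steps):
        greedy = max(range(n_actions), key=lambda a: q[state][a])
        # epsilon-greedy exploration
        action = rng.randrange(n_actions) if rng.random() < epsilon else greedy
        reward, nxt = step(state, action)
        best_here = max(q[state])
        best_next = max(q[nxt])
        # Value update uses the reward relative to the average, not a discount.
        q[state][action] += alpha * (reward - rho + best_next - q[state][action])
        # The average-reward estimate is updated only when the greedy action was taken.
        if action == greedy:
            rho += beta * (reward + best_next - best_here - rho)
        state = nxt
    policy = [max(range(n_actions), key=lambda a: q[s][a]) for s in range(n_states)]
    return policy, rho

def toy_step(state, action):
    """Assumed toy dynamics: the state alternates between 0 and 1, and
    action 1 always earns reward 1 while action 0 earns nothing."""
    return (1.0 if action == 1 else 0.0), 1 - state

policy, rho = r_learning(toy_step, n_states=2, n_actions=2)
print(policy, round(rho, 2))
```

In this toy problem the best policy picks action 1 in both states and earns an average reward of 1 per step, so the learned rho should settle close to 1. A real scheduler would replace toy_step with a simulator of the queueing network and would likely need a more compact state representation than a table.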
-