Average-Reward Reinforcement Learning Scheduling of Closed Reentrant Production Systems

Abstract: In closed reentrant production systems, an important problem is to optimize the scheduling policy so as to maximize the system's mean output rate; for closed reentrant queueing networks this is an NP-hard problem. This paper applies an average-reward reinforcement learning (RL) algorithm that optimizes the mean output rate directly and automatically obtains an adaptive, dynamic scheduling policy. Simulation results show that the RL scheduler outperforms two well-known priority scheduling policies.