强化学习系统及其基于可靠度最优的学习算法

REINFORCEMENT LEARNING SYSTEM AND ITS LEARNING ALGORITHMS FOR RELIABILITY OPTIMIZATION

摘要: 归纳了强化学习的主要理论方法，提出了一个区分主客观因素的强化学习系统描述，引入了任务域的概念．针对以往强化学习采用的期望最优准则描述任务域能力的不足，考虑了目标水平准则下的首达时间可靠度最优准则模型．分别结合随机逼近理论和时间差分理论，提出了基于概率估计的J-学习和无需建模的增量R-学习．

Abstract: This paper summarizes the main theoretical methods of reinforcement learning. It proposes a description of reinforcement learning systems that distinguishes subjective from objective factors, and introduces the concept of a task domain. Because the expected-optimality criterion used in earlier reinforcement learning work describes task domains inadequately, the paper considers a reliability-optimality criterion model for the first arrival time under a target-level criterion. Combining stochastic approximation theory and temporal difference theory, respectively, it develops J-learning, which is based on probability estimation, and incremental R-learning, which requires no model.