Abstract:
This paper summarizes the main theoretical methods of reinforcement learning. It proposes a description of reinforcement learning systems that distinguishes subjective from objective factors, and it introduces the concept of a task domain. To address the shortcomings of the expectation-based optimality criteria previously used in reinforcement learning, the paper considers a reliability optimization method for the first arrival time based on a target-level criterion. By combining this approach with stochastic approximation theory and with the temporal difference method, respectively, it develops probability-estimation-based J-learning and model-free incremental R-learning.