Abstract:
For reinforcement learning control in continuous spaces, a Q-learning method based on a self-organizing fuzzy RBF (radial basis function) network is proposed. The input of the fuzzy RBF network is the state, and the outputs are continuous actions and the corresponding Q-values, realizing a mapping from a continuous state space to a continuous action space. First, the continuous action space is discretized into a fixed number of discrete actions, and a greedy policy selects the discrete action with the maximum Q-value as the winning local action of each fuzzy rule. Then a command fusion mechanism weights the winning local action of each fuzzy rule according to its utility value, and a continuous action is generated for the actual system. Moreover, to simplify the network structure and improve the learning speed, an improved resource allocating network (RAN) algorithm and a gradient descent algorithm are applied to adjust the structure and the parameters of the fuzzy RBF network, respectively, in an online and adaptive manner. The effectiveness of the proposed Q-learning method is demonstrated through simulation of the balancing control of an inverted pendulum system.
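The command-fusion step described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name `fused_action`, the array shapes, and the use of normalized rule firing strengths as the utility weights are all assumptions made for the example.

```python
import numpy as np

def fused_action(phi, q_values, actions):
    """Sketch of command fusion over winning local actions.
       phi:      (n_rules,) firing strengths (utility values) of the fuzzy rules
       q_values: (n_rules, n_actions) Q-value of each discrete action per rule
       actions:  (n_actions,) the discretized candidate continuous actions
       Returns the fused continuous action fed to the actual system."""
    # Greedy selection: each rule's winning local action maximizes its Q-value.
    winners = actions[np.argmax(q_values, axis=1)]
    # Weight the winning local actions by rule utility and normalize.
    return float(np.dot(phi, winners) / np.sum(phi))

# Toy example: 3 rules, discrete action set {-10, 0, +10} (e.g. a force in N
# for an inverted pendulum; values are purely illustrative).
phi = np.array([0.6, 0.3, 0.1])
q = np.array([[0.2, 0.5, 0.1],   # rule 1 prefers action 0
              [0.1, 0.2, 0.9],   # rule 2 prefers action +10
              [0.8, 0.1, 0.1]])  # rule 3 prefers action -10
acts = np.array([-10.0, 0.0, 10.0])
u = fused_action(phi, q, acts)   # a single continuous control action
```

Because the output is a utility-weighted average of the per-rule greedy actions, the controller emits a smooth continuous action even though each rule only ever chooses among a fixed set of discrete actions.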