A Q-learning Method for Continuous Space Based on Self-organizing Fuzzy RBF Network

  • Abstract: For reinforcement learning control problems in continuous spaces, a Q-learning method based on a self-organizing fuzzy RBF (radial basis function) network is proposed. The input of the network is the state, and its outputs are a continuous action and the corresponding Q-value, so the network realizes a mapping from a continuous state space to a continuous action space. First, the continuous action space is discretized into a fixed number of discrete actions, and a fully greedy policy selects the discrete action with the maximum Q-value as the local winning action of each fuzzy rule. Then a command fusion mechanism weights the winning actions of the rules by their utility values to produce the continuous action actually applied to the system. Moreover, to simplify the network structure and speed up learning, an improved resource allocating network (RAN) algorithm and a gradient descent method adjust the structure and the parameters of the network online and adaptively. Simulation of the balancing control of an inverted pendulum verifies the effectiveness of the proposed Q-learning method.
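
Reading "command fusion" in the standard fuzzy Q-learning sense, where each rule's greedy local winner is mixed according to normalized firing strength, the fused action and its Q-value would take a form like the following. This is a hedged reconstruction from the abstract alone; the paper's exact "utility value" weighting may differ.

\[ a(x) = \sum_{i=1}^{N} \phi_i(x)\, a_i^{*}, \qquad a_i^{*} = \arg\max_{a \in A_d} q_i(a), \]
\[ Q\bigl(x, a(x)\bigr) = \sum_{i=1}^{N} \phi_i(x)\, q_i(a_i^{*}), \qquad \phi_i(x) = \frac{\mu_i(x)}{\sum_{j=1}^{N} \mu_j(x)}, \]

where N is the number of fuzzy rules, A_d is the set of discrete actions, mu_i(x) is the RBF activation (membership degree) of rule i at state x, and q_i(a) is rule i's local Q-value for discrete action a.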
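
A minimal Python sketch of one learning step under the same assumptions: Gaussian memberships, normalized firing strengths as fusion weights, a TD(0) gradient step on the winning local Q-values, and a Platt-style RAN novelty test (large TD error and large distance to the nearest center) for growing the rule base. Every class name, function, and hyperparameter below is illustrative, not taken from the paper.

    import numpy as np

    class FuzzyRBFQ:
        """Sketch of fuzzy-RBF Q-learning with command fusion (illustrative)."""

        def __init__(self, discrete_actions, state_dim,
                     alpha=0.1, gamma=0.95, err_thresh=0.5, dist_thresh=0.3):
            self.A = np.asarray(discrete_actions, dtype=float)  # M discretized actions
            self.alpha, self.gamma = alpha, gamma               # learning rate, discount factor
            self.err_thresh, self.dist_thresh = err_thresh, dist_thresh  # RAN novelty thresholds
            self.centers = np.empty((0, state_dim))             # one RBF center per fuzzy rule
            self.widths = np.empty(0)                           # Gaussian widths
            self.q = np.empty((0, len(self.A)))                 # per-rule Q-values of the discrete actions

        def firing(self, s):
            """Normalized Gaussian firing strengths phi_i(s)."""
            if len(self.centers) == 0:
                return np.empty(0)
            d2 = np.sum((self.centers - s) ** 2, axis=1)
            mu = np.exp(-d2 / (2.0 * self.widths ** 2))
            return mu / (mu.sum() + 1e-12)

        def act(self, s):
            """Fully greedy local winner per rule, fused into one continuous action."""
            phi = self.firing(s)
            winners = np.argmax(self.q, axis=1)                 # greedy discrete winner of each rule
            a = float(phi @ self.A[winners])                    # command fusion by firing-strength weighting
            Q = float(phi @ self.q[np.arange(len(phi)), winners])
            return a, Q, phi, winners

        def update(self, s, phi, winners, r, s_next, Q):
            """TD(0) update of the winning entries, plus RAN-style structure growth."""
            phi_next = self.firing(s_next)
            Q_next = float(phi_next @ self.q.max(axis=1)) if len(phi_next) else 0.0
            delta = r + self.gamma * Q_next - Q                 # TD error
            # Gradient step: each rule's winning q moves in proportion to its firing strength.
            self.q[np.arange(len(phi)), winners] += self.alpha * delta * phi
            # Grow a rule only if the state is novel AND the error is large (RAN criteria).
            dist = np.min(np.linalg.norm(self.centers - s, axis=1)) if len(self.centers) else np.inf
            if abs(delta) > self.err_thresh and dist > self.dist_thresh:
                self.centers = np.vstack([self.centers, np.atleast_2d(s)])
                self.widths = np.append(self.widths, dist if np.isfinite(dist) else 1.0)
                self.q = np.vstack([self.q, np.zeros(len(self.A))])

A typical way to bootstrap such a learner is to seed one rule at the initial state with a default width; after that, the two novelty thresholds jointly keep the rule base compact, which is the motivation for a RAN-style growth criterion instead of a fixed grid of rules.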
