WANG Xue-song, ZHANG Zheng, CHENG Yu-hu, ZHANG Yi-yang. Recursive Least Squares Policy Iteration Based on Geodesic Gaussian Basis Function[J]. INFORMATION AND CONTROL, 2009, 38(4): 406-411.

Recursive Least Squares Policy Iteration Based on Geodesic Gaussian Basis Function

Abstract: An appropriate selection of basis functions directly influences the learning performance of a policy iteration method during value function approximation. To better describe the topological relationship of an environment, a geodesic distance is substituted for the Euclidean distance used in an ordinary Gaussian function, and a policy iteration reinforcement learning method based on geodesic Gaussian basis functions is proposed. First, a graph of the environment is built from the sample data generated by a Markov decision process (MDP). Second, geodesic Gaussian basis functions are defined on the graph, with the geodesic distance approximated by the shortest path obtained from the shortest path faster algorithm (SPFA). Then the state-action value function of the learning system is modeled as a linearly weighted sum of the geodesic Gaussian basis functions, and a recursive least squares method updates the weights in an online, incremental manner. Finally, policy improvement is carried out based on the estimated state-action values. Simulation results on 10×10 and 20×20 mazes illustrate the validity of the proposed policy iteration method.
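The abstract's pipeline rests on two reusable pieces: Gaussian basis functions whose distance term is a graph shortest-path length rather than a Euclidean distance, and a recursive least squares (RLS) weight update. The sketch below illustrates both under stated assumptions: the paper approximates the geodesic distance with SPFA, whereas this sketch uses plain breadth-first search, which gives the same result on an unweighted maze graph; the state graph `adj`, the centers, the width `sigma`, and the `RecursiveLeastSquares` class are illustrative names, and the paper's construction of state-action features and temporal-difference targets is omitted.

```python
import numpy as np
from collections import deque

def shortest_path_lengths(adj, source):
    """Hop counts from `source` by breadth-first search on an unweighted
    state graph (a stand-in for SPFA when all edge costs are equal)."""
    dist = {source: 0.0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1.0
                q.append(v)
    return dist

def geodesic_gaussian_features(adj, centers, sigma):
    """phi_i(s) = exp(-d_geo(s, c_i)^2 / (2 sigma^2)), where the geodesic
    distance d_geo is approximated by graph shortest-path lengths."""
    phi = np.zeros((len(adj), len(centers)))
    for i, c in enumerate(centers):
        for s, d in shortest_path_lengths(adj, c).items():
            phi[s, i] = np.exp(-d ** 2 / (2.0 * sigma ** 2))
    return phi

class RecursiveLeastSquares:
    """Incremental RLS estimate of weights w with phi^T w ~ Q(s, a)."""
    def __init__(self, dim, delta=1.0):
        self.w = np.zeros(dim)
        self.P = np.eye(dim) / delta   # running inverse-covariance estimate

    def update(self, phi, target):
        Pphi = self.P @ phi
        gain = Pphi / (1.0 + phi @ Pphi)
        self.w += gain * (target - phi @ self.w)   # correct toward target
        self.P -= np.outer(gain, Pphi)             # rank-one downdate

# Example: 3-state chain graph with basis centers at the two ends.
adj = [[1], [0, 2], [1]]
phi = geodesic_gaussian_features(adj, centers=[0, 2], sigma=1.0)
rls = RecursiveLeastSquares(dim=2)
rls.update(phi[1], target=0.5)   # one online weight update
```

The online update is the point of using RLS here: each new sample refines the weights in O(dim^2) time without re-solving a batch least squares problem, which matches the abstract's "online and incremental" description.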
