Recursive Least Squares Policy Iteration Based on Geodesic Gaussian Basis Function

WANG Xue-song; ZHANG Zheng; CHENG Yu-hu; ZHANG Yi-yang

WANG Xue-song, ZHANG Zheng, CHENG Yu-hu, ZHANG Yi-yang. Recursive Least Squares Policy Iteration Based on Geodesic Gaussian Basis Function[J]. INFORMATION AND CONTROL, 2009, 38(4): 406-411.

Abstract

The choice of basis functions directly influences the learning performance of a policy iteration method that relies on value function approximation. To better describe the topological relationships of an environment, the Euclidean distance used in an ordinary Gaussian function is replaced by a geodesic distance, and a policy iteration reinforcement learning method based on geodesic Gaussian basis functions is proposed. First, a graph of the environment is built from sample data generated by a Markov decision process (MDP). Second, geodesic Gaussian basis functions are defined on the graph, where the shortest path obtained by the shortest path faster algorithm is used to approximate the geodesic distance. Then, the state-action value function of the learning system is assumed to be a linearly weighted sum of the given geodesic Gaussian basis functions, and a recursive least squares method is used to update the weights in an on-line and incremental manner. Finally, policy improvement is carried out based on the estimated state-action values. Simulation results on 10×10 and 20×20 mazes illustrate the validity of the proposed policy iteration method.
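The steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes a basis of the form exp(-d²/(2σ²)) with d the shortest-path distance, a 4-connected maze graph with unit edge weights (the paper builds its graph from MDP samples instead), and one common recursive form of the least-squares temporal-difference update for Q-values. The names `build_maze_graph`, `spfa`, `geodesic_gaussian`, and `rls_lstdq_step`, the discount `gamma`, and the width `sigma` are all hypothetical.

```python
import math
from collections import deque

def build_maze_graph(rows, cols, walls):
    """Assumed setup: free cells are nodes, 4-neighbours are joined by unit edges."""
    adj = {(r, c): [] for r in range(rows) for c in range(cols)
           if (r, c) not in walls}
    for (r, c) in adj:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (r + dr, c + dc)
            if nb in adj:
                adj[(r, c)].append((nb, 1.0))
    return adj

def spfa(adj, source):
    """Shortest path faster algorithm: queue-based Bellman-Ford relaxation."""
    dist = {v: math.inf for v in adj}
    dist[source] = 0.0
    queue, in_queue = deque([source]), {source}
    while queue:
        u = queue.popleft()
        in_queue.discard(u)
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                if v not in in_queue:
                    queue.append(v)
                    in_queue.add(v)
    return dist

def geodesic_gaussian(adj, center, sigma):
    """Gaussian of the shortest-path (geodesic) distance to `center`."""
    dist = spfa(adj, center)
    return {s: math.exp(-d * d / (2.0 * sigma ** 2)) for s, d in dist.items()}

def rls_lstdq_step(w, P, phi, phi_next, reward, gamma=0.95):
    """One recursive least-squares update for Q(s, a) ~ w . phi(s, a).

    P tracks the inverse of A = sum phi (phi - gamma * phi_next)^T and is
    updated with the Sherman-Morrison identity (assumed form, no forgetting).
    """
    n = len(w)
    d = [phi[i] - gamma * phi_next[i] for i in range(n)]
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    d_P = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    denom = 1.0 + sum(d[i] * P_phi[i] for i in range(n))
    gain = [x / denom for x in P_phi]
    td_error = reward - sum(d[i] * w[i] for i in range(n))
    w_new = [w[i] + gain[i] * td_error for i in range(n)]
    P_new = [[P[i][j] - gain[i] * d_P[j] for j in range(n)] for i in range(n)]
    return w_new, P_new

# Toy 3x3 maze: a wall separates the top-left cell from the bottom-left cell.
maze = build_maze_graph(3, 3, walls={(1, 0), (1, 1)})
basis = geodesic_gaussian(maze, center=(0, 0), sigma=2.0)
```

In this toy maze, cell (2, 0) is only two cells from (0, 0) in a straight line but six steps away along the path around the wall, so the geodesic basis centred at (0, 0) is far weaker there than a Euclidean Gaussian would be. This is the kind of topological relationship the abstract says the geodesic distance is meant to capture.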
