Barycentric Approximator for Reinforcement Learning Control

  • Whang Cho (Department of Control and Instrumentation, Kwangwoon University)
  • Published : 2002.01.01

Abstract

Recently, various experiments applying reinforcement learning methods to the self-learning intelligent control of continuous dynamic systems have been reported in the machine learning research community. The reports describe mixed results, some successes and some failures, and show that the success of reinforcement learning in the intelligent control of continuous systems depends on the ability to combine a proper function approximation method with temporal difference methods such as Q-learning and value iteration. One of the difficulties in combining a function approximation method with a temporal difference method is the absence of a guarantee that the algorithm converges. This paper provides a proof of convergence for a particular function approximation method based on the "barycentric interpolator", which is known to be computationally more efficient than multilinear interpolation.
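
The following Python sketch is an editor's illustration of the idea described above, not the paper's implementation: a value function is stored at the vertices of a simplex, the value at any interior point is the barycentric-weighted average of the vertex values, and a temporal-difference target updates the vertex values in proportion to those weights. All class and method names here (Simplex2D, td_update, and so on) are assumptions made for the example.

    import numpy as np

    class Simplex2D:
        """A triangle with a value stored at each vertex (a minimal barycentric approximator)."""

        def __init__(self, vertices, values):
            self.vertices = np.asarray(vertices, dtype=float)  # shape (3, 2): vertex coordinates
            self.values = np.asarray(values, dtype=float)      # shape (3,): value at each vertex

        def barycentric_coords(self, x):
            """Solve for weights w with sum(w) = 1 and w @ vertices = x."""
            A = np.vstack([self.vertices.T, np.ones(3)])        # 3x3 linear system
            b = np.append(np.asarray(x, dtype=float), 1.0)
            return np.linalg.solve(A, b)

        def interpolate(self, x):
            """Barycentric interpolation: V(x) = sum_i w_i(x) * V(vertex_i)."""
            w = self.barycentric_coords(x)
            return float(w @ self.values)

        def td_update(self, x, target, alpha=0.1):
            """Distribute the temporal-difference error to the vertices in proportion to w_i(x)."""
            w = self.barycentric_coords(x)
            error = target - self.interpolate(x)
            self.values += alpha * error * w

    if __name__ == "__main__":
        tri = Simplex2D(vertices=[[0, 0], [1, 0], [0, 1]], values=[0.0, 1.0, 2.0])
        x = [0.25, 0.25]
        print("V(x) =", tri.interpolate(x))   # weighted average of the vertex values
        tri.td_update(x, target=1.5)          # nudge vertex values toward a TD target
        print("V(x) after update =", tri.interpolate(x))

Because the interpolation weights are non-negative and sum to one, each update is a convex combination of vertex values; properties of this kind are what the paper's convergence argument relies on.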

References

  1. J. C. Santamaria, R. S. Sutton, and A. Ram, 'Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces,' COINS Technical Report 96-088, Dec. 1996 https://doi.org/10.1177/105971239700600201
  2. R. S. Sutton, 'Learning to predict by the methods of temporal differences,' Machine Learning, 3(1):9-44, 1988 https://doi.org/10.1007/BF00115009
  3. C. J. C. H. Watkins, 'Learning from delayed rewards,' Ph.D. thesis, King's College, Cambridge, England, 1989
  4. G. Tesauro, 'Neurogammon: a neural network backgammon program,' in IJCNN Proceedings III, pages 33-39, 1990
  5. A. G. Barto and R. S. Sutton, 'Reinforcement Learning: An Introduction,' The MIT Press, Cambridge, Massachusetts, 1998
  6. R. H. Crites and A. G. Barto, 'Improving elevator performance using reinforcement learning,' Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference, pp. 1017-1023, MIT Press, Cambridge, MA
  7. J. A. Boyan and A. W. Moore, 'Generalization in reinforcement learning: safely approximating the value function,' Advances in Neural Information Processing Systems, volume 7, Morgan Kaufmann, 1995
  8. D. W. Moore, 'Simplicial Mesh Generation with Applications,' Ph.D. thesis, Report no. 92-1322, Cornell University, 1992
  9. D. P. Bertsekas and J. N. Tsitsiklis, 'Parallel and Distributed Computation: Numerical Methods,' Prentice Hall, 1989
  10. T. Jaakkola, M. I. Jordan, and S. P. Singh, 'On the convergence of stochastic iterative dynamic programming algorithms,' Neural Computation, 6(6):1185-1201, 1994 https://doi.org/10.1162/neco.1994.6.6.1185
  11. J. N. Tsitsiklis, 'Asynchronous stochastic approximation and Q-learning,' Machine Learning, 16(3):185-202, 1994 https://doi.org/10.1007/BF00993306
  12. J. A. Boyan and A. W. Moore, 'Generalization in reinforcement learning: safely approximating the value function,' Advances in Neural Information Processing Systems, volume 7, Morgan Kaufmann, 1995
  13. A. W. Moore, 'Variable resolution dynamic programming: efficiently learning action maps in multivariate real-valued state-spaces,' Machine Learning: Proceedings of the Eighth International Workshop, Morgan Kaufmann, 1991