Barycentric Approximator for Reinforcement Learning Control

  • Whang Cho (Department of Control and Instrumentation, Kwangwoon University)
  • Published : 2002.01.01

Abstract

Recently, various experiments applying reinforcement learning methods to the self-learning intelligent control of continuous dynamic systems have been reported in the machine learning research community. The reports describe mixed results, some successes and some failures, and show that the success of reinforcement learning in the intelligent control of continuous systems depends on the ability to combine a proper function approximation method with temporal difference methods such as Q-learning and value iteration. One of the difficulties in combining a function approximation method with a temporal difference method is the absence of a guarantee that the algorithm converges. This paper provides a proof of convergence for a particular function approximation method based on the "barycentric interpolator", which is known to be computationally more efficient than multilinear interpolation.
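
The following Python sketch is an editor's illustration of the idea described above, not the paper's implementation: a value function is stored at the vertices of a simplex, the value at any interior point is the barycentric-weighted average of the vertex values, and a temporal-difference target updates the vertex values in proportion to those weights. All class and method names here (Simplex2D, td_update, and so on) are assumptions made for the example.

    import numpy as np

    class Simplex2D:
        """A triangle with a value stored at each vertex (a minimal barycentric approximator)."""

        def __init__(self, vertices, values):
            self.vertices = np.asarray(vertices, dtype=float)  # shape (3, 2): vertex coordinates
            self.values = np.asarray(values, dtype=float)      # shape (3,): value at each vertex

        def barycentric_coords(self, x):
            """Solve for weights w with sum(w) = 1 and w @ vertices = x."""
            A = np.vstack([self.vertices.T, np.ones(3)])        # 3x3 linear system
            b = np.append(np.asarray(x, dtype=float), 1.0)
            return np.linalg.solve(A, b)

        def interpolate(self, x):
            """Barycentric interpolation: V(x) = sum_i w_i(x) * V(vertex_i)."""
            w = self.barycentric_coords(x)
            return float(w @ self.values)

        def td_update(self, x, target, alpha=0.1):
            """Distribute the temporal-difference error to the vertices in proportion to w_i(x)."""
            w = self.barycentric_coords(x)
            error = target - self.interpolate(x)
            self.values += alpha * error * w

    if __name__ == "__main__":
        tri = Simplex2D(vertices=[[0, 0], [1, 0], [0, 1]], values=[0.0, 1.0, 2.0])
        x = [0.25, 0.25]
        print("V(x) =", tri.interpolate(x))   # weighted average of the vertex values
        tri.td_update(x, target=1.5)          # nudge vertex values toward a TD target
        print("V(x) after update =", tri.interpolate(x))

Because the interpolation weights are non-negative and sum to one, each update is a convex combination of vertex values; properties of this kind are what the paper's convergence argument relies on.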

References

  1. J. C. Santamaria, R. S. Sutton, and A. Ram, 'Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces,' COINS Technical Report 96-088, Dec. 1996 https://doi.org/10.1177/105971239700600201
  2. R. S. Sutton, 'Learning to predict by the methods of temporal differences,' Machine Learning, 3(1):9-44, 1988 https://doi.org/10.1007/BF00115009
  3. C. J. C. H. Watkins, 'Learning from delayed rewards,' Ph.D. thesis, King's College, Cambridge, England, 1989
  4. G. Tesauro, 'Neurogammon: a neural network backgammon program,' in IJCNN Proceedings III, pages 33-39, 1990
  5. A. G. Barto and R. S. Sutton, 'Reinforcement Learning: An Introduction,' The MIT Press, Cambridge, Massachusetts, 1998
  6. R. H. Crites and A. G. Barto, 'Improving elevator performance using reinforcement learning,' Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference, pp. 1017-1023, MIT Press, Cambridge, MA
  7. J. A. Boyan and A. W. Moore, 'Generalization in reinforcement learning: safely approximating the value function,' Advances in Neural Information Processing Systems, volume 7, Morgan Kaufmann, 1995
  8. D. W. Moore, 'Simplicial Mesh Generation with Applications,' Ph.D. thesis, Report no. 92-1322, Cornell University, 1992
  9. D. P. Bertsekas and J. N. Tsitsiklis, 'Parallel and Distributed Computation: Numerical Methods,' Prentice Hall, 1989
  10. T. Jaakkola, M. I. Jordan, and S. P. Singh, 'On the convergence of stochastic iterative dynamic programming algorithms,' Neural Computation, 6(6):1185-1201, 1994 https://doi.org/10.1162/neco.1994.6.6.1185
  11. J. N. Tsitsiklis, 'Asynchronous stochastic approximation and Q-learning,' Machine Learning, 16(3):185-202, 1994 https://doi.org/10.1007/BF00993306
  12. J. A. Boyan and A. W. Moore, 'Generalization in reinforcement learning: safely approximating the value function,' Advances in Neural Information Processing Systems, volume 7, Morgan Kaufmann, 1995
  13. A. W. Moore, 'Variable resolution dynamic programming: efficiently learning action maps in multivariate real-valued state-spaces,' Machine Learning: Proceedings of the Eighth International Workshop, Morgan Kaufmann, 1991