Actor-Critic Algorithm with Transition Cost Estimation

  • Denisov, Sergey (Department of Electrical and Computer Engineering, Sungkyunkwan University);
  • Lee, Jee-Hyong (Department of Electrical and Computer Engineering, Sungkyunkwan University)
  • Received : 2016.11.28
  • Accepted : 2016.12.13
  • Published : 2016.12.12

Abstract

We present an approach for accelerating the actor-critic algorithm for reinforcement learning with continuous action spaces. The actor-critic algorithm has already proved its robustness to infinitely large action spaces in various high-dimensional environments. Despite that success, the main problem of the actor-critic algorithm remains the same: the speed of convergence to the optimal policy. In high-dimensional state and action spaces, searching for the correct action in each state takes an enormously long time. Therefore, in this paper we suggest a search-accelerating function that speeds up the algorithm's convergence and reaches the optimal policy faster. In our method, we assume that actions may have their own distribution of preference that is independent of the state. Since the agent acts randomly in the environment at the beginning of learning, it is more efficient if actions are taken according to some heuristic function. We demonstrate that the heuristically-accelerated actor-critic algorithm learns the optimal policy faster, using the Educational Process Mining dataset, which contains records of students' course learning processes and their grades.
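
To make the idea concrete, below is a minimal sketch of heuristically-accelerated action selection inside an actor-critic loop. It is not the paper's implementation: it uses a toy discrete state and action space for brevity (the paper targets continuous actions), and the heuristic preference vector H, the weight xi, and the decay rate are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of heuristically-accelerated action selection in an
# actor-critic loop. Toy discrete setting; all sizes, learning rates,
# and the heuristic H are illustrative assumptions, not the paper's values.

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 10, 4
actor = np.zeros((N_STATES, N_ACTIONS))   # actor preferences p(s, a)
critic = np.zeros(N_STATES)               # critic state values V(s)

# State-independent action preferences, e.g. estimated from how often
# each action appears in historical data (hypothetical heuristic).
H = rng.random(N_ACTIONS)
H /= H.sum()

ALPHA, BETA, GAMMA = 0.1, 0.1, 0.95       # critic lr, actor lr, discount

def select_action(state, step, xi=1.0, decay=0.995):
    """Softmax over actor preferences plus a decaying heuristic bias."""
    prefs = actor[state] + xi * (decay ** step) * H
    probs = np.exp(prefs - prefs.max())    # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(N_ACTIONS, p=probs)

def update(state, action, reward, next_state):
    """One-step actor-critic update driven by the TD error."""
    td_error = reward + GAMMA * critic[next_state] - critic[state]
    critic[state] += ALPHA * td_error         # critic moves toward the TD target
    actor[state, action] += BETA * td_error   # actor reinforces the taken action

# Toy usage: a cyclic environment where action 0 is always rewarded.
state = 0
for step in range(1000):
    action = select_action(state, step)
    reward = 1.0 if action == 0 else 0.0      # hypothetical reward signal
    next_state = (state + 1) % N_STATES
    update(state, action, reward, next_state)
    state = next_state
```

As the decay term shrinks, action selection is driven entirely by the learned actor preferences, so the heuristic only shapes the early, otherwise random, exploration phase.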
