Actor-Critic Algorithm with Transition Cost Estimation

  • Denisov, Sergey (Department of Electrical and Computer Engineering, Sungkyunkwan University);
  • Lee, Jee-Hyong (Department of Electrical and Computer Engineering, Sungkyunkwan University)
  • Received : 2016.11.28
  • Accepted : 2016.12.13
  • Published : 2016.12.12

Abstract

We present an approach for accelerating the actor-critic algorithm for reinforcement learning with continuous action spaces. The actor-critic algorithm has already proved its robustness to infinitely large action spaces in various high-dimensional environments. Despite that success, the main problem of the actor-critic algorithm remains the same: the speed of convergence to the optimal policy. In high-dimensional state and action spaces, searching for the correct action in each state takes an enormously long time. Therefore, in this paper we suggest a search-accelerating function that speeds up the algorithm's convergence and reaches the optimal policy faster. In our method, we assume that actions may have their own distribution of preference that is independent of the state. Since the agent acts randomly in the environment at the beginning of learning, it is more efficient if actions are taken according to some heuristic function. We demonstrate that the heuristically-accelerated actor-critic algorithm learns the optimal policy faster, using the Educational Process Mining dataset, which contains records of students' course learning processes and their grades.
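
To make the idea concrete, below is a minimal sketch of heuristically-accelerated action selection inside an actor-critic loop. It is not the paper's implementation: it uses a toy discrete state and action space for brevity (the paper targets continuous actions), and the heuristic preference vector H, the weight xi, and the decay rate are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of heuristically-accelerated action selection in an
# actor-critic loop. Toy discrete setting; all sizes, learning rates,
# and the heuristic H are illustrative assumptions, not the paper's values.

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 10, 4
actor = np.zeros((N_STATES, N_ACTIONS))   # actor preferences p(s, a)
critic = np.zeros(N_STATES)               # critic state values V(s)

# State-independent action preferences, e.g. estimated from how often
# each action appears in historical data (hypothetical heuristic).
H = rng.random(N_ACTIONS)
H /= H.sum()

ALPHA, BETA, GAMMA = 0.1, 0.1, 0.95       # critic lr, actor lr, discount

def select_action(state, step, xi=1.0, decay=0.995):
    """Softmax over actor preferences plus a decaying heuristic bias."""
    prefs = actor[state] + xi * (decay ** step) * H
    probs = np.exp(prefs - prefs.max())    # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(N_ACTIONS, p=probs)

def update(state, action, reward, next_state):
    """One-step actor-critic update driven by the TD error."""
    td_error = reward + GAMMA * critic[next_state] - critic[state]
    critic[state] += ALPHA * td_error         # critic moves toward the TD target
    actor[state, action] += BETA * td_error   # actor reinforces the taken action

# Toy usage: a cyclic environment where action 0 is always rewarded.
state = 0
for step in range(1000):
    action = select_action(state, step)
    reward = 1.0 if action == 0 else 0.0      # hypothetical reward signal
    next_state = (state + 1) % N_STATES
    update(state, action, reward, next_state)
    state = next_state
```

As the decay term shrinks, action selection is driven entirely by the learned actor preferences, so the heuristic only shapes the early, otherwise random, exploration phase.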
