Solving Survival Gridworld Problem Using Hybrid Policy Modified Q-Based Reinforcement

  • Montero, Vince Jebryl (Dept. of Electronics and Communications Engineering, Kwangwoon University) ;
  • Jung, Woo-Young (Dept. of Electronics and Communications Engineering, Kwangwoon University) ;
  • Jeong, Yong-Jin (Dept. of Electronics and Communications Engineering, Kwangwoon University)
  • Received : 2019.10.30
  • Accepted : 2019.12.11
  • Published : 2019.12.31

Abstract

This paper explores a model-free, value-based approach to solving the survival gridworld problem. The survival gridworld problem poses a challenge in which an agent must take risks to gain better rewards. The classic value-based approach in model-free reinforcement learning assumes minimal-risk decisions. The proposed method applies hybrid on-policy and off-policy updates to experience roll-outs using a modified Q-based update equation that introduces a parametric linear rectifier and a motivational discount. The significance of this approach is that it allows model-free training of agents that take risk factors and motivated exploration into account to make better path decisions. Experiments suggest that the proposed method achieves better exploration and path selection, resulting in higher episode scores than classic off-policy and on-policy Q-based updates.
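For orientation, the two classic baselines named in the abstract can be sketched as tabular updates. This is a minimal illustration of standard Q-learning (off-policy) and SARSA (on-policy) only; it does not reproduce the paper's modified update, whose parametric linear rectifier and motivational discount are not specified in the abstract. The function names, the list-of-lists Q-table, and the default values of `alpha` and `gamma` are assumptions made for this example.

```python
# Hypothetical helper names; q is a table indexed as q[state][action].

def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the greedy action in s_next."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken in s_next."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


if __name__ == "__main__":
    # Two states, two actions; state 1 already has value estimates.
    q = [[0.0, 0.0], [1.0, 2.0]]
    print(q_learning_update(q, s=0, a=0, r=1.0, s_next=1, alpha=0.5))
    print(sarsa_update(q, s=0, a=1, r=1.0, s_next=1, a_next=0, alpha=0.5))
```

The only difference between the two is the bootstrap term: the off-policy target uses the best available action in the next state, while the on-policy target uses the action the behavior policy actually selected there, so it reflects the cost of exploratory moves.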
