Autonomous control of bicycle using Deep Deterministic Policy Gradient Algorithm


  • Received : 2018.03.09
  • Accepted : 2018.09.13
  • Published : 2018.09.30

Abstract

The Deep Deterministic Policy Gradient (DDPG) algorithm learns using artificial neural networks and reinforcement learning. Among recent reinforcement learning research, the DDPG algorithm has the advantage that, because it learns off-policy, it prevents wrongly chosen actions from accumulating and degrading learning. In this study, we applied the DDPG algorithm to control a bicycle autonomously. Simulations were carried out in various environments, and the experiments showed that the method works stably in simulation.
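The paper itself provides no code; as a rough illustration of the off-policy machinery the abstract credits, here is a minimal Python sketch of two standard DDPG ingredients (experience replay and Polyak-averaged target networks, both from Lillicrap et al. [19]). All names and parameter values are illustrative, not taken from the paper:

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Off-policy experience store: transitions gathered under any past
    policy can be replayed later, so a few bad actions do not dominate
    any single gradient update."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch breaks the temporal correlation
        # between consecutive transitions.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


def soft_update(target_weights, online_weights, tau=0.005):
    """Polyak averaging: target networks track the online networks
    slowly, which stabilizes the bootstrapped critic targets in DDPG."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_weights, online_weights)]
```

In a full DDPG loop, each environment step would push one transition into the buffer, sample a minibatch to update the critic and actor networks, and then call `soft_update` on both target networks.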


References

  1. Herlihy, David V. Bicycle: the history. Yale University Press, 2004.
  2. Schwab, A. L., J. P. Meijaard, and J. D. G. Kooijman. "Some recent developments in bicycle dynamics." Proceedings of the 12th World Congress in Mechanism and Machine Science. 2007.
  3. Meijaard, Jaap P., et al. "Linearized dynamics equations for the balance and steer of a bicycle: a benchmark and review." Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences. Vol. 463. No. 2084. The Royal Society, 2007.
  4. http://ai2001.ifdef.jp/primer_V2/primer_V2.html
  5. Basso, Michele, and Giacomo Innocenti. "Lego-bike: A challenging robotic lab project to illustrate rapid prototyping in the mindstorms/simulink integrated platform." Computer Applications in Engineering Education 23.6 (2015): 947-958. https://doi.org/10.1002/cae.21666
  6. Basso, Michele, Giacomo Innocenti, and Alberto Rosa. "Simulink meets lego: Rapid controller prototyping of a stabilized bicycle model." 52nd IEEE Conference on Decision and Control. IEEE, 2013.
  7. Randløv, Jette, and Preben Alstrøm. "Learning to Drive a Bicycle Using Reinforcement Learning and Shaping." ICML. Vol. 98. 1998.
  8. Sutton, Richard S., and Andrew G. Barto. Reinforcement learning: An introduction. Cambridge: MIT Press, 1998.
  9. Lagoudakis, Michail G., and Ronald Parr. "Model-free least-squares policy iteration." NIPS, Vol.14, 2001.
  10. Silver, David, Guy Lever, et al. "Deterministic policy gradient algorithms." ICML. 2014.
  11. Phyo Htet Kyaw. Dyna-Q based Univector Field Obstacle Avoidance for Fast Mobile Robots. Master's thesis, Kyung Hee University, Seoul, Korea, 2011.
  12. Kaelbling, Leslie Pack, Michael L. Littman, and Andrew W. Moore. "Reinforcement learning: A survey." Journal of artificial intelligence research 4 (1996): 237-285. https://doi.org/10.1613/jair.301
  13. Irodova, Marina, and Robert H. Sloan. "Reinforcement Learning and Function Approximation." FLAIRS Conference. 2005.
  14. Watkins, Christopher JCH, and Peter Dayan. "Q-learning." Machine learning 8.3-4 (1992): 279-292. https://doi.org/10.1007/BF00992698
  15. G.A. Rummery and M. Niranjan, On-Line Q-Learning Using Connectionist Systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department, 1994.
  16. Grondman, Ivo, et al. "A survey of actor-critic reinforcement learning: Standard and natural policy gradients." IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42.6 (2012): 1291-1307. https://doi.org/10.1109/TSMCC.2012.2218595
  17. Sutton, Richard S., et al. "Policy Gradient Methods for Reinforcement Learning with Function Approximation." NIPS. Vol. 99. 1999.
  18. Peters, Jan, and Stefan Schaal. "Policy gradient methods for robotics." 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2006.
  19. Lillicrap, Timothy P., et al. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).
  20. Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533. https://doi.org/10.1038/nature14236
  21. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
  22. Kim Tae Hee, Kang Seung Ho, "An Intrusion Detection System based on the Artificial Neural Network for Real Time Detection." Journal of Information and Security. 2018.