Controller Learning Method of Self-driving Bicycle Using State-of-the-art Deep Reinforcement Learning Algorithms

  • Received : 2018.09.17
  • Accepted : 2018.10.01
  • Published : 2018.10.31

Abstract

Machine learning has been studied extensively in recent years, and reinforcement learning is one of its most active areas. In this paper, we propose a controller for a bicycle based on DDPG (Deep Deterministic Policy Gradient), a recent deep reinforcement learning algorithm. We redefine the reward function for the bicycle dynamics and the neural network used to train the agent. Unlike the existing method, a controller trained with the proposed method keeps the bicycle from falling over and also reaches more distant target destinations. For the performance evaluation, we tested the proposed algorithm in several environments, including fixed speed, random speed, a specified target point, and no specified target point. The results confirm that the proposed algorithm performs better than the existing deep reinforcement learning algorithms NAF and PPO.
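The abstract does not spell out the update rules, so the following is only a minimal sketch of the two generic DDPG ingredients described by Lillicrap et al. (reference 9): the critic's temporal-difference target and the soft update of the target networks. The function names, the toy numbers, the values of `gamma` and `tau`, and the use of the `done` flag to mark a fall are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def td_target(reward, next_q, done, gamma=0.99):
    """Critic target y = r + gamma * Q'(s', mu'(s')).

    next_q is assumed to be the target critic's value at the action chosen
    by the target actor; terminal transitions (done = 1) drop the bootstrap term.
    """
    return reward + gamma * (1.0 - done) * next_q

def soft_update(target_params, online_params, tau=0.001):
    """Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]

# Toy batch of two transitions (hypothetical values).
rewards = np.array([1.0, -1.0])
next_q = np.array([2.0, 5.0])
done = np.array([0.0, 1.0])  # assumption: the second episode ended in a fall
targets = td_target(rewards, next_q, done, gamma=0.9)

# Toy parameter lists standing in for network weights.
online = [np.ones(3)]
target = [np.zeros(3)]
new_target = soft_update(target, online, tau=0.1)
```

With a small `tau`, the target networks track the learned networks slowly, which is the mechanism DDPG relies on to keep the critic's regression targets stable.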

Keywords

References

  1. L. Keo and M. Yamakita, "Controlling balancer and steering for bicycle stabilization," 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 4541-4546, Oct. 2009.
  2. J. P. Meijaard, J. M. Papadopoulos, A. Ruina, and A. L. Schwab, "Linearized dynamics equations for the balance and steer of a bicycle: a benchmark and review," In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, Vol. 463, No. 2084, pp. 1955-1982. The Royal Society, Aug. 2007. https://doi.org/10.1098/rspa.2007.1857
  3. A. Schwab, J. Meijaard, and J. Kooijman, "Some recent developments in bicycle dynamics," In Proceedings of the 12th World Congress in Mechanism and Machine Science, pp. 1-6, 2007.
  4. J. Tan, Y. Gu, C. K. Liu, and G. Turk, "Learning bicycle stunts," ACM Transactions on Graphics (TOG), Vol. 33, No. 4, pp. 1-16, 2014.
  5. Google Nederland, "Introducing the self-driving bicycle in the Netherlands," March, 2017.
  6. J. Randløv and P. Alstrøm, "Learning to drive a bicycle using reinforcement learning and shaping," Proceedings of the Fifteenth International Conference on Machine Learning (ICML '98), pp. 463-471, 1998.
  7. L. P. Tuyen and T. Chung, "Controlling bicycle using deep deterministic policy gradient algorithm," In Ubiquitous Robots and Ambient Intelligence (URAI), 2017 14th International Conference on, pp. 413-417. IEEE, 2017.
  8. J. Peters and S. Schaal, "Reinforcement learning of motor skills with policy gradients," Neural Networks, Vol. 21, No. 4, pp. 682-697, May 2008. https://doi.org/10.1016/j.neunet.2008.02.003
  9. T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, "Continuous control with deep reinforcement learning," arXiv preprint arXiv:1509.02971, 2015.
  10. R. S. Sutton and A. G. Barto, "Reinforcement learning: An introduction," Vol. 1, MIT press Cambridge, 1998.
  11. D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, "Deterministic policy gradient algorithms," In ICML, June 2014.
  12. V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al, "Human-level control through deep reinforcement learning," Nature, Vol. 518, pp. 529-533, Feb. 2015. https://doi.org/10.1038/nature14236
  13. C.-L. Hwang, H.-M. Wu, and C.-L. Shih, "Fuzzy sliding-mode underactuated control for autonomous dynamic balance of an electrical bicycle," IEEE Transactions on Control Systems Technology, Vol. 17, No. 3, pp. 658-670, May 2009. https://doi.org/10.1109/TCST.2008.2004349
  14. G. E. Uhlenbeck and L. S. Ornstein, "On the theory of the Brownian motion," Physical Review, Vol. 36, No. 5, pp. 823-841, Sep. 1930. https://doi.org/10.1103/PhysRev.36.823
  15. D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
  16. S. Gu, T. Lillicrap, I. Sutskever, and S. Levine, "Continuous deep q-learning with model-based acceleration," In International Conference on Machine Learning, pp. 2829-2838, June 2016.
  17. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
  18. J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, "Trust region policy optimization," In International Conference on Machine Learning, pp. 1889-1897, 2015.
  19. M. Lu and X. Li, "Deep reinforcement learning policy in Hex game system," 2018 Chinese Control And Decision Conference (CCDC), pp. 6623-6626, 2018.
  20. E. Bejar and A. Moran, "Deep reinforcement learning based neuro-control for a two-dimensional magnetic positioning system," 2018 4th International Conference on Control, Automation and Robotics (ICCAR), pp. 268-273, 2018.
  21. T. Yasuda and K. Ohkura, "Collective Behavior Acquisition of Real Robotic Swarms Using Deep Reinforcement Learning," 2018 Second IEEE International Conference on Robotic Computing (IRC), pp. 179-180, 2018.