A Study of Adaptive QoS Routing scheme using Policy-gradient Reinforcement Learning

  • Jeongsoo Han (Department of Computer Multimedia, Shingu College)
  • Received : 2010.10.15
  • Accepted : 2010.11.17
  • Published : 2011.02.28

Abstract

In this paper, we propose an adaptive QoS routing scheme that applies the policy-gradient method within a reinforcement learning (RL) framework. Compared with existing RL-based routing schemes, the proposed scheme reflects the gradient of the estimated average reward in its policy, which lets it learn the network environment quickly. Faster learning of the network environment in turn yields a higher routing success rate. To verify this, we simulate the proposed scheme and compare it with three other schemes.
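The idea of adjusting a routing policy along the gradient of the reward can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a generic softmax policy over candidate paths, updated with a REINFORCE-style rule that uses a running average reward as the baseline. The function name `train_path_policy`, the step sizes `alpha` and `beta`, and the per-path admission probabilities are all hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn path preferences into selection probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_path_policy(success_prob, steps=5000, alpha=0.1, beta=0.05, seed=0):
    """Learn which candidate path to prefer from admit/reject feedback.

    success_prob[i] is a hypothetical, fixed probability that a flow routed
    on path i is admitted; it stands in for the network environment.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(success_prob)  # policy parameters, one per path
    avg_reward = 0.0                   # running estimate of the average reward
    for _ in range(steps):
        probs = softmax(prefs)
        # Sample a path according to the current stochastic policy.
        path = rng.choices(range(len(probs)), weights=probs)[0]
        # Reward is 1 if the flow is admitted on that path, else 0.
        reward = 1.0 if rng.random() < success_prob[path] else 0.0
        # d/d prefs[i] of log pi(path) is (1 if i == path else 0) - probs[i].
        advantage = reward - avg_reward
        for i in range(len(prefs)):
            indicator = 1.0 if i == path else 0.0
            prefs[i] += alpha * advantage * (indicator - probs[i])
        # Move the average-reward estimate toward the observed reward.
        avg_reward += beta * (reward - avg_reward)
    return softmax(prefs), avg_reward


# Three candidate paths with assumed admission probabilities.
policy, avg = train_path_policy([0.2, 0.5, 0.9])
print([round(p, 3) for p in policy], round(avg, 3))
```

In this toy setting the policy shifts its probability mass toward the path with the highest admission probability, and the average-reward baseline reduces the variance of the updates. A real QoS router would replace the fixed probabilities with feedback from actual flow setup attempts.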
