A Distributed Scheduling Algorithm based on Deep Reinforcement Learning for Device-to-Device communication networks

  • Jeong, Moo-Woong (Dept. of Information and Communication Engineering, Gyeongsang National University) ;
  • Kim, Lyun Woo (Dept. of Information and Communication Engineering, Gyeongsang National University) ;
  • Ban, Tae-Won (Dept. of Information and Communication Engineering, Gyeongsang National University)
  • Received : 2020.09.24
  • Accepted : 2020.10.05
  • Published : 2020.11.30

Abstract

In this paper, we study a reinforcement-learning-based scheduling problem for overlay device-to-device (D2D) communication networks. Although various techniques for D2D communication networks based on Q-learning, one of the representative reinforcement learning models, have been studied, Q-learning incurs tremendous complexity as the number of states and actions increases. To address this problem, D2D communication techniques based on the Deep Q Network (DQN) have been studied. In this paper, we therefore design a DQN model that reflects the characteristics of wireless communication systems and propose a distributed scheduling scheme based on this model that can reduce feedback and signaling overhead. In the proposed scheme, all parameters are trained in a centralized manner, and the final trained parameters are then transferred to all mobile devices; each mobile individually determines its own action using the transferred parameters. We analyze the performance of the proposed scheme through computer simulations and compare it with the optimal, opportunistic selection, and full transmission schemes.

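As a concrete illustration of the centralized-training, distributed-execution pattern the abstract describes, the following Python sketch shows how a small DQN could be trained at a central node and its final parameters copied to each mobile for local decisions. This is a minimal sketch under stated assumptions: the network architecture, state and action dimensions, and all function names are illustrative, not taken from the paper.

```python
import random
import torch
import torch.nn as nn

# Illustrative sizes: the paper's actual state/action design is not given here.
STATE_DIM = 4      # e.g., quantized local channel gains observed by a D2D pair
N_ACTIONS = 2      # e.g., 0 = stay silent, 1 = transmit in this slot

class DQN(nn.Module):
    """Small fully connected Q-network; one copy is trained centrally."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, s):
        return self.net(s)

def centralized_train_step(policy, target, optimizer, batch, gamma=0.99):
    """One standard DQN update at the central trainer (e.g., a base station)."""
    s, a, r, s2 = batch                      # tensors: states, actions, rewards, next states
    q = policy(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q, r + gamma * q_next)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def distributed_decision(local_model, local_state, epsilon=0.0):
    """Each mobile runs this locally using the broadcast parameters;
    no per-slot report back to the central node is required."""
    if random.random() < epsilon:            # exploration (training phase only)
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        q_values = local_model(torch.as_tensor(local_state, dtype=torch.float32))
        return int(q_values.argmax())

# After training, the final parameters are transferred once to every mobile:
trained = DQN()                              # stands in for the centrally trained network
mobile_copy = DQN()
mobile_copy.load_state_dict(trained.state_dict())   # the "parameter transfer" step
action = mobile_copy and distributed_decision(mobile_copy, [0.2, 0.9, 0.1, 0.4])
```

In this pattern, run-time operation needs only a one-time parameter broadcast rather than per-slot channel feedback to the central node, which is consistent with the reduced feedback and signaling overhead the abstract claims.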
