The Design and Practice of Disaster Response RL Environment Using Dimension Reduction Method for Training Performance Enhancement

  • Received: 2020.12.17
  • Accepted: 2021.04.24
  • Published: 2021.07.31

Abstract

Reinforcement learning (RL) is a method for finding an optimal policy through training, and it is widely used to solve lifesaving and disaster response problems effectively. However, conventional RL approaches to disaster response are evaluated either in simple environments, such as grids and graphs, or in self-developed environments whose practical effectiveness is hard to verify. In this paper, we propose the design of a disaster response RL environment that exploits the detailed property information of a disaster simulation, so that RL methods can be applied in the real world. For this environment, we design and build the RL communication channel as well as the interface between the RL agent and the disaster simulation. We also apply a dimension reduction method that converts non-image feature vectors into an image format suitable for convolution layers, in order to make use of the simulation's high-dimensional, detailed properties. Empirical evaluations show that the proposed method outperforms conventional methods in terms of building fire damage.
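To make the conversion step concrete, the sketch below shows one plausible realization of the idea described above: reduce a high-dimensional, non-image state vector, rescale it, and reshape it into a single-channel 2D "image" that a convolution layer can consume. The specific tools (scikit-learn's KernelPCA and MinMaxScaler) and the 16x16 grid size are illustrative assumptions, not the paper's exact pipeline.

    import numpy as np
    from sklearn.decomposition import KernelPCA
    from sklearn.preprocessing import MinMaxScaler

    def vectors_to_images(feature_vectors, side=16):
        """Map (n_samples, n_features) vectors to (n_samples, 1, side, side) images."""
        # Reduce the raw simulation properties to side*side components.
        # Kernel PCA requires n_samples >= side*side for this many components.
        reduced = KernelPCA(n_components=side * side, kernel="rbf").fit_transform(feature_vectors)
        # Rescale each component to [0, 1] so values behave like pixel intensities.
        scaled = MinMaxScaler().fit_transform(reduced)
        # Reshape into single-channel images for a convolutional network.
        return scaled.reshape(-1, 1, side, side).astype(np.float32)

    # Illustrative stand-in data: 300 simulated states, each with 500 property values.
    # In the actual environment these vectors would come from the disaster
    # simulation's property interface rather than random numbers.
    states = np.random.rand(300, 500)
    images = vectors_to_images(states)
    print(images.shape)  # (300, 1, 16, 16)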

Acknowledgement

This research was conducted under the University ICT Research Center support program of the Ministry of Science and ICT and the Institute of Information & Communications Technology Planning & Evaluation (IITP-2020-2018-0-01431).
