Indoor Scene Classification based on Color and Depth Images for Automated Reverberation Sound Editing

  • Jeong, Min-Heuk (School of Electronics and Information Engineering, Korea Aerospace University) ;
  • Yu, Yong-Hyun (School of Electronics and Information Engineering, Korea Aerospace University) ;
  • Park, Sung-Jun (School of Electronics and Information Engineering, Korea Aerospace University) ;
  • Hwang, Seung-Jun (School of Electronics and Information Engineering, Korea Aerospace University) ;
  • Baek, Joong-Hwan (School of Electronics and Information Engineering, Korea Aerospace University)
  • Received : 2020.01.30
  • Accepted : 2020.02.24
  • Published : 2020.03.31

Abstract

The reverberation effect applied to sound when producing movies or VR content is a very important factor in conveying realism and liveliness. The reverberation time appropriate for a given space is recommended by the standard RT60 (Reverberation Time 60 dB). In this paper, we propose a scene recognition technique for automatic reverberation editing during sound editing. To this end, we designed a classification model that trains on color images and predicted depth images independently within the same model. Indoor scene classification using color information alone is limited in recognition accuracy because some classes share similar internal structures. A deep learning based depth estimation technique is therefore used to exploit the spatial depth information. Ten scene classes were constructed based on RT60, and model training and evaluation were conducted. Finally, the proposed SCR+DNet (Scene Classification for Reverb + Depth Net) classifier achieves 92.4% accuracy, outperforming conventional CNN classifiers.
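
The abstract does not detail the internal architecture of SCR+DNet; the following is a minimal sketch, assuming a two-branch convolutional design in which the color image and a monocularly predicted depth map are encoded by separate branches of the same structure and fused by concatenation before a 10-way classifier. All layer sizes and module names here are hypothetical illustrations, not the authors' implementation.

```python
# Minimal sketch of a two-branch scene classifier in the spirit of SCR+DNet.
# Assumption (not specified in the abstract): the color image and the predicted
# depth map are processed by separate CNN branches of identical structure, and
# their features are concatenated before a linear classifier over 10 RT60 classes.
import torch
import torch.nn as nn


def conv_branch(in_channels: int) -> nn.Sequential:
    """A small convolutional feature extractor (hypothetical layer sizes)."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.AdaptiveAvgPool2d(1),  # -> (batch, 64, 1, 1)
        nn.Flatten(),             # -> (batch, 64)
    )


class TwoBranchSceneClassifier(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.rgb_branch = conv_branch(in_channels=3)    # color image branch
        self.depth_branch = conv_branch(in_channels=1)  # predicted depth map branch
        self.classifier = nn.Linear(64 + 64, num_classes)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # Encode each modality independently, then fuse by concatenation.
        feat = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return self.classifier(feat)


if __name__ == "__main__":
    model = TwoBranchSceneClassifier()
    rgb = torch.randn(2, 3, 224, 224)    # color images
    depth = torch.randn(2, 1, 224, 224)  # depth maps from a monocular estimator
    print(model(rgb, depth).shape)       # torch.Size([2, 10])
```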

Keywords
