Scalable Video Coding using Super-Resolution based on Convolutional Neural Networks for Video Transmission over Very Narrow-Bandwidth Networks

  • Kim, Dae-Eun (The School of Electrical Engineering, Korea Advanced Institute of Science and Technology) ;
  • Ki, Sehwan (The School of Electrical Engineering, Korea Advanced Institute of Science and Technology) ;
  • Kim, Munchurl (The School of Electrical Engineering, Korea Advanced Institute of Science and Technology) ;
  • Jun, Ki Nam (LIG Nex1) ;
  • Baek, Seung Ho (LIG Nex1) ;
  • Kim, Dong Hyun (Agency for Defense Development) ;
  • Choi, Jeung Won (Agency for Defense Development)
  • Received: 2018.10.24
  • Reviewed: 2018.12.21
  • Published: 2019.01.30

Abstract

There is a persistent need to transmit video over very limited transmission bandwidths, even now that video services over broadband networks are widespread. In this paper, we propose a scalable video coding framework for low-resolution video transmission over very narrow-bandwidth networks. Within a spatially scalable video coding framework, the decoded frames of the base layer are upscaled by a convolutional neural network (CNN) based super-resolution technique, and the upscaled frames are used as the prediction for coding the enhancement layer, which improves coding efficiency. Whereas the conventional scalable High Efficiency Video Coding (SHVC) standard performs upscaling with a fixed filter, the proposed framework replaces that fixed upscaling filter with a CNN trained for super-resolution. To this end, we propose a deep CNN structure that employs skip connections and residual learning, and we train it for the actual application scenario of the video coding framework. In an application scenario where video with a resolution of $352{\times}288$ and a frame rate of 8 fps is encoded at 110 kbps, the proposed scalable video coding framework achieves higher picture quality and better coding efficiency than the conventional SHVC framework.
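To put the operating point of the abstract in perspective, the arithmetic below works out the average bit budget. It is a back-of-the-envelope sketch that assumes 1 kbps = 1000 bit/s, an 8-bit 4:2:0 source, and bits spread evenly over frames (real encoders allocate bits unevenly across frame types); the function names are hypothetical.

```python
# Back-of-the-envelope bit budget for the abstract's scenario:
# 352x288 input, 8 frames per second, 110 kbps total.
# Assumptions (not from the paper): 1 kbps = 1000 bit/s, 8-bit 4:2:0 source.

WIDTH, HEIGHT = 352, 288   # input (enhancement-layer) resolution
FPS = 8                    # frames per second
BITRATE_BPS = 110_000      # 110 kbps


def bits_per_frame(bitrate_bps: int, fps: int) -> float:
    """Average number of bits available for one coded frame."""
    return bitrate_bps / fps


def bits_per_pixel(bitrate_bps: int, fps: int, width: int, height: int) -> float:
    """Average bits per luma sample (chroma samples are not counted)."""
    return bits_per_frame(bitrate_bps, fps) / (width * height)


def raw_bitrate_bps(width: int, height: int, fps: int, bit_depth: int = 8) -> int:
    """Uncompressed 4:2:0 bitrate: 1.5 samples per pixel (luma + two quarter-size chroma planes)."""
    return width * height * 3 // 2 * bit_depth * fps


if __name__ == "__main__":
    print(f"{bits_per_frame(BITRATE_BPS, FPS):.0f} bits per frame")          # 13750 bits per frame
    print(f"{bits_per_pixel(BITRATE_BPS, FPS, WIDTH, HEIGHT):.3f} bpp")      # 0.136 bpp
    ratio = raw_bitrate_bps(WIDTH, HEIGHT, FPS) / BITRATE_BPS
    print(f"compression ratio ~{ratio:.0f}:1")                               # compression ratio ~88:1
```

At roughly 0.14 bits per luma sample, the codec must compress the raw signal by about 88:1, which suggests why the quality of the inter-layer prediction matters so much at this bitrate.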

Keywords

Fig. 1. Block diagram of scalable video coding with spatial scalability
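The data flow of spatially scalable coding can be sketched as follows: the base layer codes a downscaled frame, its reconstruction is upscaled, and the enhancement layer codes only the difference from that prediction. This is a toy illustration, not the paper's codec. Base-layer encoding and decoding are treated as lossless, the 2x2 averaging downsampler and nearest-neighbour upsampler are hypothetical stand-ins (SHVC's actual resampling uses longer interpolation filters), and the sum of absolute residuals is only a rough proxy for enhancement-layer bits. In the proposed framework the fixed upsampler would be replaced by the trained CNN, which here could be any callable with the same `Upsampler` signature.

```python
from typing import Callable, List

Frame = List[List[int]]  # one 8-bit luma plane, indexed [row][col]
Upsampler = Callable[[Frame], Frame]


def downsample_2x(frame: Frame) -> Frame:
    """Hypothetical base-layer input: rounded mean of each 2x2 block (even sizes assumed)."""
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1] + 2) // 4
            for x in range(0, len(frame[0]) - 1, 2)
        ]
        for y in range(0, len(frame) - 1, 2)
    ]


def nearest_upsample_2x(frame: Frame) -> Frame:
    """Fixed 2x upsampler: each sample becomes a 2x2 block.
    A stand-in for a fixed interpolation filter, not SHVC's actual resampling filter."""
    out: Frame = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def inter_layer_residual(el_frame: Frame, base_recon: Frame, upsample: Upsampler) -> Frame:
    """Residual left for the enhancement layer when the upsampled base-layer
    reconstruction serves as its prediction."""
    prediction = upsample(base_recon)
    return [[o - p for o, p in zip(orow, prow)] for orow, prow in zip(el_frame, prediction)]


def sum_abs(residual: Frame) -> int:
    """Sum of absolute residuals: a crude proxy for the enhancement-layer bit cost."""
    return sum(abs(v) for row in residual for v in row)


if __name__ == "__main__":
    el = [[10, 12, 20, 22],
          [14, 16, 24, 26],
          [30, 32, 40, 42],
          [34, 36, 44, 46]]
    base = downsample_2x(el)  # base-layer coding is assumed lossless here
    print(base)  # [[13, 23], [33, 43]]
    print(sum_abs(inter_layer_residual(el, base, nearest_upsample_2x)))  # 32
```

A better upsampler, such as a trained super-resolution network, would shrink this residual; an ideal predictor that reproduced `el` exactly would drive it to zero.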

Fig. 2. Structure of the proposed CNN-based up-sampling network
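One common way to combine a skip connection with residual learning in a super-resolution upsampler is to let the network predict only the high-frequency correction and add it to a cheap interpolation of the input. The sketch below shows that output stage under two assumptions that this extract of the paper does not confirm: the CNN emits r×r low-resolution residual maps that are rearranged by sub-pixel convolution (pixel shuffle), and the skip path is nearest-neighbour upsampling. The convolutional layers themselves are omitted; `residual_channels` stands in for their output, and all function names are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # one plane, indexed [row][col]


def nearest_upsample(frame: Frame, r: int) -> Frame:
    """Skip path (assumed): repeat each sample into an r x r block."""
    out: Frame = []
    for row in frame:
        wide = [v for v in row for _ in range(r)]
        out.extend(list(wide) for _ in range(r))
    return out


def pixel_shuffle(channels: List[Frame], r: int) -> Frame:
    """Sub-pixel rearrangement: r*r maps of size h x w -> one map of size (h*r) x (w*r).
    Channel i*r + j fills sub-position (i, j) of every r x r output block."""
    if len(channels) != r * r:
        raise ValueError("pixel_shuffle needs exactly r*r channels")
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0] * (w * r) for _ in range(h * r)]
    for i in range(r):
        for j in range(r):
            ch = channels[i * r + j]
            for y in range(h):
                for x in range(w):
                    out[y * r + i][x * r + j] = ch[y][x]
    return out


def residual_upsample(base: Frame, residual_channels: List[Frame], r: int) -> Frame:
    """Global residual learning: interpolated base + predicted detail, clipped to 8 bits."""
    coarse = nearest_upsample(base, r)
    detail = pixel_shuffle(residual_channels, r)
    return [
        [min(255, max(0, c + d)) for c, d in zip(crow, drow)]
        for crow, drow in zip(coarse, detail)
    ]


if __name__ == "__main__":
    base = [[10, 20], [30, 40]]
    # Hypothetical network output: four 2x2 maps with constant values 1, 2, 3, 4.
    residual = [[[k] * 2 for _ in range(2)] for k in (1, 2, 3, 4)]
    for row in residual_upsample(base, residual, 2):
        print(row)
    # [11, 12, 21, 22]
    # [13, 14, 23, 24]
    # [31, 32, 41, 42]
    # [33, 34, 43, 44]
```

Predicting only the correction keeps the network's target close to zero, which is the usual motivation for residual learning in super-resolution: the skip path already supplies the low-frequency content.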

Fig. 3. Test sequences used for performance evaluation

Fig. 4. Decoded enhancement-layer frame of the bus sequence at 65 kbps (41st frame)

Fig. 5. Decoded enhancement-layer frame of the calendar sequence at 45 kbps (42nd frame)

Fig. 6. Decoded enhancement-layer frame of the waterfall sequence at 85 kbps (64th frame)

Table 1. Coding performance of the conventional SHVC and the proposed deepSHVC frameworks

BSGHC3_2019_v24n1_132_t0001.png 이미지
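Quality comparisons between codecs, such as the SHVC versus deepSHVC comparison in Table 1, are commonly expressed as PSNR at a given bitrate. Assuming luma PSNR on 8-bit samples (the table's exact metrics are not visible in this extract), it can be computed as in the sketch below; the function name is hypothetical.

```python
import math
from typing import List

Frame = List[List[int]]  # one 8-bit luma plane, indexed [row][col]


def psnr(reference: Frame, decoded: Frame, peak: int = 255) -> float:
    """PSNR in dB between two equally sized planes; infinite if they are identical."""
    squared_error = 0
    count = 0
    for ref_row, dec_row in zip(reference, decoded):
        for a, b in zip(ref_row, dec_row):
            squared_error += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty frame")
    if squared_error == 0:
        return math.inf
    mse = squared_error / count
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    ref = [[100, 100], [100, 100]]
    dec = [[101, 99], [100, 100]]  # MSE = (1 + 1) / 4 = 0.5
    print(f"{psnr(ref, dec):.2f} dB")  # 51.14 dB
```

For a whole sequence, per-frame values are typically averaged over all decoded frames before codecs are compared.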
