Low delay window switching modified discrete cosine transform for speech and audio coder

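  • Kim Young-jun (김영준; School of Information and Communication Engineering, Chungbuk National University);
  • Lee In-sung (이인성; School of Information and Communication Engineering, Chungbuk National University)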
  • Received : 2017.12.27
  • Accepted : 2018.03.29
  • Published : 2018.03.31

Abstract

In this paper, we propose a low-delay window-switching MDCT (Modified Discrete Cosine Transform) method for speech/audio coders. Window switching reduces the degradation of sound quality in non-stationary transient segments, and low-delay TDAC (Time Domain Aliasing Cancellation) reduces the algorithmic delay. Whereas conventional window-switching algorithms use overlap-add with different lengths, the proposed method uses a fixed overlap-add length. This halves the algorithmic delay, and using only 2 window types reduces the frame indication information by 1 bit. To evaluate its performance, we apply the proposed algorithm to the MDCT-based G.729.1 coder. The proposed method halves the algorithmic delay while maintaining the same speech quality as the conventional method.

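This paper proposes a low-delay window-switching MDCT (Modified Discrete Cosine Transform) method for speech/audio coders. The proposed MDCT method uses a window-switching algorithm to reduce the degradation of sound quality in transient segments, where the signal characteristics change rapidly, and uses low-delay TDAC (Time Domain Aliasing Cancellation) to halve the algorithmic delay. Unlike conventional window-switching algorithms, which use overlap-add of different lengths, the proposed window-switching method uses overlap-add of a fixed length, which halved the algorithmic delay. Reducing the number of window types to two, chosen according to the signal characteristics, also saved 1 bit of the information that indicates the frame state. The proposed algorithm was applied to ITU-T (International Telecommunication Union Telecommunication Standardization Sector) G.729.1, an MDCT-based speech/audio coder, to verify its performance, and it halved the algorithmic delay while maintaining the same sound quality.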
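The TDAC principle that the abstract builds on can be illustrated with a minimal sketch. It implements the conventional MDCT with a sine window and 50% overlap-add, not the low-delay variant proposed in the paper, whose window shapes are not given here. All function names are hypothetical, and the 4-type baseline in `flag_bits` is an assumption modelled on AAC-style window sequences (long, start, short, stop).

```python
import math


def sine_window(length):
    """Sine window of length 2N; it satisfies the Princen-Bradley condition."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block, window):
    """Forward MDCT: 2N windowed time samples -> N coefficients."""
    size = len(block) // 2
    n0 = 0.5 + size / 2.0
    return [
        sum(window[n] * block[n] * math.cos(math.pi / size * (n + n0) * (k + 0.5))
            for n in range(2 * size))
        for k in range(size)
    ]


def imdct(coeffs, window):
    """Inverse MDCT: N coefficients -> 2N windowed, time-aliased samples."""
    size = len(coeffs)
    n0 = 0.5 + size / 2.0
    return [
        window[n] * (2.0 / size) * sum(
            coeffs[k] * math.cos(math.pi / size * (n + n0) * (k + 0.5))
            for k in range(size))
        for n in range(2 * size)
    ]


def analysis_synthesis(signal, size):
    """Run MDCT -> IMDCT on 50%-overlapping blocks and overlap-add the result.

    Assumes len(signal) is a multiple of `size` (N). Each block alone contains
    time-domain aliasing; it cancels only when adjacent blocks are added.
    """
    window = sine_window(2 * size)
    padded = [0.0] * size + list(signal) + [0.0] * size
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * size + 1, size):
        block = padded[start:start + 2 * size]
        rec = imdct(mdct(block, window), window)
        for n in range(2 * size):
            out[start + n] += rec[n]
    return out[size:size + len(signal)]


def flag_bits(num_window_types):
    """Bits needed to signal one of `num_window_types` window types per frame."""
    return max(1, (num_window_types - 1).bit_length())


if __name__ == "__main__":
    x = [math.sin(0.3 * n) + 0.1 * n for n in range(32)]
    y = analysis_synthesis(x, 8)
    print(max(abs(a - b) for a, b in zip(x, y)))  # reconstruction error
    print(flag_bits(4) - flag_bits(2))            # bits saved with 2 types
```

In this conventional scheme, a frame can only be output after the next frame has arrived, because the overlap-add needs both halves. That overlap is the source of the algorithmic delay that a low-delay TDAC design aims to shorten.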
