Modified AWSSDR method for frequency-dependent reverberation time estimation

  • Min Sik Kim (Research Institute of Computer, Information and Communication, Pusan National University) ;
  • Hyung Soon Kim (Department of Electronics Engineering, Pusan National University)
  • Received : 2023.11.22
  • Accepted : 2023.12.08
  • Published : 2023.12.31

Abstract

Reverberation time (T60) is a representative acoustic parameter that characterizes reverberation. Because the effect of reverberation varies across frequency bands even within the same space, frequency-dependent (FD) T60, which gives a more detailed picture of the acoustic environment, can be useful. However, most conventional blind T60 estimation methods, which estimate T60 from speech signals, target fullband T60, and the few existing blind FDT60 estimation methods commonly perform poorly in the low-frequency bands. This paper modifies the Attentive pooling based Weighted Sum of Spectral Decay Rates (AWSSDR) method, previously proposed for blind T60 estimation, by extending its target from fullband T60 to FDT60. Experimental results on the Acoustic Characterization of Environments (ACE) challenge evaluation dataset show that the proposed method outperforms conventional blind FDT60 estimation methods. Notably, its estimation performance is consistently strong across all frequency bands. This suggests that the AWSSDR mechanism is well suited to blind FDT60 estimation: by processing, in each frequency band, the spectral decay rates associated with the physical properties of reverberation, it aggregates information about FDT60 from the speech signal and thereby reflects the frequency-dependent variation in the impact of reverberation.
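The idea of pooling per-band spectral decay rates into a per-band T60 can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window length, and the frame rate are hypothetical, the attention scores are fixed placeholders (in AWSSDR they would come from a learned network), and the input is an idealized, noise-free log-energy envelope rather than real speech. The only relation taken as given is the definition of T60 as the time for the energy to decay by 60 dB, so a decay rate of r dB/s corresponds to T60 = -60 / r.

```python
import numpy as np


def spectral_decay_rates(log_energy_db, frame_rate, win):
    """Least-squares slope (dB/s) in every sliding window along time.

    log_energy_db: array of shape (bands, frames), log energy in dB.
    Returns an array of shape (bands, frames - win + 1).
    """
    t = np.arange(win) / frame_rate
    t_centered = t - t.mean()
    # Shape (bands, n_windows, win): one row of samples per window.
    windows = np.lib.stride_tricks.sliding_window_view(
        log_energy_db, win, axis=1
    )
    # Because t_centered sums to zero, the slope reduces to this ratio.
    return (windows * t_centered).sum(axis=-1) / (t_centered ** 2).sum()


def attentive_pool(values, scores):
    """Softmax-weighted sum over the window axis, separately per band."""
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * values).sum(axis=1)


def t60_from_decay_rate(rate_db_per_s):
    """T60 is the time needed for a 60 dB decay."""
    return -60.0 / rate_db_per_s


# Idealized example: three bands with different (assumed) T60 values.
frame_rate = 100.0                       # frames per second (assumption)
true_t60 = np.array([0.8, 0.5, 0.3])     # seconds, one per band
t = np.arange(50) / frame_rate
log_energy_db = -60.0 * t[None, :] / true_t60[:, None]

rates = spectral_decay_rates(log_energy_db, frame_rate, win=10)
scores = np.zeros_like(rates)            # placeholder: uniform attention
est_t60 = t60_from_decay_rate(attentive_pool(rates, scores))
```

With real reverberant speech, only some windows contain free decays, which is why a learned weighting over the decay rates, rather than the uniform weights used here, would be needed to pick out the informative windows in each band.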

Acknowledgement

This work was supported by a 2-Year Research Grant of Pusan National University.
