Acknowledgement
This work was supported by a 2-Year Research Grant of Pusan National University.
References
- Bryan, N. J. (2020, May). Impulse response data augmentation and deep neural networks for blind room acoustic parameter estimation. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 5000-5004). Barcelona, Spain.
- Chen, S. J., Xia, W., & Hansen, J. H. L. (2021, December). Scenario aware speech recognition: Advancements for Apollo Fearless Steps & CHiME-4 corpora. Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp. 289-295). Cartagena, Colombia.
- Deng, S., Mack, W., & Habets, E. A. P. (2020, October). Online blind reverberation time estimation using CRNNs. Proceedings of Interspeech (pp. 5061-5065). Shanghai, China.
- Diether, S., Bruderer, L., Streich, A., & Loeliger, H. A. (2015, April). Efficient blind estimation of subband reverberation time from speech in non-diffuse environments. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 743-747). South Brisbane, Australia.
- Eaton, J., & Naylor, P. A. (2015a, October). Reverberation time estimation on the ACE corpus using the SDD method. Proceedings of the ACE Challenge Workshop, a Satellite of IEEE WASPAA (pp. 1-3). New Paltz, NY.
- Eaton, J., & Naylor, P. A. (2015b, October). Acoustic characterization of environments (ACE) corpus software instructions. Proceedings of the ACE Challenge Workshop, a Satellite of IEEE WASPAA (pp. 1-5). New Paltz, NY.
- Eaton, J., Gaubitch, N. D., & Naylor, P. A. (2013, May). Noise-robust reverberation time estimation using spectral decay distributions with reduced computational cost. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 161-165). Vancouver, Canada.
- Eaton, J., Gaubitch, N. D., Moore, A. H., & Naylor, P. A. (2016). Estimation of room acoustic parameters: The ACE challenge. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(10), 1681-1693. https://doi.org/10.1109/TASLP.2016.2577502
- Eaton, J., Gaubitch, N. D., Moore, A. H., & Naylor, P. A. (2017). Acoustic characterization of environments (ACE) challenge results technical report. arXiv. Retrieved from https://arxiv.org/abs/1606.03365
- Gamper, H., & Tashev, I. J. (2018, September). Blind reverberation time estimation using a convolutional neural network. Proceedings of the 16th International Workshop on Acoustic Signal Enhancement (pp. 136-140). Tokyo, Japan.
- Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., Pallett, D. S., & Dahlgren, N. L. (1993). DARPA TIMIT: Acoustic-phonetic continuous speech corpus CD-ROM: NIST speech disc 1-1.1 (Technical Report NISTIR 4930). Gaithersburg, MD: National Institute of Standards and Technology.
- Giri, R., Seltzer, M. L., Droppo, J., & Yu, D. (2015, April). Improving speech recognition in reverberation using a room-aware deep neural network and multi-task learning. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5014-5018). South Brisbane, Australia.
- International Organization for Standardization. (2009). ISO 3382: Acoustics - Measurement of the reverberation time of rooms with reference to other acoustical parameters (2nd ed.). Geneva, Switzerland: International Organization for Standardization.
- Karjalainen, M., Antsalo, P., Mäkivirta, A., Peltonen, T., & Välimäki, V. (2002). Estimation of modal decay parameters from noisy response measurements. Journal of the Audio Engineering Society, 50(11), 867-878.
- Kim, M. S., & Kim, H. S. (2022). Attentive pooling-based weighted sum of spectral decay rates for blind estimation of reverberation time. IEEE Signal Processing Letters, 29, 1639-1643. https://doi.org/10.1109/LSP.2022.3191248
- Kim, M. S., & Kim, H. S. (2023, June). Frequency-dependent T60 estimation using attentive pooling-based weighted sum of spectral decay rates. Proceedings of the 2023 Spring Conference on Korean Society of Speech Sciences (KSSS). Seoul, Korea.
- Kuttruff, H. (2019). Room acoustics (6th ed.). Boca Raton, FL: CRC Press.
- Li, S., Schlieper, R., & Peissig, J. (2019, May). A hybrid method for blind estimation of frequency dependent reverberation time using speech signals. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 211-215). Brighton, UK.
- Löllmann, H. W., & Vary, P. (2011, May). Estimation of the frequency dependent reverberation time by means of warped filter-banks. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 309-312). Prague, Czech Republic.
- Löllmann, H. W., Brendel, A., Vary, P., & Kellermann, W. (2015, October). Single-channel maximum-likelihood T60 estimation exploiting subband information. Proceedings of the ACE Challenge Workshop, a Satellite of IEEE WASPAA (pp. 1-3). New Paltz, NY.
- Parihar, N., & Picone, J. (2002). Aurora working group: DSR front end LVCSR evaluation au/384/02 (Institute for Signal and Information Processing, Mississippi State, MS, USA, Technical Report AU/384/02). Retrieved from https://isip.piconepress.com/publications/reports/aurora_frontend/2002/report_012202_v21.pdf
- Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., ... Chintala, S. (2019, December). PyTorch: An imperative style, high-performance deep learning library. Proceedings of the Advances in Neural Information Processing Systems 32 (NeurIPS 2019) (pp. 8024-8035). Vancouver, Canada.
- Prego, T. M., de Lima, A. A., Zambrano-Lopez, R., & Netto, S. L. (2015, October). Blind estimators for reverberation time and direct-to-reverberant energy ratio using subband speech decomposition. Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (pp. 1-5). New Paltz, NY.
- Tang, Z., & Manocha, D. (2021). Scene-aware far-field automatic speech recognition. arXiv. Retrieved from https://arxiv.org/abs/2104.10757
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., ... Polosukhin, I. (2017, December). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017) (pp. 5998-6008). Long Beach, CA.
- Wang, H., Wu, B., Chen, L., Yu, M., Yu, J., Xu, Y., Zhang, S. X., ... Yu, D. (2021, August). TeCANet: Temporal-contextual attention network for environment-aware speech dereverberation. Proceedings of Interspeech (pp. 1109-1113). Brno, Czechia.
- Wu, B., Li, K., Yang, M., & Lee, C. H. (2017). A reverberation-time-aware approach to speech dereverberation based on deep neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(1), 102-111. https://doi.org/10.1109/TASLP.2016.2623559
- Xiong, F., Goetze, S., Kollmeier, B., & Meyer, B. T. (2018). Exploring auditory-inspired acoustic features for room acoustic parameter estimation from monaural speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(10), 1809-1820. https://doi.org/10.1109/TASLP.2018.2843537
- Zhang, Z., Li, X., Li, Y., Dong, Y., Wang, D., & Xiong, S. (2021, June). Neural noise embedding for end-to-end speech enhancement with conditional layer normalization. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 7113-7117). Toronto, Canada.
- Zheng, K., Zheng, C., Sang, J., Zhang, Y., & Li, X. (2022). Noise-robust blind reverberation time estimation using noise-aware time-frequency masking. Measurement, 192, 110901.