A DNN-Based Personalized HRTF Estimation Method for 3D Immersive Audio

Son, Ji Su; Choi, Seung Ho

doi:10.7236/IJIBC.2021.13.1.161

International Journal of Internet, Broadcasting and Communication

Volume 13 Issue 1 / Pages 161-167 / 2021 / 2288-4920 (pISSN) / 2288-4939 (eISSN)

The Institute of Internet, Broadcasting and Communication (한국인터넷방송통신학회)


Son, Ji Su (Dept. of Electronic and IT Media Engineering, Seoul National University of Science and Technology) ;
Choi, Seung Ho (Dept. of Electronic and IT Media Engineering, Seoul National University of Science and Technology)

Received : 2020.12.29
Accepted : 2021.01.09
Published : 2021.02.28

https://doi.org/10.7236/IJIBC.2021.13.1.161


Abstract

This paper proposes a new personalized head-related transfer function (HRTF) estimation method based on a deep neural network (DNN) model, with elevation reproduction improved by a notch filter. A previous study proposed a DNN model that estimates the magnitude of the HRTF from anthropometric measurements [1]. However, because that method uses zero phase instead of estimating the phase, it causes internalization (i.e., inside-the-head localization) of sound when listening to spatial sound. We devise a method that estimates both the magnitude and the phase of the HRTF with a DNN model. A personalized head-related impulse response (HRIR) is estimated using anthropometric measurements, including detailed data on the head, torso, shoulders, and ears, as inputs to the DNN model. The estimated HRIR is then filtered with an appropriate notch filter to improve elevation reproduction. Performance is evaluated both objectively and subjectively. For the objective evaluation, the root mean square error (RMSE) and the log spectral distance (LSD) between the reference HRTF and the estimated HRTF are measured. For the subjective evaluation, a MUSHRA test and a preference test are conducted. The results indicate that the proposed method gives listeners a more immersive audio experience than the previous methods.
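The two objective measures named in the abstract can be illustrated with a short sketch. The abstract does not give the exact formulas, so the definitions here are assumptions: RMSE is computed sample by sample between two arrays, and LSD is the root mean square of the dB magnitude difference across frequency bins. The function names, the FFT length of 256, and the toy 200-sample HRIRs are hypothetical and are not taken from the paper.

```python
import numpy as np


def hrtf_magnitude(hrir, n_fft=256):
    """Magnitude spectrum of an HRIR (n_fft is an assumed FFT length)."""
    return np.abs(np.fft.rfft(hrir, n=n_fft))


def rmse(reference, estimate):
    """Root mean square error between two equally shaped arrays."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))


def log_spectral_distance(ref_mag, est_mag, eps=1e-12):
    """Assumed LSD definition: RMS of the dB magnitude difference over bins."""
    ref_db = 20.0 * np.log10(np.maximum(ref_mag, eps))  # eps avoids log(0)
    est_db = 20.0 * np.log10(np.maximum(est_mag, eps))
    return float(np.sqrt(np.mean((ref_db - est_db) ** 2)))


# Toy data only: a decaying noise burst stands in for a measured HRIR,
# and the "estimate" is the same response at half amplitude.
rng = np.random.default_rng(0)
reference_hrir = rng.standard_normal(200) * np.exp(-np.arange(200) / 20.0)
estimated_hrir = 0.5 * reference_hrir

ref_mag = hrtf_magnitude(reference_hrir)
est_mag = hrtf_magnitude(estimated_hrir)

print("RMSE (time domain):", rmse(reference_hrir, estimated_hrir))
print("LSD in dB:", log_spectral_distance(ref_mag, est_mag))
```

In this toy case the estimate differs from the reference by a constant gain of one half, so the dB difference is the same in every bin and the LSD equals 20·log10(2), about 6.02 dB. A real evaluation would average these measures over subjects and source directions.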

References

T. Chen, T. Kuo and T. Chi, "Autoencoding HRTFS for DNN Based HRTF Personalization Using Anthropometric Features," ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 271-275. DOI: https://doi.org/10.1109/ICASSP.2019.8683814.
Rumsey, F. (2001). Spatial Audio (1st ed.). Routledge. DOI: https://doi.org/10.4324/9780080498195
Begault, D.R. 3D Sound for Virtual Reality and Multimedia; Academic Press: Cambridge, MA, USA, 1994. DOI: https://doi.org/10.2307/3680997
Kistler, D.J.; Wightman, F.L., "A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction," J. Acoust. Soc. Am. 1992, 91, 1637-1647. DOI: https://doi.org/10.1121/1.402444
Ngai-Man Cheung, S. Trautmann and A. Horner, "Head-related transfer function modeling in 3-D sound systems with genetic algorithms," Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), Seattle, WA, USA, 1998, pp. 3529-3532. DOI: https://doi.org/10.1109/ICASSP.1998.679630.
Hu, H.; Zhou, L.; Ma, H.; Wu, Z., "HRTF personalization based on artificial neural network in individual virtual auditory space," Appl. Acoust. 2008, 69, 163-172. DOI: https://doi.org/10.1016/j.apacoust.2007.05.007
Chun, C.J.; Moon, J.M.; Lee, G.W.; Kim, N.K.; Kim, H.K., "Deep neural network based HRTF personalization using anthropometric measurements," In Proceedings of the 143rd AES Convention, New York, NY, USA, 18-21 October 2017. Preprint 9860. DOI: https://doi.org/10.3390/app8112180
V. C. Raykar and R. Duraiswami, "Extracting the frequencies of the pinna spectral notches in measured head related impulse responses," J. Acoust. Soc. Am., vol. 118, no. 1, pp. 364-374, July 2005. DOI: https://doi.org/10.1121/1.1923368
Hebrank, J. and Wright, D. (1974b), "Spectral cues used in the location of sound sources on the median plane," J. Acoust. Soc. Am. 56, 1829-1834. DOI: https://doi.org/10.1121/1.1903520
V. R. Algazi, R. O. Duda, D. M. Thompson and C. Avendano, "The CIPIC HRTF database," Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat. No. 01TH8575), New Paltz, NY, USA, 2001, pp. 99-102. DOI: https://doi.org/10.1109/aspaa.2001.969552
