A DNN-Based Personalized HRTF Estimation Method for 3D Immersive Audio

  • Son, Ji Su (Dept. of Electronic and IT Media Engineering, Seoul National University of Science and Technology)
  • Choi, Seung Ho (Dept. of Electronic and IT Media Engineering, Seoul National University of Science and Technology)
  • Received : 2020.12.29
  • Accepted : 2021.01.09
  • Published : 2021.02.28


This paper proposes a new personalized head-related transfer function (HRTF) estimation method based on a deep neural network (DNN) model, combined with improved elevation reproduction using a notch filter. A previous study proposed a DNN model that estimates the magnitude of the HRTF from anthropometric measurements [1]. However, because that method assumes zero phase instead of estimating the phase, it causes internalization of sound (i.e., inside-the-head localization) when listening to spatial audio. We devise a method that estimates both the magnitude and the phase of the HRTF with a DNN model. A personalized head-related impulse response (HRIR) is estimated using anthropometric measurements, including detailed data on the head, torso, shoulders, and ears, as inputs to the DNN model. The estimated HRIR is then filtered with an appropriate notch filter to improve elevation reproduction. To evaluate performance, both objective and subjective evaluations were conducted. For the objective evaluation, the root mean square error (RMSE) and the log spectral distance (LSD) between the reference HRTF and the estimated HRTF were measured. For the subjective evaluation, a MUSHRA test and a preference test were conducted. The results show that the proposed method gives listeners a more immersive audio experience than the previous methods.
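The two objective measures and the notch-filtering step can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: RMSE is computed here over the complex HRTF (so phase errors count), LSD over log magnitudes in dB, and the 512-point FFT length is arbitrary. The notch is a generic second-order IIR design (the RBJ audio-EQ-cookbook form), swapped in for illustration; the helper names, the 44.1 kHz sampling rate, the Q of 5, and the 8 kHz notch frequency are all hypothetical, since the abstract does not state how the notch is designed or placed.

```python
import numpy as np

EPS = 1e-12  # floor that keeps log10 finite at spectral zeros


def hrtf(hrir, n_fft=512):
    """Complex HRTF on the non-negative bins of an n_fft-point FFT (assumed length)."""
    return np.fft.rfft(np.asarray(hrir, dtype=float), n=n_fft)


def rmse(ref_hrir, est_hrir, n_fft=512):
    """RMSE between complex HRTFs (assumed definition): magnitude AND phase errors count."""
    diff = hrtf(ref_hrir, n_fft) - hrtf(est_hrir, n_fft)
    return float(np.sqrt(np.mean(np.abs(diff) ** 2)))


def lsd_db(ref_hrir, est_hrir, n_fft=512):
    """Log spectral distance in dB: depends on magnitudes only, ignores phase."""
    ref_mag = np.abs(hrtf(ref_hrir, n_fft)) + EPS
    est_mag = np.abs(hrtf(est_hrir, n_fft)) + EPS
    log_ratio = 20.0 * np.log10(ref_mag / est_mag)
    return float(np.sqrt(np.mean(log_ratio ** 2)))


def notch_coefficients(f0, q, fs):
    """Hypothetical helper: 2nd-order IIR notch (RBJ cookbook), normalized so a[0] == 1."""
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0, -2.0 * np.cos(w0), 1.0])
    a = np.array([1.0 + alpha, -2.0 * np.cos(w0), 1.0 - alpha])
    return b / a[0], a / a[0]


def apply_notch(hrir, f0, q=5.0, fs=44100.0):
    """Filter an HRIR with the notch (direct form I); output length equals input length."""
    b, a = notch_coefficients(f0, q, fs)
    x = np.asarray(hrir, dtype=float)
    y = np.zeros_like(x)
    for n in range(len(x)):
        acc = b[0] * x[n]
        if n >= 1:
            acc += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            acc += b[2] * x[n - 2] - a[2] * y[n - 2]
        y[n] = acc
    return y


if __name__ == "__main__":
    ref = np.zeros(200)
    ref[0] = 1.0                          # toy "reference" HRIR: a unit impulse
    est = 2.0 * ref                       # toy "estimate": same shape, +6 dB gain
    print(rmse(ref, est))                 # 1.0
    print(round(lsd_db(ref, est), 2))     # 6.02
    notched = apply_notch(ref, f0=8000.0) # hypothetical notch frequency
```

Measuring RMSE on the complex spectrum is a deliberate choice in this sketch: a zero-phase estimate with a perfect magnitude would score an LSD of 0 dB but a nonzero RMSE, which is the kind of phase error a magnitude-only metric cannot reveal.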


  1. T. Chen, T. Kuo, and T. Chi, "Autoencoding HRTFs for DNN based HRTF personalization using anthropometric features," in Proc. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 271-275.
  2. F. Rumsey, Spatial Audio, 1st ed. Routledge, 2001.
  3. D. R. Begault, 3D Sound for Virtual Reality and Multimedia. Cambridge, MA, USA: Academic Press, 1994.
  4. D. J. Kistler and F. L. Wightman, "A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction," J. Acoust. Soc. Am., vol. 91, pp. 1637-1647, 1992.
  5. N.-M. Cheung, S. Trautmann, and A. Horner, "Head-related transfer function modeling in 3-D sound systems with genetic algorithms," in Proc. 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '98), Seattle, WA, USA, 1998, pp. 3529-3532.
  6. H. Hu, L. Zhou, H. Ma, and Z. Wu, "HRTF personalization based on artificial neural network in individual virtual auditory space," Appl. Acoust., vol. 69, pp. 163-172, 2008.
  7. C. J. Chun, J. M. Moon, G. W. Lee, N. K. Kim, and H. K. Kim, "Deep neural network based HRTF personalization using anthropometric measurements," in Proc. 143rd AES Convention, New York, NY, USA, Oct. 18-21, 2017, Preprint 9860.
  8. V. C. Raykar and R. Duraiswami, "Extracting the frequencies of the pinna spectral notches in measured head related impulse responses," J. Acoust. Soc. Am., vol. 118, no. 1, pp. 364-374, Jul. 2005.
  9. J. Hebrank and D. Wright, "Spectral cues used in the localization of sound sources on the median plane," J. Acoust. Soc. Am., vol. 56, pp. 1829-1834, 1974.
  10. V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano, "The CIPIC HRTF database," in Proc. 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, 2001, pp. 99-102.