Cover song search based on magnitude and phase of the 2D Fourier transform

  • Seo, Jin Soo (Department of Electronic Engineering, Gangneung-Wonju National University)
  • Received : 2018.09.10
  • Accepted : 2018.11.21
  • Published : 2018.11.30

Abstract

A cover song is a live recording or a remade (re-released) version of an original song. This paper studies the two-dimensional (2D) Fourier transform as a feature-dimension reduction method for fast cover song search. The 2D Fourier transform is well suited to this task because its magnitude is invariant to musical-key changes, which appear as circular shifts along the chroma axis. This paper extends previous work, which uses only the magnitude of the Fourier transform, by introducing an additional invariant derived from the phase. This invariant rests on the assumption that adjacent chroma blocks undergo the same musical-key change. We compare the cover song retrieval accuracy of the Fourier-transform-based methods on two datasets. The experimental results show that adding the phase-based invariant improves cover song retrieval accuracy over the previous magnitude-only method.
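
The key invariance claimed above can be checked directly from the definition of the 2D DFT. The notation here is assumed, not taken from the paper. C_i is the i-th 12 × W chroma block (pitch class p, frame n), and H_i is its 2D DFT. A key change of k semitones is modeled as a circular shift of the chroma axis. The cross-spectrum in the last line is one way to obtain a phase invariant under the abstract's same-key-change assumption; the paper's exact definition may differ.

```latex
\begin{align*}
H_i(u,v) &= \sum_{p=0}^{11}\sum_{n=0}^{W-1} C_i[p,n]\,
            e^{-j2\pi\left(\frac{up}{12}+\frac{vn}{W}\right)} \\
C_i'[p,n] &= C_i[(p-k) \bmod 12,\; n]
  \;\Longrightarrow\; H_i'(u,v) = e^{-j2\pi uk/12}\,H_i(u,v) \\
|H_i'(u,v)| &= |H_i(u,v)| \\
H_i'(u,v)\,\overline{H_{i+1}'(u,v)} &= e^{-j2\pi uk/12}\,e^{+j2\pi uk/12}\,
  H_i(u,v)\,\overline{H_{i+1}(u,v)} = H_i(u,v)\,\overline{H_{i+1}(u,v)}
\end{align*}
```

The magnitude removes the key-dependent phase factor by discarding all phase. The product with the conjugate of the next block cancels the same factor and keeps the phase difference between adjacent blocks.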

Fig. 1. Overview of the cover song search system based on song-level chromagram summarization.[8]
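
Fig. 1 describes search over song-level summaries. Because every song is reduced to one fixed-length vector, a query costs one vector distance per database song, regardless of song length. The sketch below assumes Euclidean distance and uses a hypothetical helper, `rank_database`. The paper's actual distance measure is not given in this excerpt.

```python
import numpy as np


def rank_database(query: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Return database indices sorted from most to least similar to `query`.

    Hypothetical sketch: each row of `database` is one song's fixed-length
    summary vector, and similarity is (assumed) Euclidean distance.
    """
    distances = np.linalg.norm(database - query[None, :], axis=1)
    # A stable sort keeps ties in database order
    return np.argsort(distances, kind="stable")


if __name__ == "__main__":
    db = np.array([[0.0, 0.0], [1.0, 1.0], [0.2, 0.1]])
    print(rank_database(np.array([0.1, 0.0]), db))  # [0 2 1]
```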

Fig. 2. Chromagram summarization using 2D Fourier transform.
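
The summarization in Fig. 2 can be sketched with NumPy. This is a hypothetical sketch, not the paper's implementation. It assumes non-overlapping blocks of W frames and median pooling of |H_i| over blocks, as in the magnitude-only method [12]. As our own illustrative choice, it averages unit phasors of H_i·conj(H_{i+1}) for the phase-based invariant, because plain angle averaging breaks at the ±π wrap-around. The name `summarize_chromagram` and the `eps` guard are illustrative.

```python
import numpy as np


def summarize_chromagram(chroma: np.ndarray, W: int = 75, eps: float = 1e-12):
    """Key-invariant song-level summary of a 12 x T chromagram (hypothetical).

    Returns (magnitude, phasor), both 12 x W:
      magnitude: median over blocks of |H_i|;
      phasor:    mean over adjacent block pairs of the unit phasor of
                 H_i * conj(H_{i+1}), invariant when both blocks share a key shift.
    """
    n_bins, n_frames = chroma.shape
    n_blocks = n_frames // W
    if n_blocks < 2:
        raise ValueError("need at least two full blocks of W frames")
    # (n_blocks, 12, W): block i holds frames i*W .. i*W + W - 1
    blocks = chroma[:, : n_blocks * W].reshape(n_bins, n_blocks, W).transpose(1, 0, 2)
    H = np.fft.fft2(blocks, axes=(1, 2))  # 2D DFT of every block

    magnitude = np.median(np.abs(H), axis=0)

    cross = H[:-1] * np.conj(H[1:])        # key-shift phase factors cancel here
    phasor = np.mean(cross / (np.abs(cross) + eps), axis=0)
    return magnitude, phasor


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    chroma = rng.random((12, 300))
    mag, ph = summarize_chromagram(chroma, W=75)
    # Transpose the whole song by 3 semitones (circular shift of the chroma axis)
    mag_t, ph_t = summarize_chromagram(np.roll(chroma, 3, axis=0), W=75)
    print(np.allclose(mag, mag_t), np.allclose(ph, ph_t))
```

Both outputs should be unchanged, up to floating-point error, by any circular shift of the chroma axis. For distance computations, the complex phasor would need to be split into real and imaginary parts, for example by concatenating `mag`, `ph.real` and `ph.imag`.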

Fig. 4. Search accuracy (%) versus block size W for covers80 dataset.

Fig. 5. Search accuracy (%) versus block size W for kpop100 dataset.

Fig. 6. Search accuracy (%) versus PCA dimension for covers80 dataset with W = 75.

Fig. 7. Search accuracy (%) versus PCA dimension for kpop100 dataset with W = 75.
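
Figs. 6 and 7 vary the PCA dimension applied to the summarized features. Below is a minimal PCA sketch using NumPy's SVD. The helper names `fit_pca` and `project` are hypothetical. Fitting PCA on the database songs themselves is an assumption. Each song's summary is also assumed to be flattened into one real vector first, for example 12 × 75 = 900 magnitude values for W = 75.

```python
import numpy as np


def fit_pca(features: np.ndarray, dim: int):
    """Fit PCA on an (n_songs, n_features) matrix; return (mean, basis).

    Assumption: PCA is learned from the database songs themselves.
    """
    if not 1 <= dim <= min(features.shape):
        raise ValueError("dim must be between 1 and min(n_songs, n_features)")
    mean = features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:dim]


def project(features: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Map features onto the PCA subspace spanned by the rows of `basis`."""
    return (features - mean) @ basis.T


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((40, 900))  # 40 hypothetical songs, 900 features each
    mean, basis = fit_pca(X, dim=20)
    print(project(X, mean, basis).shape)
```

Because search cost grows with vector length, a smaller PCA dimension speeds up the distance computations. The accuracy trade-off is what the two figures measure.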

Fig. 3. (a) Chromagram of an excerpt of the original song "Between the bars". (b) Chromagram of an excerpt of the cover song "Between the bars". (c) Real parts of $H_i$ computed from (a) and (b), shown as solid and dashed lines, respectively. (d) Imaginary parts of $H_i$ computed from (a) and (b), shown as solid and dashed lines, respectively. (e) Real parts of $H_i$ computed from (a) and from a different song ("My heart will go on"), shown as solid and dashed lines, respectively. (f) Imaginary parts of $H_i$ computed from (a) and from "My heart will go on", shown as solid and dashed lines, respectively. In (c) to (f), the first 50 coefficients of the zigzag scan of $H_i$ (i.e., the low-frequency components) are displayed.
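
Fig. 3 orders the coefficients of $H_i$ by a zigzag scan. The sketch below assumes the JPEG-style scan starting at the (0, 0) corner and alternating direction along each anti-diagonal. Under that assumption, the first coefficients are the lowest-index, low-frequency ones. Whether the paper shifts the DFT (e.g. with fftshift) before scanning is not stated here. The helper names are hypothetical.

```python
import numpy as np


def zigzag_indices(n_rows: int, n_cols: int) -> list[tuple[int, int]]:
    """JPEG-style zigzag order over an n_rows x n_cols grid, starting at (0, 0)."""
    order = []
    for s in range(n_rows + n_cols - 1):          # s = row + col (anti-diagonal)
        i_lo = max(0, s - n_cols + 1)
        i_hi = min(s, n_rows - 1)
        rows = range(i_lo, i_hi + 1)
        if s % 2 == 0:                            # even diagonals run bottom-left to top-right
            rows = reversed(rows)
        order.extend((i, s - i) for i in rows)
    return order


def zigzag_scan(matrix: np.ndarray, n_coeffs: int = 50) -> np.ndarray:
    """First `n_coeffs` entries of `matrix` in zigzag order (works for complex H_i)."""
    idx = np.array(zigzag_indices(*matrix.shape)[:n_coeffs], dtype=int).reshape(-1, 2)
    return matrix[idx[:, 0], idx[:, 1]]


if __name__ == "__main__":
    print(zigzag_scan(np.arange(9).reshape(3, 3), 9))  # [0 1 3 6 4 2 5 7 8]
```

For a 12 × 75 block, `np.real(zigzag_scan(H_i))` and `np.imag(zigzag_scan(H_i))` would give the kind of 50-point curves plotted in panels (c) to (f).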

References

  1. Z. Fu, G. Lu, K. M. Ting, and D. Zhang, "A survey of audio-based music classification and annotation," IEEE Trans. Multimedia 13, 303-319 (2011). https://doi.org/10.1109/TMM.2010.2098858
  2. J. Seo, J. Kim, and J. Park, "Centroid-model based music similarity with alpha divergence" (in Korean), J. Acoust. Soc. Kr. 35, 83-91 (2016). https://doi.org/10.7776/ASK.2016.35.2.083
  3. J. Lee and H. Kim, "Audio fingerprinting using a robust hash function based on the MCLT peak-pair" (in Korean), J. Acoust. Soc. Kr. 34, 157-162 (2015). https://doi.org/10.7776/ASK.2015.34.2.157
  4. B. Logan and A. Salomon, "A music similarity function based on signal analysis," Proc. ICME-2001, 745-748 (2001).
  5. C. Charbuillet, D. Tardieu, and G. Peeters, "GMM supervector for content based music similarity," Proc. DAFX-2011, 425-428 (2011).
  6. J. Serrà, E. Gómez, P. Herrera, and X. Serra, "Chroma binary similarity and local alignment applied to cover song identification," IEEE Trans. Audio Speech Lang. Process. 16, 1138-1151 (2008). https://doi.org/10.1109/TASL.2008.924595
  7. P. Foster, S. Dixon, and A. Klapuri, "Identifying cover songs using information-theoretic measures of similarity," IEEE Trans. Audio Speech Lang. Process. 23, 993-1005 (2015). https://doi.org/10.1109/TASLP.2015.2416655
  8. J. Seo, J. Kim, and J. Park, "An investigation of chroma n-gram selection for cover song search" (in Korean), J. Acoust. Soc. Kr. 36, 436-441 (2017).
  9. M. Müller and S. Ewert, "Towards timbre-invariant audio features for harmony-based music," IEEE Trans. Audio Speech Lang. Process. 18, 649-662 (2010). https://doi.org/10.1109/TASL.2010.2041394
  10. M. Müller and S. Ewert, "Chroma toolbox: MATLAB implementations for extracting variants of chroma-based audio features," Proc. ISMIR-2011, 215-220 (2011).
  11. D. Silva, C. Yeh, G. Batista, and E. Keogh, "SIMPle: Assessing music similarity using subsequences joins," Proc. ISMIR-2016, 23-29 (2016).
  12. T. Bertin-Mahieux and D. Ellis, "Large-scale cover song recognition using the 2D Fourier transform magnitude," Proc. ISMIR-2012, 241-246 (2012).
  13. J. Bello, C. Duxbury, M. Davies, and M. Sandler, "On the use of phase and energy for musical onset detection in the complex domain," IEEE Signal Process. Letters 11, 553-556 (2004). https://doi.org/10.1109/LSP.2004.827951
  14. J. Seo, J. A. Haitsma, and T. Kalker, "Linear speed-change resilient audio fingerprinting," Proc. MPCA-2002, 45-48 (2002).
  15. D. Ellis and G. Poliner, "Identifying 'cover songs' with chroma features and dynamic programming beat tracking," Proc. ICASSP-2007, 1429-1432 (2007).
  16. B. Reddy and B. Chatterji, "An FFT-based technique for translation, rotation, and scale-invariant image registration," IEEE Trans. Image Process. 5, 1266-1271 (1996). https://doi.org/10.1109/83.506761
  17. The covers80 cover song data set, https://labrosa.ee.columbia.edu/projects/coversongs/covers80/ (2007).
  18. D. Ellis and C. Cotton, "The 2007 LabROSA cover song detection system," MIREX extended abstract (2007).