A Comparative Analysis of Content-based Music Retrieval Systems

  • Ro, Jung-Soon (Department of Library and Information Science, College of Liberal Arts, Hannam University)
  • Received : 2013.08.01
  • Accepted : 2013.09.10
  • Published : 2013.09.30

Abstract

This study compares and analyzes 15 content-based music retrieval (CBMR) systems accessible on the web in terms of database size and type, query type, access points, input and output formats, and search functions. To derive the features used in the comparison, prior work was reviewed on the characteristics of music information and on the techniques CBMR systems use for transforming or transcribing music sources, extracting and segmenting melodies, extracting and indexing musical features, and matching. The analysis found that text information retrieval techniques such as inverted indexing, N-gram indexing, Boolean search, truncation, keyword and phrase search, normalization, filtering, browsing, exact matching, edit-distance similarity measures, and sorting are applied to enhance CBMR; that systems make efforts to increase database size and usability; and that problems remain in extracting melodies, deleting stop notes from queries, and using solfège as pitch information.

This study surveyed content-based music retrieval (CBMR) systems accessible on the web and compared their characteristics in terms of query types, access points, input and output, search functions, and the nature and size of their databases. To derive the features used in the comparison, prior research was reviewed on the characteristics of content-based music information and on the techniques needed to build such systems: file conversion, melody extraction and segmentation, index-feature extraction and indexing, and matching. Analysis of 15 systems yielded the following findings and problems. First, text information retrieval techniques such as inverted indexing, N-gram indexing, Boolean search, truncation search, keyword and phrase search, note-duration normalization, filtering, browsing, edit distance, and sorting are also used in CBMR as tools for improving retrieval performance. Second, the systems work to grow their databases and improve practicality, for example by crawling the web or adding users' search queries to the database. Third, problems to be addressed include inaccuracy in extracting melodies or main melodies, the need to automatically remove from queries the same stop notes that are removed when extracting index features, and the shortcomings of solfège search that ignores octaves.
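The first finding, that text retrieval techniques carry over to CBMR, can be illustrated with a minimal sketch (not taken from any of the systems analyzed): melodies are reduced to pitch-interval sequences, indexed by interval N-grams in an inverted index, and candidate matches are ranked by edit distance. All melody names and note data below are hypothetical toy examples.

```python
# Sketch of two text-IR techniques applied to melodies: N-gram inverted
# indexing for candidate filtering, and edit distance for ranking.
# Melodies are hypothetical toy data given as MIDI note numbers.
from collections import defaultdict

def intervals(notes):
    """Pitch intervals make the representation transposition-invariant."""
    return tuple(b - a for a, b in zip(notes, notes[1:]))

def ngrams(seq, n=3):
    """All contiguous N-grams of an interval sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def build_index(melodies, n=3):
    """Inverted index: interval N-gram -> set of melody ids."""
    index = defaultdict(set)
    for mid, notes in melodies.items():
        for g in ngrams(intervals(notes), n):
            index[g].add(mid)
    return index

def edit_distance(a, b):
    """Levenshtein distance between two interval sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def search(query_notes, melodies, index, n=3):
    """Filter candidates via the inverted index, rank by edit distance."""
    q = intervals(query_notes)
    candidates = set()
    for g in ngrams(q, n):
        candidates |= index.get(g, set())
    return sorted(candidates,
                  key=lambda m: (edit_distance(q, intervals(melodies[m])), m))

# Hypothetical toy database (MIDI pitches).
melodies = {
    "tune_a": [60, 62, 64, 65, 67, 65, 64, 62],
    "tune_b": [67, 69, 71, 72, 74, 72, 71, 69],  # tune_a transposed up
    "tune_c": [60, 60, 67, 67, 69, 69, 67],
}
index = build_index(melodies)
# A query matching tune_a's contour but sung in a different key:
print(search([62, 64, 66, 67, 69, 67, 66, 64], melodies, index))
# → ['tune_a', 'tune_b']
```

Because the index stores intervals rather than absolute pitches, both transposition-equivalent tunes match the query exactly, while the unrelated tune is never even considered; real systems layer duration normalization, stop-note removal, and approximate N-gram matching on top of this basic pipeline.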


References

  1. 구경이, 임상혁, 이재헌, 김유성 (2003). 주제 선율 색인을 이용한 내용 기반 음악정보 검색 시스템. 데이터베이스 연구, 19(3), 34-45. (Ku, Kyong-I, Lim, Sang-Hyuk, Lee, Jae-Heon, & Kim, Yoo-Sung (2003). A content-based music information retrieval system using theme melody index. Database Research, 19(3), 34-45.)
  2. 김무정, 낭종호 (2011). Query By Humming 응용을 위한 Midi 파일에서의 자동 멜로디 트랙 선택 방법. 한국정보과학회 한국컴퓨터종합학술대회 논문집, 38(1B), 405-408. (Kim, Moojung, & Nang, Jongho (2011). An automatic melody track selection in MIDI files for query by humming (QBH) application. Proceedings of Conference of the Korea Information Science Society, 38(1B), 405-408.)
  3. 노정순 (2011). 정보검색: 이론과 실제. 대전: 글누리. (Ro, Jung-Soon (2011). Information retrieval: Theory and practice. Daejeon: Geulnuri.)
  4. 박만수, 김회린 (2006). 실제 잡음 환경에 강인한 오디오 핑거프린팅 기법. Telecommunications Review (SK Telecom), 16(3), 435-446. (Park, Mansoo, & Kim, Hoirin (2006). An audio fingerprinting scheme robust to real-noise environments. Telecommunications Review (SK Telecom), 16(3), 435-446.)
  5. 유진희, 박상현 (2007). 허밍 질의 처리 시스템의 성능 향상을 위한 효율적인 빈번 멜로디 인덱싱 방법. 정보과학회논문지: 데이터베이스, 34(4), 283-303. (You, Jinhee, & Park, Sanghyun (2007). An efficient frequent melody indexing method to improve performance of query-by-humming system. Journal of KISS: Database, 34(4), 283-303.)
  6. 최윤재, 박종철 (2009). 음악의 특성에 따른 피아노 솔로 음악으로부터의 멜로디 추출. 정보과학회논문지: 컴퓨팅의 실제 및 레터, 15(12), 923-927. (Choi, Yoonjae, & Park, Jong C. (2009). Extracting melodies from piano solo music based on its characteristics. Journal of KISS(C): Computing Practices and Letters, 15(12), 923-927.)
  7. Arifi, V., Clausen, M., Kurth, F., & Muller, M. (2003). Automatic synchronization of music data in score-, MIDI-, and PCM-format. Proceedings of ISMIR 2003. Retrieved from http://ismir2003.ismir.net/papers/Arifi.pdf
  8. Bainbridge, D. (2004). Music information retrieval research and its context at the University of Waikato. Journal of the American Society for Information Science and Technology, 55(12), 1092-1099. http://dx.doi.org/10.1002/asi.20062
  9. Bandera, C. de la, Barbancho, A. M., Tardon, L. J., Sammartino, S., & Barbancho, I. (2011). Humming method for content-based music information retrieval. Proceedings of ISMIR 2011, 49-54.
  10. Cano, P., Batlle, E., Kalker, T., & Haitsma, J. (2005). A review of audio fingerprinting. Journal of VLSI Signal Processing, 41, 271-284. http://dx.doi.org/10.1007/s11265-005-4151-3
  11. Cartwright, M. B., Rafii, Z., Han, J. Y., & Pardo, B. (2011). Making searchable melodies: Human versus machine. Proceedings of Human Computation, 2011. Retrieved from http://www.cs.northwestern.edu/~jha222/paper/2011_humancomp_cartwright_etal.pdf
  12. Chai, W., & Vercoe, B. (2002). Melody retrieval on the web. Proceedings of ACM/SPIE Conference on Multimedia Computing and Networking, 226. http://dx.doi.org/10.1117/12.449982
  13. Chandrasekhar, V., Sharifi, M., & Ross, D. A. (2011). Survey and evaluation of audio fingerprinting schemes for mobile query-by-example applications. Proceedings of ISMIR 2011, 801-806.
  14. Chen, R., Shen, W., Srinivasamurthy, A., & Chordia, P. (2012). Chord recognition using duration-explicit hidden Markov models. Proceedings of ISMIR 2012, 445-450.
  15. Cheng, H. T., Yang, Y. H., Lin, Y. C., Liao, I. B., & Chen, H. H. (2008). Automatic chord recognition for music classification and retrieval. Proceedings of the IEEE International Conference on Multimedia and Expo 2008, 1505-1508. http://dx.doi.org/10.1109/ICME.2008.4607732
  16. Chew, E., Georgiou, P., & Narayanan, S. (2008). Challenging uncertainty in query by humming systems: A fingerprinting approach. IEEE Transactions on Audio, Speech, and Language Processing, 16(2), 359-371. http://dx.doi.org/10.1109/TASL.2007.912373
  17. Cilibrasi, R., Vitanyi, P., & Wolf, R. (2004). Algorithmic clustering of music based on string compression. Computer Music Journal, 28(4), 49-67.
  18. Dannenberg, R., Birmingham, W. P., Hu, N., Meek, C., Pardo, B., & Tzanetakis, G. (2007). A comparative evaluation of search techniques for query by humming using the MUSART testbed. Journal of the American Society for Information Science and Technology, 58(5), 687-701.
  19. Gerhard, D. (2003). Pitch extraction and fundamental frequency: History and current techniques. Technical report TR-CS 2003-06. Retrieved from http://audio-fingerprint.googlecode.com/svn-history/r62/trunk/referencias/2003-06.pdf
  20. Duggan, B., O'Shea, B., Gainza, M., & Cunningham, P. (2009). Compensating for expressiveness in queries to a content based music information retrieval system. Proceedings of the International Computer Music Conference (ICMC 2009), 33-36.
  21. Doraisamy, S., & Ruger, S. (2002). Robust polyphonic music retrieval with n-grams. Journal of Intelligent Information Systems, 21(1), 53-70. http://dx.doi.org/10.1023/A:1023553801115
  22. Downie, S. (1999). Evaluating a simple approach to music information retrieval: Conceiving melodic N-grams as text. Unpublished doctoral dissertation. University of Western Ontario, Canada.
  23. Ghias, A., Logan, J., Chamberlin, D., & Smith, B. (1995). Query by humming: Musical information retrieval in an audio database. Proceedings of the 3rd Annual ACM International Conference on Multimedia, 231-236.
  24. Goto, M. (2004). A real-time music-scene-description system: Predominant-F0 estimation for detecting melody and bass lines in real-world audio signals. Speech Communication, 43(4), 311-329. http://dx.doi.org/10.1016/j.specom.2004.07.001
  25. Hanna, P., Ferraro, P., & Robine, M. (2007). On optimizing the editing algorithms for evaluating similarity between monophonic musical sequences. Journal of New Music Research, 36(4), 267-279. http://dx.doi.org/10.1080/09298210801927861
  26. Huq, A., Cartwright, M., & Pardo, B. (2010). Crowdsourcing a real-world on-line query by humming system. Proceedings of the 7th Sound and Music Computing Conference, 2010, Barcelona, Spain. Retrieved from http://music.eecs.northwestern.edu/publications/smc2010-huq-cartwright-pardo.pdf
  27. Kan, M., Wang, Y., Iskandar, D., Nwe, T. L., & Shenoy, A. (2008). LyricAlly: Automatic synchronization of textual lyrics to acoustic music signals. IEEE Transaction on Audio, Speech, and Language Processing, 16(2), 338-349. http://dx.doi.org/10.1109/TASL.2007.911559
  28. Kline, R. L., & Glinert, E. P. (2003). Approximate matching algorithms for music information retrieval using vocal input. Proceedings of the Eleventh ACM International Conference on Multimedia 2003, 130-139. http://dx.doi.org/10.1145/957013.957042
  29. Kornstadt, A. (1998). Themefinder: A web-based melodic search tool. Computing in Musicology, 11, 231-236.
  30. Lee, K., & Slaney, M. (2008). Acoustic chord transcription and key extraction from audio using key-dependent HMMs trained on synthesized audio. IEEE Transactions on Audio, Speech, and Language Processing, 16(2), 291-301. http://dx.doi.org/10.1109/TASL.2007.914399
  31. Lee, Y. J., & Moon, S. B. (2006). A user study on information searching behaviors for designing user-centered query interface of content-based music information retrieval system. Journal of the Korean Society for Information Management, 23(2), 5-19. http://dx.doi.org/10.3743/KOSIM.2006.23.2.005
  32. Lemstrom, K., & Pienimaki, A. (2007). On comparing edit distance and geometric frameworks in content-based retrieval of symbolically encoded polyphonic music. Musicae Scientiae, Discussion Forum 4a, 135-152.
  33. Lemstrom, K., & Tarhio, J. (2003). Transposition invariant pattern matching for multi-track strings. Nordic Journal of Computing, 10, 185-205.
  34. McNab, R. J., Smith, L. A., Witten, I. H., & Cunningham, S. J. (1996). Towards the digital music library: Tune retrieval from acoustic input. Proceedings of the 1st ACM International Conference on Digital Libraries, 11-18.
  35. McNab, R. J., Smith, L. A., Bainbridge, D., & Witten, I. H. (1997). The New Zealand Digital Library MELody inDEX. D-Lib Magazine, 3(5), 4-15.
  36. Melucci, M., & Orio, N. (2004). Combining melody processing and information retrieval techniques: Methodology, evaluation and system implementation. Journal of the American Society for Information Science and Technology, 55(12), 1058-1066. http://dx.doi.org/10.1002/asi.20058
  37. Mesaros, A., & Virtanen, T. (2010). Automatic recognition of lyrics in singing. EURASIP Journal on Audio, Speech, and Music Processing, 2010:546047. http://dx.doi.org/10.1155/2010/546047
  38. Nam, G. P., Park, K. R., Park, S., Lee, S., & Kim, M. (2012) A new query-by-humming system based on the score level fusion of two classifiers. International Journal of Communication Systems, 25(6), 717-733. http://dx.doi.org/10.1002/dac.1187
  39. Papadopoulos, H., & Tzanetakis, G. (2012). Modeling chord and key structure with Markov logic. Proceedings of ISMIR 2012, 127-132.
  40. Pardo, B., Shiffrin, J., & Birmingham, W. (2004). Name that tune: A pilot study in finding a melody from a sung query. Journal of the American Society for Information Science and Technology, 55(4), 283-300. http://dx.doi.org/10.1002/asi.10373
  41. Prechelt, L., & Typke, R. (2001). An interface for melody input. ACM Transactions on Computer-Human Interaction, 8(2), 133-149. http://dx.doi.org/10.1145/376929.376978
  42. Rho, S., Han, B., Hwang, E., & Kim, M. (2008). MUSEMBLE: A novel music retrieval system with automatic voice query transcription and reformulation. The Journal of Systems and Software, 81(7), 1065-1080. http://dx.doi.org/10.1016/j.jss.2007.05.038
  43. Dannenberg, R. B., & Hu, N. (2004). Understanding search performance in query-by-humming systems. Proceedings of ISMIR 2004, 85-89.
  44. Sheh, A., & Ellis, D. P. (2003). Chord segmentation and recognition using EM-trained hidden Markov models. Proceedings of ISMIR 2003. Retrieved from http://ismir2003.ismir.net/papers/Sheh.PDF
  45. Shiffrin, J., Pardo, B., & Birmingham, W. (2002). HMM-based musical query retrieval. Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries, 295-300.
  46. Tripathy, A., Chhaatre, N., Surendranath, N., & Kalsi, M. (2009). Query by humming system. International Journal of Recent Trends in Engineering, 2(5), 373-379.
  47. Turetsky, R. J., & Ellis, D. P. W. (2003). Ground truth transcriptions of real music from force-aligned MIDI syntheses. Proceedings of ISMIR 2003, 445-448.
  48. Typke, R. (2007). Music retrieval based on melodic similarity. Unpublished doctoral dissertation. Universiteit Utrecht, The Netherlands.
  49. Typke, R., Veltkamp, R. C., & Wiering, F. (2004). Searching notated polyphonic music using transportation distances. Proceedings of the 12th Annual ACM International Conference on Multimedia, 128-135.
  50. Typke, R., Wiering, F., & Veltkamp, R. C. (2005). A survey of music information retrieval systems. Proceedings of ISMIR 2005, 153-160.
  51. Viro, V. (2011). Peachnote: Music score search and analysis platform. Proceedings of ISMIR 2011, 359-362.
  52. Wan, C., & Liu, M. (2006). Content-based audio retrieval with relevance feedback. Pattern Recognition Letters, 27(2), 85-92. http://dx.doi.org/10.1016/j.patrec.2005.07.005
  53. Wang, A. (2003). An industrial-strength audio search algorithm. Proceedings of the 4th International Conference on Music Information Retrieval. Retrieved from http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
  54. Wang, A. (2006). The Shazam music recognition service. Communications of the ACM, 49(8), 44-48. http://dx.doi.org/10.1145/1145287.1145312
  55. Wang, C., Li, J., & Shi, S. (2006). N-gram inverted index structures on music data for theme mining and content-based information retrieval. Pattern Recognition Letters, 27(5), 492-503. http://dx.doi.org/10.1016/j.patrec.2005.09.012
  56. Wold, E., Blum, T., Keislar, D., & Wheaton, J. (1996). Content-based classification, search, and retrieval of audio. IEEE Multimedia, 3(3), 27-36. https://doi.org/10.1109/93.556537
  57. Zhu, Y., & Shasha, D. (2003). Warping indexes with envelops transforms for query by humming. Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD '03), 181-192. http://dx.doi.org/10.1145/872757.872780

Cited by

  1. Highlight based Lyrics Search Considering the Characteristics of Query vol.26, pp.4, 2016, https://doi.org/10.5391/JKIIS.2016.26.4.301