A Study on Extracting Similar Tags from Delicious

Mining Semantically Similar Tags from Delicious

  • Yi, Kwan (School of Library and Information Science, University of Kentucky)
  • Published: 2009.06.30

Abstract

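Handling synonyms in natural language has long been a considerable barrier to communication between people and computers, and the barrier can become more severe in Web 2.0 applications, which rest on users' arbitrary choice of words, particularly in social tagging. This study addresses the problem of automatic synonym extraction in a representative Web 2.0 application. More specifically, it proposes a method (FolkSim) for extracting similar tags from Delicious, the most widely used social bookmarking application. To evaluate the proposed method, its results were compared with those of a method (CosSim) that extracts similar tags using the classic vector model originally used to measure document similarity. In several respects, evidence was observed that FolkSim produces better results. An alternative is also proposed for cases in which FolkSim cannot produce similar tags.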

The synonym issue is an inherent barrier in human-computer communication, and it is more challenging in Web 2.0 applications, especially social tagging applications. To help resolve the issue, this study tests the feasibility of a Web 2.0 application as a potential source of synonyms. It investigates a way of identifying similar tags from a popular collaborative tagging application, Delicious. Specifically, we propose an algorithm (FolkSim) for measuring the similarity of social tags from Delicious. We compared FolkSim to a cosine-based similarity method (CosSim) and observed that the top-ranked tags on the similar-tag lists generated by FolkSim tend to be among the best possible similar tags among the given choices. The FolkSim lists also appear to be relatively better than those created by CosSim. We also observed that a tag's folksonomy and its similar-tag list resemble each other to a certain degree, so the folksonomy could possibly serve as an alternative, especially when the FolkSim-based list is unavailable or infeasible.
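The cosine-based baseline mentioned above can be illustrated with a minimal sketch. This is not the paper's CosSim implementation, and it does not attempt FolkSim, whose details are not given in the abstract. It assumes each tag is represented as a sparse vector of counts over the bookmarked resources it was applied to; the function names, the toy data, and that representation are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (dict-like)."""
    # Only dimensions present in both vectors contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similar_tags(target, tag_vectors, top_n=5):
    """Rank all other tags by cosine similarity to `target`, highest first."""
    scores = [
        (other, cosine_similarity(tag_vectors[target], vec))
        for other, vec in tag_vectors.items()
        if other != target
    ]
    # Sort by descending score; break ties alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Hypothetical toy data: tag -> {resource id: times the tag was applied}.
TOY_TAG_VECTORS = {
    "python": Counter({"url1": 3, "url2": 1}),
    "programming": Counter({"url1": 2, "url2": 1, "url3": 1}),
    "recipes": Counter({"url4": 5}),
}

if __name__ == "__main__":
    for tag, score in similar_tags("python", TOY_TAG_VECTORS):
        print(f"{tag}: {score:.3f}")
```

In this toy data, "programming" shares resources with "python" and so ranks above "recipes", which shares none. A real evaluation would depend on how the tag vectors are built from Delicious data, which the abstract does not specify.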

Cited by

  1. An empirical study on the automatic resolution of semantic ambiguity in social tags vol.48, pp.1, 2011, https://doi.org/10.1002/meet.2011.14504801175
  2. A Comparative Study on Clustering Methods for Grouping Related Tags vol.43, pp.3, 2009, https://doi.org/10.4275/KSLIS.2009.43.3.399