A Study on the Characteristics of Opinion Retrieval Using Term Statistical Analysis in Opinion Documents

  • Received : 2010.07.19
  • Accepted : 2010.09.09
  • Published : 2010.11.30

Abstract

Opinion retrieval, which searches for the opinions that users express in documents, does not yet significantly outperform traditional topical retrieval, which searches for facts. This paper therefore aims to identify statistical characteristics that can be applied to opinion retrieval by comparing and analyzing the term statistics of opinion and non-opinion documents in the blog domain. The experiments use the Blogs06 collection from the TREC Blog track and 150 TREC topics. The difference between term probability distributions in opinion documents is measured by Jensen-Shannon (JS) divergence, the differences across topic types and topic domains are also investigated, and the probabilities of individual opinion terms are compared. The main findings are as follows: opinion detection needs to take topic-specific characteristics into account; extracting positive and negative opinion terms per topic is effective; topic types and topic domains are complementary; and special attention has to be given to the usage of positive opinion terms.
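The abstract names JS divergence as the measure of difference between term probability distributions but does not spell out the computation. The following is a minimal sketch of the standard Jensen-Shannon definition over unigram term distributions, not the authors' implementation. The function names, the base-2 logarithm, the maximum-likelihood term estimates, and the toy token lists are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_distribution(tokens):
    """Maximum-likelihood unigram distribution over a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def kl_divergence(p, m):
    """KL(p || m) in bits; m must be non-zero wherever p is non-zero."""
    return sum(pv * math.log2(pv / m[term]) for term, pv in p.items() if pv > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two term distributions.

    With base-2 logarithms the value lies in [0, 1]: 0 for identical
    distributions and 1 for distributions with disjoint vocabularies.
    """
    vocab = set(p) | set(q)
    # Mixture distribution: the average of p and q over the joint vocabulary.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical toy data standing in for opinion and non-opinion documents.
opinion_tokens = "i love this phone the camera is great".split()
non_opinion_tokens = "the phone has a camera and a battery".split()

p = term_distribution(opinion_tokens)
q = term_distribution(non_opinion_tokens)
score = js_divergence(p, q)
```

Because the mixture distribution is non-zero wherever either input is non-zero, no smoothing is needed. This is one practical reason to prefer JS divergence over plain KL divergence when comparing vocabularies that only partly overlap.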
