An Experimental Evaluation of Short Opinion Document Classification Using A Word Pattern Frequency

  • Chang, Jae-Young (Dept. of Computer Engineering, Hansung University) ;
  • Kim, Ilmin (Dept. of Computer Engineering, Hansung University)
  • Received : 2012.08.27
  • Accepted : 2012.10.12
  • Published : 2012.10.31

Abstract

Opinion mining, which grew out of document classification in data mining, has become a topic of common interest in both domestic and international industry. Its core task is to decide accurately whether an opinion document is positive or negative. Although many approaches have been proposed, their classification accuracy has not been high enough for practical use. The polarity of opinion documents written in Korean is hard to determine automatically because such documents often express subjective opinions with varied and ungrammatical wording. This paper proposes a new approach to classifying opinion documents that considers only the frequency of word patterns and excludes grammatical factors as far as possible. In the proposed method, a document is represented as a bag of words, a learning algorithm is applied using the frequency of word patterns, and the polarity of the document is then decided with a score function. We also present experimental results that evaluate the accuracy of the proposed method.
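
The abstract names three steps (bag-of-words representation, learning from word-pattern frequencies, and a score function) without giving their definitions. The following Python sketch shows one way such a pipeline could look; it is an illustration under stated assumptions, not the authors' algorithm. It assumes that a "word pattern" is a contiguous run of one or two whitespace-separated words, that learning amounts to counting pattern frequencies per polarity, and that the score is a sum of add-one-smoothed log frequency ratios. The names `word_patterns`, `train`, `score`, and `classify` are hypothetical.

```python
from collections import Counter
from math import log


def word_patterns(text, max_len=2):
    """Return contiguous word n-grams (length 1..max_len) as tuples.

    Assumption: whitespace tokenization with no grammatical analysis;
    the paper's actual pattern definition may differ.
    """
    words = text.split()
    return [
        tuple(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    ]


def train(labeled_docs, max_len=2):
    """Count pattern frequencies separately for 'pos' and 'neg' documents."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in labeled_docs:
        counts[label].update(word_patterns(text, max_len))
    return counts


def score(text, counts, max_len=2):
    """Hypothetical score function: sum of smoothed log frequency ratios.

    A value above zero leans positive, below zero leans negative.
    """
    pos_total = sum(counts["pos"].values())
    neg_total = sum(counts["neg"].values())
    # Distinct patterns seen in training, plus one slot for unseen patterns.
    vocab = len(set(counts["pos"]) | set(counts["neg"])) + 1
    total = 0.0
    for pattern in word_patterns(text, max_len):
        p_pos = (counts["pos"][pattern] + 1) / (pos_total + vocab)
        p_neg = (counts["neg"][pattern] + 1) / (neg_total + vocab)
        total += log(p_pos / p_neg)
    return total


def classify(text, counts, max_len=2):
    """Map the score to a polarity label."""
    return "pos" if score(text, counts, max_len) > 0 else "neg"


if __name__ == "__main__":
    # Toy training data, invented for illustration only.
    reviews = [
        ("great movie really fun", "pos"),
        ("really fun and touching", "pos"),
        ("boring movie waste of time", "neg"),
        ("really boring and slow", "neg"),
    ]
    model = train(reviews)
    print(classify("really fun movie", model))
    print(classify("boring and slow", model))
```

Because the sketch only counts surface word sequences, it needs no parser or sentiment dictionary, which matches the abstract's stated aim of excluding grammatical factors. For Korean text, whitespace splitting is a simplification that a real system would probably need to replace.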
