Sentiment Analysis System Using Stanford Sentiment Treebank

Sentiment Classification System Using the Stanford Sentiment Treebank

  • Lee, Songwook (Department of Computer Science and Information Engineering, Korea National University of Transportation)
  • Received : 2014.12.09
  • Accepted : 2015.02.09
  • Published : 2015.03.31

Abstract

The main goal of this research is to build a sentiment analysis system that automatically classifies the opinions expressed in the Stanford Sentiment Treebank into three sentiment classes: positive, negative, and neutral. First, the sentences are POS-tagged and parsed into dependency structures. All nodes of the Treebank and their polarities are then extracted automatically. We train two Support Vector Machine models: one for node-level classification and the other for sentence-level classification. We tried various types of features, including word lexicons, POS tags, sentiment lexicons, head-modifier relations, and sibling relations. Although we obtained an accuracy of 74.2% on the test set for 3-class node-level classification and 67.0% for 3-class sentence-level classification, our experimental results for 2-class classification are comparable to those of the state-of-the-art system using the same corpus.

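This study implements a sentiment classification system using the Stanford Sentiment Treebank, with Support Vector Machines as the classifier, and classifies sentiment into three classes: positive, neutral, and negative. First, the sentiment sentences were POS-tagged and then annotated with dependency structures. All nodes of the treebank and their sentiment tags were extracted automatically, and a sentence-level SVM classification system and a node-level SVM classification system were each implemented. Various combinations of features were used, including words, POS tags, sentiment words, dependency relations, and sibling relations. When classifying into 3 classes on the evaluation corpus, we obtained an accuracy of 74.2% at the node level and 67.0% at the sentence level, while in 2-class classification the system achieved performance that is to some extent comparable to the best currently known system.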
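
The feature types named in the abstract (word, POS tag, sentiment lexicon, head-modifier relation, sibling relation) can be illustrated with a minimal Python sketch. It is not the author's implementation: the `Token` structure, the `extract_features` helper, the feature-name strings, and the tiny `SENTIMENT_LEXICON` are all hypothetical, and the POS tags and dependency heads are assumed to come from an external tagger and parser. The resulting binary features are the kind of input that would then be passed to an SVM classifier.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical toy lexicon for illustration only; the lexicon used in the
# paper is not specified in the abstract.
SENTIMENT_LEXICON = {
    "good": "positive",
    "great": "positive",
    "bad": "negative",
    "boring": "negative",
}


@dataclass
class Token:
    word: str
    pos: str   # POS tag, assumed to be produced by an external tagger
    head: int  # index of the head token in the sentence, -1 for the root


def extract_features(tokens: List[Token], i: int) -> Dict[str, int]:
    """Return binary features for token i, one group per feature type."""
    tok = tokens[i]
    word = tok.word.lower()
    feats = {f"word={word}": 1, f"pos={tok.pos}": 1}

    # Sentiment-lexicon feature: fires only if the word is in the lexicon.
    polarity = SENTIMENT_LEXICON.get(word)
    if polarity is not None:
        feats[f"lex={polarity}"] = 1

    if tok.head >= 0:
        # Head-modifier feature: pair of (head word, this word).
        head_word = tokens[tok.head].word.lower()
        feats[f"head_mod={head_word}>{word}"] = 1
        # Sibling features: other tokens that share the same head.
        for j, other in enumerate(tokens):
            if j != i and other.head == tok.head:
                feats[f"sibling={other.word.lower()}"] = 1
    return feats


# "a very good movie": "movie" is the root, "a" and "good" modify "movie",
# and "very" modifies "good".
example = [
    Token("a", "DT", 3),
    Token("very", "RB", 2),
    Token("good", "JJ", 3),
    Token("movie", "NN", -1),
]
good_features = extract_features(example, 2)
```

Here `good_features` contains the word, POS, and lexicon features for "good", the head-modifier pair with "movie", and a sibling feature for "a", which shares the same head. Mapping each feature name to an integer index would give the sparse vectors that SVM toolkits typically expect.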
