Sentiment Prediction using Emotion and Context Information in Unstructured Documents

  • Jinsu Kim (Ari College of Liberal Arts, Anyang University)
  • Received : 2020.08.30
  • Accepted : 2020.10.20
  • Published : 2020.10.28

Abstract

With the development of the Internet, users readily share their experiences and opinions online. Because related keywords are used without considering information such as the overall emotion or genre of an unstructured document like a movie review, sentiment accuracy suffers whenever the emotional context matters. We therefore propose a system that predicts sentiment based on information such as the genre to which a user-written unstructured document belongs and its overall emotion. First, representative keywords related to emotion sets such as Joy, Anger, Fear, and Sadness are extracted from the unstructured documents, and the normalized weights of these emotional feature words, together with the document information, are used as a training set for a system that combines a CNN and an LSTM. Finally, tests using movie information, refined words extracted with a morpheme analyzer and n-grams, emoticons, and emojis show that emotion-based sentiment prediction improves in both accuracy and F-measure. By avoiding the error of judging a review negative merely because sad words appear in reviews of sad movies or scary words in reviews of horror movies, the proposed system can predict sentiment appropriately for the context.
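
The idea in the last sentence of the abstract can be illustrated with a minimal Python sketch. It is not the proposed CNN and LSTM system: the lexicon, the genre-to-emotion mapping, and the function names `emotion_weights`, `naive_polarity`, and `contextual_polarity` are all hypothetical, and whitespace splitting stands in for the morpheme analyzer and n-gram extraction. It only shows how normalized emotion weights might be scored differently once the genre is known.

```python
from collections import Counter

# Hypothetical toy lexicon (emotion -> feature words); not taken from the paper.
EMOTION_LEXICON = {
    "joy": {"fun", "happy", "love", "😊"},
    "anger": {"angry", "annoying", "hate"},
    "fear": {"scary", "creepy", "terrifying", "😱"},
    "sadness": {"sad", "cried", "tears", "😢"},
}

# Assumed mapping from a genre to the emotion viewers expect to feel in it.
GENRE_EXPECTED_EMOTION = {"horror": "fear", "melodrama": "sadness"}


def emotion_weights(tokens):
    """Return per-emotion weights normalized to sum to 1 (all 0.0 if no hits)."""
    counts = Counter()
    for tok in tokens:
        for emotion, words in EMOTION_LEXICON.items():
            if tok in words:
                counts[emotion] += 1
    total = sum(counts.values())
    return {e: (counts[e] / total if total else 0.0) for e in EMOTION_LEXICON}


def naive_polarity(weights):
    """Context-free baseline: joy counts as positive, everything else negative."""
    negative = weights["anger"] + weights["fear"] + weights["sadness"]
    return weights["joy"] - negative


def contextual_polarity(weights, genre):
    """Count the genre's expected emotion as positive evidence, not negative."""
    expected = GENRE_EXPECTED_EMOTION.get(genre)
    score = 0.0
    for emotion, w in weights.items():
        if emotion == "joy" or emotion == expected:
            score += w
        else:
            score -= w
    return score


# Whitespace tokenization is a stand-in for morpheme analysis and n-grams.
review = "so scary and creepy 😱 i love it".split()
w = emotion_weights(review)
print(naive_polarity(w))                 # fear words pull the score negative
print(contextual_polarity(w, "horror"))  # the same words count as positive
```

For this toy review, the context-free score is negative because fear words dominate, while the horror-aware score is positive. A trained model would learn this interaction from the emotion weights and genre information rather than from a hand-written rule.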

