Impact of Word Embedding Methods on Performance of Sentiment Analysis with Machine Learning Techniques

  • Park, Hoyeon (Dept. of MIS, Graduate School, Dongguk University)
  • Kim, Kyoung-jae (Dept. of MIS, Business School, Dongguk University)
  • Received: 2020.07.28
  • Accepted: 2020.08.11
  • Published: 2020.08.31

Abstract

This study compares how different word embedding techniques affect the performance of sentiment analysis. Sentiment analysis is an opinion mining technique that uses natural language processing to identify and extract subjective information from text, and it can be used to classify the sentiment of product reviews or comments. Because sentiment can be labeled as either positive or negative, the task can be treated as a general classification problem. Before classification, the text must be converted into a form that a computer can process. In natural language processing, text such as a word or document is therefore transformed into a vector, a step known as word embedding. Techniques such as Bag of Words, TF-IDF, and Word2Vec are used for this purpose, but few studies so far have examined which word embedding techniques are suitable for sentiment analysis. This study applies Bag of Words, TF-IDF, and Word2Vec to sentiment analysis of movie reviews and compares the resulting performance. The data set is the IMDB data set, which is widely used in text mining. The results show that TF-IDF and Bag of Words both outperformed Word2Vec, and that TF-IDF performed better than Bag of Words, although the difference between the two was not large.
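As an illustration of how text becomes a vector under two of the techniques named above, the following is a minimal sketch of Bag of Words and TF-IDF in plain Python. It is not the authors' implementation: the three toy reviews, the whitespace tokenizer, the helper names, and the unsmoothed weighting tf × ln(N/df) are all assumptions chosen for brevity. Word2Vec is omitted because it requires training a model on a corpus.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocabulary(docs):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(docs, vocab):
    """Return one raw term-count vector per document."""
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        # vocab preserves insertion order, so columns follow the sorted tokens.
        vectors.append([counts.get(tok, 0) for tok in vocab])
    return vectors


def tf_idf(docs, vocab):
    """Weight each count by tf * ln(N / df), the textbook unsmoothed form."""
    n_docs = len(docs)
    counts = bag_of_words(docs, vocab)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    vectors = []
    for row in counts:
        total = sum(row) or 1  # guard against an empty document
        vectors.append([(c / total) * w for c, w in zip(row, idf)])
    return vectors


# Hypothetical toy reviews, not taken from the IMDB data set.
reviews = ["a great movie", "a boring movie", "great acting great plot"]
vocab = build_vocabulary(reviews)
bow = bag_of_words(reviews, vocab)
tfidf = tf_idf(reviews, vocab)

for review, counts, weights in zip(reviews, bow, tfidf):
    print(review, counts, [round(w, 3) for w in weights])
```

In this toy corpus, Bag of Words gives "boring" and "movie" the same count in the second review, while TF-IDF weights "boring" higher because it appears in fewer documents. Either set of vectors could then be passed to a classifier.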
