Comparative Study of Tokenizer Based on Learning for Sentiment Analysis

Korean title: 고객 감성 분석을 위한 학습 기반 토크나이저 비교 연구 (A Comparative Study of Learning-Based Tokenizers for Customer Sentiment Analysis)

  • Kim, Wonjoon (The Research Institute of AI Management Technology, Department of Industrial & Management Engineering, Sungkyul University)
  • 김원준 (성결대학교 산업경영공학과 AI경영기술연구소)
  • Received : 2020.08.06
  • Accepted : 2020.09.16
  • Published : 2020.09.30

Abstract

Purpose: This study compares tokenizers used in natural language processing for sentiment analysis of customer satisfaction. Methods: A supervised learning-based tokenizer, Mecab-Ko, was compared with an unsupervised learning-based tokenizer, SentencePiece. Three classification algorithms (Naïve Bayes, k-Nearest Neighbor, and Decision Tree) were used to compare the performance of each tokenizer, with accuracy, precision, and recall as the evaluation metrics. Results: Performance evaluation and verification confirmed that SentencePiece yields better classification performance than Mecab-Ko. To check the robustness of this result, independent t-tests were conducted on the evaluation results for the two tokenizers. The tests confirmed that the classification performance of the SentencePiece tokenizer was higher with the k-Nearest Neighbor and Decision Tree algorithms. In addition, Decision Tree showed slightly higher accuracy than the other two classification algorithms. Conclusion: The SentencePiece tokenizer can be used to classify and interpret customer sentiment in Korean online reviews more accurately. It also appears able to assign a specific meaning to short words or jargon that users often employ when evaluating products but that are not defined in advance.
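
The evaluation described in the abstract (accuracy, precision, and recall per classifier, followed by an independent t-test between the two tokenizers) can be illustrated with a minimal Python sketch. It is not the study's code: the function names `classification_metrics` and `independent_t`, the toy labels, and the placeholder fold scores are all assumptions made for illustration, and the pooled-variance (Student's) form of the t-test is assumed because the abstract does not say which variant was used.

```python
from math import sqrt
from statistics import mean, variance


def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def independent_t(a, b):
    """Student's t statistic (pooled variance) and degrees of freedom
    for two independent samples a and b."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Toy sentiment labels (1 = positive review, 0 = negative review);
# hypothetical data, not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, prec, rec = classification_metrics(y_true, y_pred)

# Placeholder per-fold accuracies for two tokenizers; these numbers are
# arbitrary and are NOT results reported in the paper.
fold_scores_a = [0.80, 0.82, 0.84]
fold_scores_b = [0.70, 0.72, 0.74]
t_stat, dof = independent_t(fold_scores_a, fold_scores_b)

print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
print(f"t={t_stat:.3f} df={dof}")
```

In practice the t statistic would be compared against the t distribution with the given degrees of freedom to obtain a p-value; that step is omitted here to keep the sketch within the standard library.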
