E-commerce data based Sentiment Analysis Model Implementation using Natural Language Processing Model


  • Choi, Jun-Young (Graduate School of Computer & Information Technology, Korea University) ;
  • Lim, Heui-Seok (Department of Computer Science and Engineering, Korea University)
  • Received : 2020.09.10
  • Accepted : 2020.11.20
  • Published : 2020.11.28

Abstract

In the field of natural language processing, research on a wide range of tasks such as translation, POS tagging, question answering, and sentiment analysis is being carried out actively around the world. In sentiment analysis, pretrained sentence embedding models transfer-learned on single-domain English datasets have shown high classification accuracy. In this study, we compare classification performance on a Korean e-commerce product review dataset that spans a variety of domains, building six classification models: a word-frequency-based BOW (Bag of Words) baseline, LSTM[1], Attention, CNN[2], ELMo[3], and BERT (KoBERT)[4]. We confirm that transfer-learned models that embed a word differently according to its context outperform models that assign the same embedding to a word regardless of context. In addition, by analyzing classification performance across 17 product categories, we propose a practical model configuration for real e-commerce applications. Finally, we compare inference time against model size and suggest compressing the sentence embedding model as future work toward real-time service.

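To make the model comparison concrete, the sketch below shows a minimal bidirectional LSTM sentiment classifier in PyTorch, one of the six architectures compared in the study. This is an illustrative reconstruction rather than the authors' code; the vocabulary size, layer dimensions, and toy input batch are hypothetical placeholders, and a real pipeline would first tokenize the Korean reviews (e.g., with SentencePiece[11]) into integer ids.

    # Illustrative sketch only (not the authors' implementation): a binary
    # sentiment classifier with a bidirectional LSTM encoder. All sizes
    # below are hypothetical placeholders.
    import torch
    import torch.nn as nn

    class LSTMSentimentClassifier(nn.Module):
        def __init__(self, vocab_size=32000, embed_dim=128, hidden_dim=256, num_classes=2):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
            self.classifier = nn.Linear(hidden_dim * 2, num_classes)

        def forward(self, token_ids):
            # token_ids: (batch, seq_len) integer-encoded review tokens
            embedded = self.embedding(token_ids)
            _, (hidden, _) = self.lstm(embedded)
            # concatenate the final forward and backward hidden states as the sentence vector
            sentence_repr = torch.cat([hidden[-2], hidden[-1]], dim=-1)
            return self.classifier(sentence_repr)  # logits: (batch, num_classes)

    if __name__ == "__main__":
        model = LSTMSentimentClassifier()
        dummy_batch = torch.randint(1, 32000, (4, 30))  # 4 reviews, 30 tokens each
        print(model(dummy_batch).shape)                 # torch.Size([4, 2]) -> negative/positive logits

In the contextual models compared in the paper (ELMo[3], KoBERT[4]), the fixed word embedding and LSTM encoder above would be replaced by a pretrained contextual encoder, with a similar classification head on top.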


References

  1. S. Hochreiter & J. Schmidhuber. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780. DOI: 10.1162/neco.1997.9.8.1735
  2. Y. LeCun, L. Bottou, Y. Bengio & P. Haffner. (1998). Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11), 2278-2324. DOI: 10.1109/5.726791
  3. M. E. Peters et al. (2018). Deep Contextualized Word Representations. NAACL 2018. https://arxiv.org/abs/1802.05365
  4. SKTBrain. (2019). KoBERT. https://github.com/SKTBrain/KoBERT
  5. J. Devlin, M. W. Chang, K. Lee & K. Toutanova. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. http://arxiv.org/abs/1810.04805
  6. H. M. Kim & K. B. Park. (2019). Sentiment analysis of online food product review using ensemble technique. Journal of Digital Convergence, 17(4), 115-122. DOI: 10.14400/JDC.2019.17.4.115
  7. H. Y. Park & K. J. Kim. (2019). Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Model. Journal of Intelligence and Information Systems, 25(4), 141-154. DOI: 10.13088/jiis.2019.25.4.141
  8. D. Bahdanau, K. H. Cho & Y. Bengio. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015. https://arxiv.org/abs/1409.0473
  9. Y. Kim. (2014). Convolutional Neural Networks for Sentence Classification. EMNLP 2014. https://arxiv.org/abs/1408.5882
  10. A. Vaswani et al. (2017). Attention Is All You Need. NIPS 2017. https://arxiv.org/abs/1706.03762
  11. T. Kudo & J. Richardson. (2018). SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. EMNLP 2018, 66-71. https://arxiv.org/abs/1808.06226
  12. A. Joulin et al. (2016). FastText.zip: Compressing text classification models. ICLR 2017. https://arxiv.org/abs/1612.03651
  13. A. F. Agarap. (2018). Deep Learning using Rectified Linear Units (ReLU). https://arxiv.org/abs/1803.08375
  14. D. P. Kingma & J. Ba. (2014). Adam: A Method for Stochastic Optimization. ICLR 2015. https://arxiv.org/abs/1412.6980
  15. D. Hendrycks & K. Gimpel. (2016). Gaussian Error Linear Units (GELUs). https://arxiv.org/abs/1606.08415
  16. M. A. Gordon. (2019). All The Ways to Compress BERT. http://mitchgordon.me/machine/learning/2019/11/18/all-the-ways-to-compress-BERT.html