E-commerce data based Sentiment Analysis Model Implementation using Natural Language Processing Model


  • Choi, Jun-Young (Graduate School of Computer & Information Technology, Korea University) ;
  • Lim, Heui-Seok (Department of Computer Science and Engineering, Korea University)
  • Received : 2020.09.10
  • Accepted : 2020.11.20
  • Published : 2020.11.28

Abstract

In the field of natural language processing, research on a wide range of tasks such as translation, POS tagging, question answering, and sentiment analysis is being carried out actively around the world. In sentiment analysis, pretrained sentence embedding models transfer-learned on single-domain English datasets have shown high classification accuracy. In this study, we compare classification performance on a Korean e-commerce product review dataset that spans a variety of domains, building six classification models: a word-frequency-based BOW (Bag of Words) baseline, LSTM[1], Attention, CNN[2], ELMo[3], and BERT (KoBERT)[4]. We confirm that transfer-learned models that embed a word differently according to its context outperform models that assign the same embedding to a word regardless of context. In addition, by analyzing classification performance across 17 product categories, we propose a practical model configuration for real e-commerce applications. Finally, we compare inference time against model size and suggest compressing the sentence embedding model as future work toward real-time service.

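To make the model comparison concrete, the sketch below shows a minimal bidirectional LSTM sentiment classifier in PyTorch, one of the six architectures compared in the study. This is an illustrative reconstruction rather than the authors' code; the vocabulary size, layer dimensions, and toy input batch are hypothetical placeholders, and a real pipeline would first tokenize the Korean reviews (e.g., with SentencePiece[11]) into integer ids.

    # Illustrative sketch only (not the authors' implementation): a binary
    # sentiment classifier with a bidirectional LSTM encoder. All sizes
    # below are hypothetical placeholders.
    import torch
    import torch.nn as nn

    class LSTMSentimentClassifier(nn.Module):
        def __init__(self, vocab_size=32000, embed_dim=128, hidden_dim=256, num_classes=2):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
            self.classifier = nn.Linear(hidden_dim * 2, num_classes)

        def forward(self, token_ids):
            # token_ids: (batch, seq_len) integer-encoded review tokens
            embedded = self.embedding(token_ids)
            _, (hidden, _) = self.lstm(embedded)
            # concatenate the final forward and backward hidden states as the sentence vector
            sentence_repr = torch.cat([hidden[-2], hidden[-1]], dim=-1)
            return self.classifier(sentence_repr)  # logits: (batch, num_classes)

    if __name__ == "__main__":
        model = LSTMSentimentClassifier()
        dummy_batch = torch.randint(1, 32000, (4, 30))  # 4 reviews, 30 tokens each
        print(model(dummy_batch).shape)                 # torch.Size([4, 2]) -> negative/positive logits

In the contextual models compared in the paper (ELMo[3], KoBERT[4]), the fixed word embedding and LSTM encoder above would be replaced by a pretrained contextual encoder, with a similar classification head on top.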


References

  1. S. Hochreiter & J. Schmidhuber. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780. DOI: 10.1162/neco.1997.9.8.1735
  2. Y. LeCun, L. Bottou, Y. Bengio & P. Haffner. (1998). Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11), 2278-2324. DOI: 10.1109/5.726791
  3. M. E. Peters et al. (2018). Deep Contextualized Word Representations. NAACL 2018. https://arxiv.org/abs/1802.05365
  4. SKTBrain. (2019). KoBERT. https://github.com/SKTBrain/KoBERT
  5. J. Devlin, M. W. Chang, K. Lee & K. Toutanova. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. http://arxiv.org/abs/1810.04805
  6. H. M. Kim & K. B. Park. (2019). Sentiment analysis of online food product review using ensemble technique. Journal of Digital Convergence, 17(4), 115-122. DOI: 10.14400/JDC.2019.17.4.115
  7. H. Y. Park & K. J. Kim. (2019). Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Model. Journal of Intelligence and Information Systems, 25(4), 141-154. DOI: 10.13088/jiis.2019.25.4.141
  8. D. Bahdanau, K. H. Cho & Y. Bengio. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015. https://arxiv.org/abs/1409.0473
  9. Y. Kim. (2014). Convolutional Neural Networks for Sentence Classification. EMNLP 2014. https://arxiv.org/abs/1408.5882
  10. A. Vaswani et al. (2017). Attention Is All You Need. NIPS 2017. https://arxiv.org/abs/1706.03762
  11. T. Kudo & J. Richardson. (2018). SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. EMNLP 2018, 66-71. https://arxiv.org/abs/1808.06226
  12. A. Joulin et al. (2016). FastText.zip: Compressing text classification models. ICLR 2017. https://arxiv.org/abs/1612.03651
  13. A. F. Agarap. (2018). Deep Learning using Rectified Linear Units (ReLU). https://arxiv.org/abs/1803.08375
  14. D. P. Kingma & J. Ba. (2014). Adam: A Method for Stochastic Optimization. ICLR 2015. https://arxiv.org/abs/1412.6980
  15. D. Hendrycks & K. Gimpel. (2016). Gaussian Error Linear Units (GELUs). https://arxiv.org/abs/1606.08415
  16. M. A. Gordon. (2019). All The Ways to Compress BERT. http://mitchgordon.me/machine/learning/2019/11/18/all-the-ways-to-compress-BERT.html