A Study on the Toxic Comments Classification Using CNN Modeling with Highway Network and OOV Process

하이웨이 네트워크 기반 CNN 모델링 및 사전 외 어휘 처리 기술을 활용한 악성 댓글 분류 연구

  • Received : 2020.08.31
  • Accepted : 2020.09.11
  • Published : 2020.09.30

Abstract

Purpose: Toxic comments on web portals and social networking services (SNS) have recently become a major social problem. They can harm Internet users through defamation, personal attacks, and invasion of privacy. Over the past few years, academia and industry have pursued a variety of approaches to this problem. The purpose of this study is to develop a deep learning model for toxic comment classification.

Design/methodology/approach: This study analyzed 7,878 Internet news comments using a CNN classification model based on a Highway Network and an out-of-vocabulary (OOV) process.

Findings: The bias and hate expressions in toxic comments were classified into three classes, and the model achieved a weighted F1 score of 67.49%. This exceeds the weighted F1 scores of approximately 50-60% reported in previous studies.
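For readers unfamiliar with the headline metric, the following is a minimal sketch of how a weighted F1 score over three classes is usually computed: each class's F1 is weighted by its share of the true labels. The function name `weighted_f1`, the integer class labels 0 to 2, and the toy label lists are assumptions made for illustration; they are not the study's data or code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by each class's support."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: predicted `label` and actually `label`.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n_true / total) * f1
    return score


# Hypothetical toy labels for three classes (not taken from the study).
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.7
```

Because the weights follow class frequency, this score is dominated by the most common class, which matters when comparing results across comment datasets with different label distributions.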
