Design and implementation of malicious comment classification system using graph structure

  • Sung, Ji-Suk (Graduate School of Computer & Information Technology, Korea University)
  • Lim, Heui-Seok (Department of Computer Science and Engineering, Korea University)
  • Received : 2020.03.31
  • Accepted : 2020.06.20
  • Published : 2020.06.28

Abstract

Comment systems are essential for communication on the Internet. However, some users exploit online anonymity to post malicious comments, such as inappropriate expressions directed at others. Protecting users from such comments requires classifying comments as malicious or normal, which can be implemented as a text classification task. Text classification is an important topic in natural language processing, and recent work has actively studied both pre-trained models such as BERT and graph-based models such as GCN and GAT. In this study, we implemented comment classification systems using BERT, GCN, and GAT on real, publicly available comments and compared their performance. The systems using graph-based models outperformed the BERT-based system.
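As a rough illustration of what a graph-based model computes, the sketch below runs one graph convolution step in the style of Kipf and Welling's GCN, ReLU(D^(-1/2) (A + I) D^(-1/2) H W), on a toy graph. The abstract does not say how the comment graph was built, so the graph layout (two comment nodes linked to two word nodes), the feature values, and the weights are all made-up assumptions, and the helper names `normalize_adjacency`, `matmul`, and `gcn_layer` are hypothetical. It is not the system evaluated in this study.

```python
import math


def normalize_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2) for a square 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_norm, features, weights):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in z]


# Toy graph (assumption): nodes 0-1 are comments, nodes 2-3 are words.
adj = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
# Made-up 2-dimensional node features and weights; real weights are learned.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 1.0]]
weights = [[1.0, -1.0], [0.5, 1.0]]

a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, features, weights)
for node, row in enumerate(hidden):
    print(node, row)
```

Each output row mixes a node's own features with those of its neighbors, which is how a comment node can pick up information from the words it shares with other comments. A classifier would typically stack two such layers and apply a softmax over the malicious and normal classes at the comment nodes.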
