A Study on COVID-19 Fake News Detection Using Graph Embedding - A Comparison of Detection Performance Depending on Whether Social Engagement Networks Are Used

A study on the detection of fake news - The Comparison of detection performance according to the use of social engagement networks

  • Jeong, Iitae (Graduate School of Business IT, Kookmin University) ;
  • Ahn, Hyunchul (Graduate School of Business IT, Kookmin University)
  • Received : 2021.11.26
  • Reviewed : 2022.03.08
  • Published : 2022.03.31

Abstract

With the development of Internet and mobile technology and the spread of social media, a large amount of information is being generated and distributed online. Some of it is useful to the public, but some of it is misleading. Such misleading information, so-called 'fake news', has caused great harm to society in recent years. Since the global spread of COVID-19 in 2020, a great deal of related fake news has circulated online. Unlike other fake news, fake news about COVID-19 can threaten people's health and even their lives, which makes it especially serious. Intelligent technology that automatically detects and helps prevent COVID-19 fake news is therefore a meaningful research topic for improving social health. Although COVID-19 fake news has spread rapidly through social media, few studies in Korea have proposed intelligent fake news detection that uses information about how fake news spreads through social media. Against this background, we propose a novel model that uses Graph2vec, a graph embedding method, to detect COVID-19 fake news effectively. Mainstream approaches to fake news detection have focused on news content, that is, on characteristics of the text, whereas the proposed model additionally exploits information transmission relationships within social engagement networks. Experiments on a real-world data set show that the proposed model outperforms traditional models in terms of prediction accuracy.
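To give a rough sense of the structural signal that a graph embedding of a propagation network can draw on, the sketch below counts Weisfeiler-Lehman (WL) subtree labels for two toy cascades. It is not the authors' implementation. It covers only the WL relabelling step that Graph2vec uses to derive rooted-subgraph "words" and omits the embedding training and the downstream classifier. The helper `wl_subtree_features`, the use of node degree as the initial label, and the toy graphs `star` and `chain` are all assumptions made for illustration.

```python
from collections import Counter
import hashlib


def wl_subtree_features(adjacency, iterations=2):
    """Count Weisfeiler-Lehman subtree labels for one undirected graph.

    adjacency: dict mapping each node to a list of its neighbours.
    Returns a Counter mapping label -> number of nodes that carried it
    at any iteration (iteration 0 included).
    """
    # Assumption: nodes have no attributes, so the degree is the initial label.
    labels = {node: str(len(neigh)) for node, neigh in adjacency.items()}
    features = Counter(labels.values())
    for _ in range(iterations):
        new_labels = {}
        for node, neigh in adjacency.items():
            # A node's new label summarises its own label plus the sorted
            # labels of its neighbours, compressed with a deterministic hash.
            signature = labels[node] + "|" + ",".join(sorted(labels[n] for n in neigh))
            new_labels[node] = hashlib.md5(signature.encode()).hexdigest()[:8]
        labels = new_labels
        features.update(labels.values())
    return features


# Hypothetical toy cascades: one news item shared by three users.
# "star": every user shares directly from the source (broadcast pattern).
star = {"news": ["u1", "u2", "u3"], "u1": ["news"], "u2": ["news"], "u3": ["news"]}
# "chain": each user reshares from the previous user (deep cascade).
chain = {"news": ["u1"], "u1": ["news", "u2"], "u2": ["u1", "u3"], "u3": ["u2"]}

star_features = wl_subtree_features(star)
chain_features = wl_subtree_features(chain)

# The two propagation shapes yield different label multisets.
print(star_features != chain_features)
```

In a full pipeline, label counts like these would be treated as the "words" of a graph "document" and passed to a doc2vec-style model to learn one dense vector per news item, which a classifier could then use alongside text features.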

Keywords

Acknowledgement

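This research was carried out as a result of the 2021 Artificial Intelligence High-Performance Computing Resource program of the National IT Industry Promotion Agency. This paper was supported by the fourth-stage Brain Korea 21 project (4th-stage BK21 project) of the Ministry of Education and the National Research Foundation of Korea.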
