Study on Effective Extraction of New Coined Vocabulary from Political Domain Article and News Comment

A Study on the Effective Extraction and Semantic Analysis of Newly Coined Words in the Political Domain

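  • 이지현 (Handong Global University, School of Computer Science and Electrical Engineering) ;
  • 김재홍 (Handong Global University, School of Communication) ;
  • 조예성 (Handong Global University, School of ICT Entrepreneurship) ;
  • 이민구 (Handong Global University, School of Communication) ;
  • 최혜봉 (Handong Global University, School of ICT Entrepreneurship)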
  • Received : 2021.03.02
  • Accepted : 2021.04.18
  • Published : 2021.05.31

Abstract

Text mining is a useful tool for discovering public opinion and perception of political issues from big data. Social media users commonly express their opinions with newly coined words, such as slang and emoji. However, traditional text mining methods that rely on a language dictionary do not capture these new words effectively. In this study, we propose methods to effectively extract newly coined words that connote the political stance and opinion of users. Using various text mining techniques, we attempt to uncover the context and political meaning of these words.

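Big data analysis through text mining can be used to understand public opinion and perception of political issues objectively. Text mining algorithms based on existing lexical dictionaries are limited in their ability to analyze words that are not listed in the dictionary, such as newly coined words. Opinions expressed by users on social media often contain newly coined words and slang, and if these words cannot be analyzed effectively, it becomes difficult to grasp public perception and opinion accurately. This paper proposes a method for effectively extracting politically meaningful newly coined words and slang from comments on news in the politics section, and presents various methods for understanding the meaning and context of the extracted words.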
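The limitation of dictionary-based processing described above can be illustrated with a minimal sketch. It is not the method proposed in the paper: it simply flags tokens that are absent from a reference dictionary and keeps those that recur across comments. The function name `extract_candidates`, the `min_count` threshold, and the English toy data are all assumptions made for illustration. Whitespace tokenization is also a simplification, since Korean text would normally require morphological analysis first.

```python
from collections import Counter


def extract_candidates(comments, dictionary, min_count=2):
    """Return tokens that are absent from `dictionary` and occur at least
    `min_count` times across all comments (hypothetical helper)."""
    counts = Counter(
        token
        for comment in comments
        for token in comment.lower().split()  # naive whitespace tokenization
        if token not in dictionary
    )
    return {token: n for token, n in counts.items() if n >= min_count}


# Toy data: English stand-ins, not taken from the paper.
dictionary = {"the", "policy", "is", "a", "total", "this", "again", "what"}
comments = [
    "the policy is a total flipflop",
    "what a flipflop again",
    "this is covfefe",
]

candidates = extract_candidates(comments, dictionary)
print(candidates)
```

The frequency threshold is one simple way to separate recurring coinages from one-off typos. The resulting candidates would still need further analysis to determine their context and political meaning.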
