Analysis of English abstracts in Journal of the Korean Data & Information Science Society using topic models and social network analysis


  • Received : 2014.12.16
  • Accepted : 2015.01.10
  • Published : 2015.01.31


This article applies text mining techniques to the English abstracts of articles published in the Journal of the Korean Data & Information Science Society. First, term-document matrices are constructed under several weighting methods and visualized by means of social network analysis. Two topic models, LDA (latent Dirichlet allocation) and CTM (correlated topic model), are then used to extract topics from the abstracts. The performance of the topic models is compared in terms of entropy across several numbers of topics and across the weighting methods used to form the term-document matrices.
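The pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy documents, the TF-IDF weighting, the co-occurrence rule for network edges, and the function names are all assumptions chosen for illustration, and the fitting of LDA and CTM is not shown. The sketch builds a term-document matrix, reweights it, derives term-to-term edges of the kind a network plot could display, and defines the Shannon entropy of a discrete distribution such as a document's topic proportions.

```python
import math
from collections import Counter

# Hypothetical toy "abstracts"; the journal corpus itself is not reproduced here.
docs = [
    "bootstrap confidence interval for classification error",
    "kernel estimating equation for small area estimation",
    "topic model for text classification",
]

def term_document_matrix(texts):
    """Return (vocabulary, matrix) of raw counts; rows are terms, columns are documents."""
    counts = [Counter(t.split()) for t in texts]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[w] for c in counts] for w in vocab]

def tfidf(matrix):
    """Reweight counts by log(N / document frequency); one common weighting, assumed here."""
    n_docs = len(matrix[0])
    weighted = []
    for row in matrix:
        df = sum(1 for x in row if x > 0)  # number of documents containing the term
        idf = math.log(n_docs / df)
        weighted.append([x * idf for x in row])
    return weighted

def cooccurrence_edges(vocab, matrix):
    """Link two terms when they share a document; edge weight = number of shared documents."""
    edges = {}
    for i in range(len(vocab)):
        for j in range(i + 1, len(vocab)):
            shared = sum(1 for a, b in zip(matrix[i], matrix[j]) if a > 0 and b > 0)
            if shared:
                edges[(vocab[i], vocab[j])] = shared
    return edges

def entropy(probs):
    """Shannon entropy (natural log) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

vocab, tdm = term_document_matrix(docs)
weighted = tfidf(tdm)
edges = cooccurrence_edges(vocab, tdm)
```

Under this weighting, a term that appears in every document (here "for") receives weight zero, which is one reason the choice of weighting method can change both the network picture and the topics extracted. A lower entropy for a document's topic proportions indicates that the document is concentrated on fewer topics; how the article aggregates entropy across documents is not stated in the abstract.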




