- Volume 22 Issue 1
Owing to the extensive use of Web media and the development of the IT industry, a large amount of data has been generated, shared, and stored. Various types of unstructured data, such as images, sound, video, and text, are now distributed through Web media, and many attempts have been made in recent years to discover new value by analyzing these data. Among these types, text is recognized as the most representative medium through which users express and share their opinions on the Web, so demand for new insights from text analysis is steadily increasing. Accordingly, text mining is increasingly being used for different purposes in various fields. In particular, issue tracking is being widely studied in both academia and industry because it can be used to extract various issues from text sources such as news and social network services (SNS) and to analyze the trends of these issues. Conventionally, issue tracking is used to identify major issues sustained over a long period of time through topic modeling and to analyze the detailed distribution of documents involved in each issue. However, because conventional issue tracking assumes that the content composing each issue does not change throughout the entire tracking period, it cannot represent the dynamic mutation process of detailed issues, which can be created, merged, divided, and deleted between periods. Moreover, because only keywords that appear consistently throughout the entire period can be derived as issue keywords, concrete issue keywords such as "nuclear test" and "separated families" may be concealed by more general issue keywords such as "North Korea" in an analysis over a long period of time. This implies that many meaningful but short-lived issues cannot be discovered by conventional issue tracking.
Detailed keywords are preferable to general keywords because they can serve as clues for actionable strategies. To overcome these limitations, we performed an independent analysis on the documents of each detailed period. We generated an issue flow diagram based on the similarity of each issue between two consecutive periods, and we analyzed the issue transition pattern among categories by using the category information of each document. We applied the proposed methodology to a real case of 53,739 news articles and derived an issue flow diagram from them. We then proposed the following application scenarios for the issue flow diagram presented in the experiment section. First, we can identify an issue that actively appears during a certain period and promptly disappears in the next period. Second, the preceding and following issues of a particular issue can be easily discovered from the issue flow diagram, which implies that our methodology can be used to discover the association between inter-period issues. Finally, an interesting pattern of one-way and two-way transitions was discovered by analyzing the transition patterns of issues through category analysis. Specifically, we discovered that a pair of mutually similar categories induces two-way transitions. In contrast, a one-way transition can be recognized as an indicator that issues in a certain category tend to be influenced by issues in another category. For practical application of the proposed methodology, high-quality word and stop word dictionaries need to be constructed. In addition, not only the number of documents but also additional meta-information, such as the read counts, written times, and comments of documents, should be analyzed. A rigorous performance evaluation or validation of the proposed methodology should be performed in future work.
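The idea of linking issues across two consecutive periods by similarity can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state which similarity measure or cutoff was used, so cosine similarity over keyword weights, the `threshold` value, the function names, and the toy issue data below are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_issues(prev_issues, next_issues, threshold=0.3):
    """Return (prev_id, next_id, similarity) edges at or above the threshold.

    The threshold is a hypothetical cutoff, not a value from the paper.
    """
    edges = []
    for p_id, p_vec in prev_issues.items():
        for n_id, n_vec in next_issues.items():
            sim = cosine(p_vec, n_vec)
            if sim >= threshold:
                edges.append((p_id, n_id, sim))
    return edges


# Toy issues (made-up weights): each issue is a dict of keyword -> weight,
# as might come from running topic modeling on each period independently.
period_1 = {
    "P1-A": {"north korea": 0.6, "nuclear test": 0.8},
    "P1-B": {"election": 0.9, "candidate": 0.4},
}
period_2 = {
    "P2-A": {"north korea": 0.7, "separated families": 0.7},
    "P2-B": {"nuclear test": 0.5, "sanctions": 0.5},
}

edges = link_issues(period_1, period_2)

# Issues in period 1 with no outgoing edge have no successor in period 2.
linked = {prev_id for prev_id, _, _ in edges}
ended = [p_id for p_id in period_1 if p_id not in linked]

for prev_id, next_id, sim in edges:
    print(f"{prev_id} -> {next_id} (similarity {sim:.2f})")
print("No successor:", ended)
```

In this toy data, the issue `P1-A` is linked to two issues in the next period, which corresponds to an issue being divided, while `P1-B` has no successor, which corresponds to an issue that appears in one period and disappears in the next. The edges form one layer of a flow diagram; repeating the step for every pair of consecutive periods would give the full diagram.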
Big Data; Data Mining; Issue Tracking; Text Mining; Topic Modeling; Trend Analysis
Supported by: National Research Foundation of Korea