User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis

다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링

  • Kim, Jieun (김지은) (Graduate School of Business IT, Kookmin University)
  • Kim, Namgyu (김남규) (Graduate School of Business IT, Kookmin University)
  • Cho, Yoonho (조윤호) (College of Business Administration, Kookmin University)
  • Received : 2014.06.15
  • Accepted : 2014.06.23
  • Published : 2014.06.30

Abstract

This paper reports our observations on user-perspective issue clustering based on multi-layered two-mode network analysis. The work matters for companies that collect data about customer needs. Most companies have failed to uncover customers' needs for products or services from demographic and purchase data such as age, income level, and purchase history. Because they rely heavily on limited internal data, most recommendation systems do not give decision makers business information suited to current circumstances. Part of the problem is the increasing regulation of personal data gathering and privacy, which makes demographic or transaction data harder to collect. This is a significant hurdle for traditional recommendation approaches, which demand a great deal of personal data or transaction logs.

Our motivation is the belief, supported by evidence, that most customers' requirements for products can be analyzed effectively and efficiently from unstructured textual data such as Internet news text. To derive users' requirements from online text, the proposed approach constructs two two-mode networks, a user-news network and a news-issue network, and integrates them into one quasi-network that serves as the input for issue clustering. One contribution of this research is a methodology that uses enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis techniques. Building multi-layered two-mode networks from news logs requires tools for text mining and topic analysis. We used SAS Enterprise Miner 12.1, which provides a text miner module and a cluster module for textual data analysis, and NetMiner 4 for network visualization and analysis.
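One way to picture the integration step is as a composition of two incidence matrices. The sketch below is only an illustration, not the procedure the authors ran in NetMiner: it assumes that the user-news and news-issue networks are stored as weight matrices and that the quasi-network is obtained by multiplying them, so each user-issue weight sums visit counts times issue weights over the shared articles. The function name `merge_two_mode` and the toy matrices are hypothetical.

```python
def merge_two_mode(user_article, article_issue):
    """Compose a user-article and an article-issue matrix into user-issue weights.

    Assumption: merging is modeled as a plain matrix product; the source
    does not state the exact merging rule.
    """
    n_articles = len(article_issue)
    if any(len(row) != n_articles for row in user_article):
        raise ValueError("both matrices must share the same article dimension")
    n_issues = len(article_issue[0])
    return [
        [sum(row[a] * article_issue[a][i] for a in range(n_articles))
         for i in range(n_issues)]
        for row in user_article
    ]

# Toy data (hypothetical): 2 users x 3 articles, 3 articles x 2 issues.
user_article = [
    [1, 1, 0],  # user 0 read articles 0 and 1
    [0, 1, 1],  # user 1 read articles 1 and 2
]
article_issue = [
    [1, 0],  # article 0 covers issue 0
    [1, 1],  # article 1 covers issues 0 and 1
    [0, 1],  # article 2 covers issue 1
]

user_issue = merge_two_mode(user_article, article_issue)
print(user_issue)  # [[2, 1], [1, 2]]
```

Under this assumption, user 0 is tied more strongly to issue 0 because two of the articles that user read cover it.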
Our approach to user-perspective issue clustering consists of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites with a crawler. After the unstructured news article data are gathered, the topic analysis phase extracts issues from each news article to build an article-issue network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed from access patterns derived from web transaction logs. The two two-mode networks are then merged into a user-issue quasi-network. Finally, in the user-oriented issue-clustering phase, we classify issues by structural equivalence and compare the clustering results obtained from statistical tools with those from network analysis.

We performed an experiment with a large dataset to build a multi-layered two-mode network and then compared the issue-clustering results from SAS with those from network analysis. The experimental dataset came from a website ranking service and from the biggest portal site in Korea. The sample dataset contains 150 million transaction logs and 13,652 news articles for 5,000 panelists over one year. User-article and article-issue networks were constructed and merged into a user-issue quasi-network using NetMiner. Our issue-clustering results, obtained with the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), are consistent with the results from SAS clustering.

Despite extensive efforts to provide user information through recommendation systems, most projects succeed only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support for corporate decision making because it augments user-related data with information derived from unstructured textual data.
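The final clustering phase can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' SAS or NetMiner workflow: structural equivalence is approximated by the Euclidean distance between issue columns of a user-issue matrix (a common measure, but the source does not say which one was used), and PAM is written as a simple swap search that starts from the first k issues as medoids. The names `issue_distances`, `total_cost`, and `pam`, and the toy matrix, are hypothetical.

```python
import math

def issue_distances(user_issue):
    """Pairwise Euclidean distance between issue columns (their user profiles)."""
    n_issues = len(user_issue[0])
    cols = [[row[j] for row in user_issue] for j in range(n_issues)]
    return [[math.dist(a, b) for b in cols] for a in cols]

def total_cost(dist, medoids):
    """Sum of each item's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))

def pam(dist, k):
    """Simplified PAM: swap a medoid with a non-medoid while the cost drops."""
    n = len(dist)
    medoids = list(range(k))          # assumption: naive initial medoids
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    # Label every item with the index of its nearest medoid.
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return sorted(medoids), labels

# Toy user-issue matrix (hypothetical): 4 users x 4 issues.
user_issue = [
    [3, 3, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 4, 3],
    [0, 1, 3, 4],
]

medoids, labels = pam(issue_distances(user_issue), k=2)
print(medoids, labels)
```

In the toy matrix, issues 0 and 1 are read by the same users, as are issues 2 and 3, so the two pairs fall into separate clusters even though nothing is known about the issues' textual similarity.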
To overcome the insufficient data available to traditional approaches, our methodology infers customers' real interests from web transaction logs. In addition, we suggest topic analysis and issue clustering as practical means of issue identification.
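As a rough illustration of how web transaction logs might be turned into a user-article network, the sketch below counts visits per (user, article) pair. The log format, the identifiers, and the function name `build_user_article` are assumptions made for this example; the source does not describe the actual log schema.

```python
from collections import Counter

def build_user_article(visits, users, articles):
    """Build a user x article visit-count matrix from (user, article) log records."""
    counts = Counter(visits)  # missing pairs count as 0
    return [[counts[(u, a)] for a in articles] for u in users]

# Hypothetical log records: one (user_id, article_id) tuple per page view.
visits = [
    ("u1", "a1"), ("u1", "a1"), ("u1", "a2"),
    ("u2", "a2"), ("u2", "a3"),
]
users = ["u1", "u2"]
articles = ["a1", "a2", "a3"]

user_article = build_user_article(visits, users, articles)
print(user_article)  # [[2, 1, 0], [0, 1, 1]]
```

Keeping visit counts rather than binary flags preserves how strongly each user engaged with an article, though either choice would be compatible with the description above.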

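Most Internet shopping malls devote considerable effort to identifying their customers' areas of interest and using them effectively for product recommendation. However, personal information that customers enter themselves when registering is hard to trust, and interest information inferred from purchase patterns reflects only the limited behavior shown after a customer enters the company's own site, so it can hardly be said to represent the customer's diverse interests. To overcome these limitations, this study presents a way to identify customers' actual interests by analyzing the topics of the sites they recently visited, based on their everyday Internet usage records. We also propose a methodology that derives the topic of each site through topic analysis and then clusters the derived topics from the perspective of co-visitors, in order to discover new higher-level themes that are meaningful from the customer's point of view. Unlike existing studies, which cluster around similar topics, this study clusters around topics of interest from the user's perspective. The results are expected to be usable for higher-level marketing strategy, including user-centered category design and the definition of customer segments from a new perspective. The user-perspective issue clustering process consists of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. To this end, we explored ways to use big data based on unstructured text by applying text mining and social network analysis techniques. To evaluate the practical applicability of the proposed methodology, we analyzed one year of visit records of 2,177 visitors to Korea's largest portal news site together with the news articles, and we present a summary of the results.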
