Analysis of Web Log Using Clementine Data Mining Solution

클레멘타인 데이터마이닝 솔루션을 이용한 웹 로그 분석

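  • 김재경 (College of Business Administration, Kyung Hee University)
  • 이건창 (School of Business Administration, Sungkyunkwan University)
  • 정남호 (School of Business Administration, Sungkyunkwan University)
  • 권순재 (School of Business Administration, Sungkyunkwan University)
  • 조윤호 (Department of Internet Information, Dongyang Technical College)
  • Published: 2002.11.30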

Abstract

Since the mid-1990s, most firms that use the web as a communication channel with customers have been keenly interested in web log files, which record many of the traces customers leave on the web, such as IP addresses, referring addresses, cookie files, and visit durations. Appropriate analysis of web log files therefore leads to an understanding of customers' behavior on the web, and the results can serve as effective marketing information for locating potential target customers. In this study, we introduce a web mining technique using SPSS Clementine and analyze a set of real web log data from an Internet hub site. We also suggest a process for building various strategies based on the web mining results.

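Since the mid-1990s, as companies have grown more interested in user behavior on the Internet, their interest in web log files, which retain users' click information on web sites, has grown as well. Because a web log file retains a variety of information, such as user IP, usage time, visited addresses, referring addresses, and cookie files, it can be used to analyze users' behavior on a web site in detail. It also makes it possible to find the web sites associated with a particular type of user and to establish an effective marketing strategy. In this study, we introduce a methodology for web mining using Clementine, the data mining tool from SPSS, and carry out an analysis on the log files of a real Internet hub site.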
