A Method for Frequent Itemsets Mining from Data Stream

Korean title (translated): An Efficient Technique for Mining Frequent Itemsets in a Data Stream Environment

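  • Seo Bok-Il (서복일), School of Electronics and Computer Engineering, Chonnam National University;
  • Kim Jae-In (김재인), School of Electronics and Computer Engineering, Chonnam National University;
  • Hwang Bu-Hyun (황부현), School of Electronics and Computer Engineering, Chonnam National University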
  • Received : 2011.10.14
  • Accepted : 2011.12.20
  • Published : 2012.04.30

Abstract

Data mining is widely used to discover knowledge in many fields. Although many methods exist for discovering association rules, most of them are frequency-based and are therefore not well suited to a stream environment: because event data are generated continuously, storing all of the data is expensive. In this paper, we propose a new method for discovering association rules in a stream environment. The method extracts data items using a variable window whose size depends on the gap between occurrences of the same target event. Data items are extracted from each window with the COBJ (Count Object) calculation method. The FPMDSTN (Frequent Pattern Mining over Data Stream using Terminal Node) algorithm then discovers association rules from the extracted data items. Experiments show that our method is more efficient in a stream environment than conventional methods.

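Data mining is widely used to discover needed knowledge from the data accumulated in various fields. Many methods discover association rules based on how frequently events occur, but they are not suited to a stream environment, in which events occur continuously. They are also costly to apply when association rules must be discovered in real time. This paper proposes a new method for discovering association rules in a stream environment. The proposed method extracts data items from variable windows, whose size follows the interval between occurrences of the target event in the data stream, using the COBJ (Count Object) calculation, which is based on whether an event is present or absent. The FPMDSTN (Frequent Pattern Mining over Data Stream using Terminal Node) algorithm then discovers association rules from the extracted data in real time. Experimental results show that the proposed method is more efficient in a stream environment than existing methods.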
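
The windowing and presence-based counting described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions of COBJ or FPMDSTN, so everything here is an assumption made for illustration: the function names `variable_windows` and `presence_counts` are hypothetical, a window is taken to be the set of distinct events between two consecutive occurrences of the target event, and an item is counted at most once per window. The sketch stops at single-item counts and does not attempt to reproduce FPMDSTN.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Set


def variable_windows(stream: Iterable[Hashable], target: Hashable) -> List[Set[Hashable]]:
    """Split a stream into windows, each closed by an occurrence of `target`.

    Assumption (not from the paper): a window holds the distinct events seen
    since the previous target event, so its size follows the gap between targets.
    """
    windows: List[Set[Hashable]] = []
    current: Set[Hashable] = set()
    for event in stream:
        if event == target:
            windows.append(current)  # the target event closes the window
            current = set()
        else:
            current.add(event)
    # Events after the last target belong to an unfinished window and are dropped.
    return windows


def presence_counts(windows: List[Set[Hashable]]) -> Counter:
    """Count, for each item, the number of windows in which it is present."""
    counts: Counter = Counter()
    for window in windows:
        counts.update(window)  # a set contributes each item at most once
    return counts


# Toy stream: "T" plays the role of the target event.
stream = ["a", "b", "a", "T", "c", "T", "a", "b", "c", "d", "T", "b"]
windows = variable_windows(stream, target="T")
counts = presence_counts(windows)

min_count = 2  # hypothetical minimum number of windows an item must appear in
frequent_items = sorted(item for item, n in counts.items() if n >= min_count)
print(windows)
print(frequent_items)
```

Because each window is a set, the repeated "a" in the first window counts only once. That is the difference between presence-based counting and raw frequency counting in this sketch.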

Cited by

  1. Context Inference and Sensor Data Classification of Big Data Stream Environment vol.9, pp.10, 2014, https://doi.org/10.13067/JKIECS.2014.9.10.1079