Efficient Dynamic Weighted Frequent Pattern Mining by using a Prefix-Tree

A Dynamic Weighted Frequent Pattern Mining Technique Using a Prefix-Tree

  • Byeong-Soo Jeong (정병수), Department of Computer Engineering, Kyung Hee University
  • Received : 2010.07.02
  • Accepted : 2010.08.05
  • Published : 2010.08.31

Abstract

Traditional frequent pattern mining assigns the same profit/weight value to every item. Weighted Frequent Pattern (WFP) mining, which assigns different weights to different items, has become an important research issue in data mining and knowledge discovery. Existing algorithms in this area assume fixed weights. In real-world scenarios, however, the price/weight/importance of a pattern may change frequently because of unavoidable circumstances. Tracking these dynamic changes is necessary in application areas such as retail market basket analysis and web click stream management. In this paper, we propose the novel concept of dynamic weight and an algorithm, DWFPM (dynamic weighted frequent pattern mining). Our algorithm handles situations in which the price/weight of a pattern varies dynamically. It scans the database exactly once and is therefore also suitable for real-time data processing. To our knowledge, this is the first research work to mine weighted frequent patterns using dynamic weights. Extensive performance analyses show that our algorithm is efficient and scalable for WFP mining with dynamic weights.

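Frequent pattern mining has so far treated the weight (importance) of every item as equal. In practice, however, items often carry different weights, and even the same item may need a different weight at different times. Applications such as business data analysis and web click data analysis must also take dynamically changing weights into account. Several pattern mining techniques that consider item weights have been proposed, but no study has yet addressed item weights that change dynamically. This paper proposes the first frequent pattern mining algorithm that considers dynamic item importance (weights). Because the proposed technique requires only a single database scan, it can be applied to stream data. Experiments show that the proposed technique is highly effective and scales well.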
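The notion of a dynamic weight described in the abstract can be illustrated with a small sketch. It is not the DWFPM algorithm and does not build a prefix-tree: it is a brute-force enumeration, and several of its choices are assumptions made only for illustration. The data arrives in batches, each batch carries its own weight table, and the weight of a pattern is taken to be the average weight of its items in the batch where the transaction occurs. The names `dynamic_weighted_support` and `mine` are hypothetical. Like the approach in the abstract, the sketch reads each transaction once.

```python
from collections import defaultdict
from itertools import combinations


def dynamic_weighted_support(batches, max_len=2):
    """Accumulate a weighted support value per pattern in a single pass.

    `batches` is a list of (weights, transactions) pairs. Each batch has its
    own weight table, so the same item may weigh differently over time.
    Assumption: a pattern's weight is the mean weight of its items.
    """
    support = defaultdict(float)
    for weights, transactions in batches:
        for items in transactions:
            unique_items = sorted(set(items))
            for k in range(1, max_len + 1):
                for pattern in combinations(unique_items, k):
                    pattern_weight = sum(weights[i] for i in pattern) / k
                    # Each occurrence contributes the weight valid in its batch.
                    support[pattern] += pattern_weight
    return dict(support)


def mine(batches, min_support, max_len=2):
    """Return the patterns whose accumulated weighted support meets the threshold."""
    support = dynamic_weighted_support(batches, max_len)
    return {p: s for p, s in support.items() if s >= min_support}


# Hypothetical data: item "a" loses weight and item "b" gains weight in batch 2.
batches = [
    ({"a": 0.6, "b": 0.4}, [["a", "b"], ["a"]]),
    ({"a": 0.2, "b": 0.8}, [["a", "b"], ["b"]]),
]
result = mine(batches, min_support=1.2)
```

With this hypothetical data, the single items `a` and `b` pass the threshold of 1.2 while the pair `(a, b)` does not. A fixed weight table would not capture the shift in importance between the two batches.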

Cited by

  1. A Sequential Pattern Mining based on Dynamic Weight in Data Stream vol.2, pp.2, 2013, https://doi.org/10.3745/KTSDE.2013.2.2.137