An Algorithm for Reducing the Search Time of Frequent Items

  • Received : 2010.09.01
  • Accepted : 2010.09.28
  • Published : 2011.01.31

Abstract

With the growing use of information systems, methods for quickly identifying needed products from large volumes of data have been studied actively. Association rule mining methods, which find hidden patterns, have drawn much attention, and the Apriori algorithm is a representative one. However, Apriori's repeated database scans increase search time. This paper proposes an algorithm that reduces the search time for frequent items. The proposed algorithm builds a matrix from the transaction database and searches it for frequent items using the mean number of items per transaction and a defined minimum support. The mean number of items per transaction is used to reduce the number of transactions, and the minimum support is used to reduce the number of items. The performance of the proposed algorithm is evaluated by comparing its search time and precision with those of existing algorithms. The experimental results show that the proposed algorithm extracts the final frequent items more quickly and efficiently than the existing Apriori and matrix algorithms.
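The abstract describes the approach only in outline, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. It assumes that the minimum support is an absolute transaction count and that the mean number of items per transaction, rounded down, sets the target itemset size k; rows with fewer than k surviving items are then dropped, since they cannot contain a k-itemset. The function names `build_matrix` and `frequent_itemsets_matrix` are hypothetical.

```python
from itertools import combinations
from math import floor


def build_matrix(transactions):
    """Return (items, matrix): sorted item labels and one 0/1 row per transaction."""
    items = sorted({item for t in transactions for item in t})
    matrix = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, matrix


def frequent_itemsets_matrix(transactions, min_support):
    """Sketch: prune items by support, prune rows by mean length, then count.

    Assumptions (not stated in the abstract): min_support is an absolute
    transaction count, and the target itemset size k is the floor of the
    mean number of items per transaction.
    """
    if not transactions:
        return {}
    items, matrix = build_matrix(transactions)

    # Assumed rule: target itemset size from the mean transaction length.
    k = max(1, floor(sum(map(sum, matrix)) / len(matrix)))

    # Column pruning: keep items whose column sum meets the minimum support.
    keep = [j for j in range(len(items))
            if sum(row[j] for row in matrix) >= min_support]

    # Row pruning: a row with fewer than k surviving items cannot hold
    # a frequent k-itemset, so it is removed from the matrix.
    rows = [[row[j] for j in keep] for row in matrix]
    rows = [row for row in rows if sum(row) >= k]

    # Count each k-combination of surviving columns in the reduced matrix.
    result = {}
    for combo in combinations(range(len(keep)), k):
        support = sum(all(row[c] for c in combo) for row in rows)
        if support >= min_support:
            result[tuple(items[keep[c]] for c in combo)] = support
    return result
```

Unlike Apriori, which rescans the database for each candidate size, this sketch reads the transactions once to build the matrix and then works only on the reduced rows and columns. Calling `frequent_itemsets_matrix(transactions, 3)` on a list of item sets returns a dictionary that maps each frequent k-itemset (as a sorted tuple) to its support count.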
