Adaptive Frequent Pattern Algorithm using CAWFP-Tree based on RHadoop Platform

  • Park, In-Kyu (Dept. of Game Software, College of Engineering, Joongbu University)
  • Received : 2017.05.01
  • Accepted : 2017.06.20
  • Published : 2017.06.28

Abstract

An efficient frequent pattern algorithm is essential for mining association rules and for many other mining tasks in convergence applications, and it has a very broad range of uses. Many pattern mining models store compressed information about frequent patterns in an FP-tree. In this paper, we propose a centroid-based frequent pattern growth algorithm, called "CAWFP-Growth", that improves the efficiency of the FP-Growth algorithm by computing the center of the itemsets from both the weights and the frequencies of their items. Because the proposed method does not need the conventional global maximum weighted support constraint to maintain the downward closure property, it is likely to reduce both the search time for frequent patterns and the loss of information about them. The experimental results show that, compared with other methods that use dynamic weights, the proposed algorithm performs better: the centroid of the items preserves accurate information about the frequent patterns without sacrificing accuracy or increasing the processing time of the FP-tree. Big data is modeled with the MapReduce framework in a pseudo-distributed computing environment. Modeling the proposed algorithm in fully distributed mode remains future work.
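The idea of combining item weights and frequencies into a single center can be illustrated with a minimal sketch. The abstract does not state the centroid formula, so the code below assumes one plausible reading: the centroid weight of an itemset is the frequency-weighted mean of its item weights, and the weighted support is that centroid multiplied by the ordinary support. The function names, the sample transactions, and the weights are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, FrozenSet, List


def item_frequencies(transactions: List[FrozenSet[str]]) -> Counter:
    """Count the number of transactions in which each item occurs."""
    counts: Counter = Counter()
    for transaction in transactions:
        counts.update(transaction)
    return counts


def centroid_weight(itemset: FrozenSet[str],
                    weights: Dict[str, float],
                    freqs: Counter) -> float:
    """ASSUMPTION: centroid = frequency-weighted mean of the item weights.

    This formula is illustrative only; the paper's definition may differ.
    """
    total = sum(freqs[item] for item in itemset)
    if total == 0:
        return 0.0
    return sum(weights[item] * freqs[item] for item in itemset) / total


def support(itemset: FrozenSet[str],
            transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for transaction in transactions if itemset <= transaction)
    return hits / len(transactions)


def weighted_support(itemset: FrozenSet[str],
                     transactions: List[FrozenSet[str]],
                     weights: Dict[str, float]) -> float:
    """ASSUMPTION: weighted support = centroid weight * plain support."""
    freqs = item_frequencies(transactions)
    return centroid_weight(itemset, weights, freqs) * support(itemset, transactions)


# Hypothetical toy data, not from the paper.
transactions = [
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"a", "b", "c"}),
    frozenset({"b"}),
]
weights = {"a": 0.6, "b": 0.2, "c": 0.9}

print(weighted_support(frozenset({"a", "b"}), transactions, weights))
```

Under this assumed definition, an itemset is scored from the weights and frequencies of its own items rather than from a global maximum weight, which is the contrast the abstract draws with the conventional constraint. Whether this particular measure preserves the downward closure property is not claimed here.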
