Clustering Algorithm using the DFP-Tree based on the MapReduce

맵리듀스 기반 DFP-Tree를 이용한 클러스터링 알고리즘

Seo, Young-Won; Kim, Chang-soo

  • Received : 2015.09.10
  • Accepted : 2015.11.10
  • Published : 2015.12.31


As big data has gained attention, many applications that operate on the results of data analysis have been developed. Typical examples are product recommendation services in e-commerce systems, search services in search engines, and friend recommendation systems in social network services. In this paper, we propose a decision frequent-pattern tree (DFP-Tree) that combines two structures: the original frequent-pattern tree, an existing data mining technique that mines similar patterns appearing in a data set, and the decision tree from computer science theory. The decision frequent-pattern tree algorithm addresses a problem of the frequent-pattern tree, which must generate so many patterns that the data become hard to analyze. We also propose a model for the MapReduce framework, a programming model that supports operation in a distributed environment.
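The abstract does not spell out how the decision frequent-pattern tree is built, so the following is only a minimal sketch of the standard frequent-pattern tree that it extends, with the item-counting pass written in a MapReduce style and simulated in a single process. The names `map_phase`, `reduce_phase`, `FPNode`, and `build_fp_tree`, as well as the `min_support` parameter, are assumptions made for illustration and do not come from the paper; the decision-tree component is not modeled here.

```python
from collections import Counter
from itertools import chain


def map_phase(transaction):
    """Emit (item, 1) once per distinct item in a transaction, as a mapper would."""
    return [(item, 1) for item in set(transaction)]


def reduce_phase(pairs):
    """Sum the emitted counts per item, as a reducer would."""
    counts = Counter()
    for item, n in pairs:
        counts[item] += n
    return counts


class FPNode:
    """One node of a frequent-pattern tree (hypothetical minimal form)."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a plain FP-tree; this is NOT the paper's DFP-Tree."""
    # Pass 1: count item support with the simulated map and reduce steps.
    pairs = chain.from_iterable(map_phase(t) for t in transactions)
    counts = reduce_phase(pairs)
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    # Pass 2: insert each transaction with its frequent items ordered by
    # descending support (ties broken by item name) so prefixes are shared.
    root = FPNode()
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root, frequent


if __name__ == "__main__":
    data = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
    tree, support = build_fp_tree(data, min_support=2)
    print(dict(support))
    print(tree.children["b"].count)
```

In a real distributed deployment the two phases would run as separate mapper and reducer tasks over partitions of the data set; here they are ordinary functions so the sketch stays self-contained.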


Data Mining; Frequent-Pattern Tree; Clustering Algorithms; Distributed Processing System; Recommendation System; MapReduce




Supported by: Pukyong National University