An Efficient Clustering Method based on Multi Centroid Set using MapReduce
 Title & Authors
Kang, Sungmin; Lee, Seokjoo; Min, Jun-ki;
 
 Abstract
As data volumes grow, analyzing big data to identify its properties becomes increasingly important. In this paper, we propose an efficient k-Means-based clustering technique, called MCSK-Means (Multi Centroid Set k-Means), built on the distributed parallel processing framework MapReduce. A known problem with the k-Means algorithm is that clustering accuracy depends on the randomly chosen initial centroids. To alleviate this problem, the MCSK-Means algorithm reduces the dependency on the initial centroids by using m centroid sets, each consisting of k centroids. In addition, we apply agglomerative hierarchical clustering to derive the final k centroids from the centroids in the m centroid sets produced by the clustering phase. We implemented MCSK-Means on the MapReduce framework to process big data efficiently.
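The two phases described in the abstract can be illustrated with a minimal single-machine sketch in Python. This is not the authors' MapReduce implementation: the abstract does not specify how the centroid sets are initialized, which linkage the agglomerative step uses, or how map and reduce keys are laid out, so the random initialization, the size-weighted centroid linkage, and all function names (`kmeans_step`, `merge_centroids`, `mcsk_means_sketch`) are assumptions made for illustration only.

```python
import numpy as np


def kmeans_step(points, centroid_sets):
    """One k-Means iteration that updates all m centroid sets in a single pass.

    points: float array of shape (n, d); centroid_sets: float array of shape (m, k, d).
    """
    m, k, d = centroid_sets.shape
    sums = np.zeros((m, k, d))
    counts = np.zeros((m, k))
    set_ids = np.arange(m)
    # "Map" (assumed layout): each point emits, for every centroid set,
    # the key (set index, nearest centroid index) with the point as value.
    for p in points:
        dists = np.linalg.norm(centroid_sets - p, axis=2)  # shape (m, k)
        nearest = dists.argmin(axis=1)                     # shape (m,)
        sums[set_ids, nearest] += p
        counts[set_ids, nearest] += 1
    # "Reduce": average the points per key; an empty cluster keeps its old centroid.
    new_sets = centroid_sets.copy()
    mask = counts > 0
    new_sets[mask] = sums[mask] / counts[mask][:, None]
    return new_sets


def merge_centroids(centroids, k):
    """Agglomeratively merge centroids until k remain (assumed: centroid linkage)."""
    clusters = [(np.asarray(c, dtype=float), 1) for c in centroids]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = np.linalg.norm(clusters[i][0] - clusters[j][0])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        # Replace the closest pair by their size-weighted mean.
        merged = ((ci * ni + cj * nj) / (ni + nj), ni + nj)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return np.array([c for c, _ in clusters])


def mcsk_means_sketch(points, k, m, iters=10, seed=0):
    """Run m independent k-Means instances together, then merge m*k centroids to k."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed initialization: each set samples k distinct input points.
    idx = np.stack([rng.choice(len(points), size=k, replace=False) for _ in range(m)])
    centroid_sets = points[idx]  # shape (m, k, d)
    for _ in range(iters):
        centroid_sets = kmeans_step(points, centroid_sets)
    return merge_centroids(centroid_sets.reshape(m * k, -1), k)
```

Calling `mcsk_means_sketch(data, k=3, m=4)` on an `(n, d)` array returns a `(3, d)` array of final centroids. Updating all m sets inside one pass over the points mirrors the idea of sharing a single scan of the data across centroid sets, which is presumably where a MapReduce job would save I/O; the sketch makes no claim about how the paper actually organizes its jobs.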
 Keywords
data mining; MapReduce; k-Means algorithm; clustering; big data
 Language
Korean
 Cited by
1.
Lee, Hyun-jin (이현진), "An Algorithms for Tournament-based Big Data Analysis" (토너먼트 기반의 빅데이터 분석 알고리즘), Journal of Digital Contents Society (디지털콘텐츠학회 논문지), Vol. 16, No. 4, pp. 545-553, 2015.