An Efficient Clustering Method based on Multi Centroid Set using MapReduce
 Title & Authors
Kang, Sungmin; Lee, Seokjoo; Min, Jun-ki;
As the size of data increases, it becomes important to identify properties by analyzing big data. In this paper, we propose an efficient k-Means based clustering technique, called MCSK-Means (Multi Centroid Set k-Means), using the distributed parallel processing framework MapReduce. A problem with the k-Means algorithm is that the accuracy of clustering depends on the randomly created initial centroids. To alleviate this problem, the MCSK-Means algorithm reduces the dependency on the initial centroids by using m centroid sets, each consisting of k centroids. In addition, we apply the agglomerative hierarchical clustering technique to create the final k centroids from the centroids in the m centroid sets that result from the clustering phase. We implemented MCSK-Means on the MapReduce framework to process big data efficiently.
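The two phases described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the function names (`kmeans`, `merge_centroids`, `mcs_kmeans_sketch`), the fixed iteration count, the choice of initial centroids by sampling data points, and the centroid-distance merge rule are all assumptions made for illustration, and the MapReduce distribution is omitted entirely. In a MapReduce setting, one would typically expect the point-to-centroid assignment to run in the map step and the centroid recomputation in the reduce step, but the abstract does not specify this.

```python
import random
from math import dist


def kmeans(points, centroids, iters=20):
    """Plain Lloyd iterations starting from the given centroids (assumed fixed iteration count)."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group; keep the old one if the group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else old
            for g, old in zip(groups, centroids)
        ]
    return centroids


def merge_centroids(centroids, k):
    """Agglomerative merging: fuse the two closest clusters of centroids until k remain.

    Assumption: clusters are compared by the distance between their means.
    """
    clusters = [(c, 1) for c in centroids]  # (mean, number of centroids merged so far)
    while len(clusters) > k:
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: dist(clusters[ab[0]][0], clusters[ab[1]][0]),
        )
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        # Weighted mean of the two merged clusters.
        merged = tuple((x * ni + y * nj) / (ni + nj) for x, y in zip(ci, cj))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append((merged, ni + nj))
    return [c for c, _ in clusters]


def mcs_kmeans_sketch(points, k, m, seed=0):
    """Run k-means from m random centroid sets, then reduce the m*k results to k centroids."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(m):
        init = rng.sample(points, k)  # one initial centroid set of size k
        pooled.extend(kmeans(points, init))
    return merge_centroids(pooled, k)
```

Calling `mcs_kmeans_sketch(points, k=2, m=3)` on a list of coordinate tuples returns a list of k centroid tuples. Pooling the m·k centroids before merging is what lets a poor initial set be outvoted by the others, which is the dependency reduction the abstract describes.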
data mining;MapReduce;k-Means algorithm;clustering;big data;
 Cited by
이현진, "An Algorithms for Tournament-based Big Data Analysis," Journal of Digital Contents Society, Vol. 16, No. 4, pp. 545-553, 2015.