A Study on Efficient Cluster Analysis of Bio-Data Using MapReduce Framework

  • Yoo, Sowol (Department of Computer Science & Statistics, Chosun University)
  • Lee, Kwangok (Department of Computer Science & Statistics, Chosun University)
  • Bae, Sanghyun (Department of Computer Science & Statistics, Chosun University)
  • Received : 2014.02.18
  • Accepted : 2014.03.25
  • Published : 2014.03.30


This study collects stream data measured by several sensors, stores it in a database in a MapReduce framework environment, and aims to design a system with low processing time and a low cluster analysis error rate by using the KM-SVM algorithm. The data that the KM-SVM algorithm clustered effectively was used for a U-health system. In experiments using 2003 data sets obtained from 52 test subjects, the k-NN algorithm showed 79.29% cluster analysis accuracy, the K-means algorithm showed 87.15%, the SVM algorithm showed 83.72%, and the KM-SVM algorithm showed 90.72%. As a result, the KM-SVM algorithm was better in both processing speed and cluster analysis accuracy.


