Comparing Classification Accuracy of Ensemble and Clustering Algorithms Based on Taguchi Design

  • Shin, Hyung-Won (Dept. of Computer Science and Industrial Systems Engineering, Yonsei University) ;
  • Sohn, So-Young (Dept. of Computer Science and Industrial Systems Engineering, Yonsei University)
  • Received : 2000.04
  • Accepted : 2000.12
  • Published : 2001.03.31

Abstract

In this paper, we compare the classification performance of ensemble and clustering algorithms (Data Bagging, Variable Selection Bagging, Parameter Combining, and Clustering) with that of logistic regression, taking into account various characteristics of the input data. Four factors are used to simulate the logistic model: (1) correlation among input variables, (2) variance of observations, (3) training data size, and (4) the input-output function. Because the relationship between input and output is unknown in practice, we use a Taguchi design that treats the input-output function as a noise factor, which makes the results more practical. The experimental results indicate the following. When the variance is at the medium level, Bagging and Parameter Combining perform worse than Logistic Regression, Variable Selection Bagging, and Clustering. However, the classification performances of Logistic Regression, Variable Selection Bagging, Bagging, and Clustering are not significantly different when the variance of the input data is either small or large. When the input variables are strongly correlated, Variable Selection Bagging outperforms both Logistic Regression and Parameter Combining. In general, Parameter Combining appears to perform worst, to our disappointment.

Keywords