A Study on Selection of Split Variable in Constructing Classification Tree

(Korean title: A Study on the Selection of Split Variables in Decision Trees)

  • 정성석 (Department of Statistics and Information Science, Chonbuk National University) ;
  • 김순영 (Department of Statistics and Information Science, Chonbuk National University) ;
  • 임한필 (Department of Statistics and Information Science, Chonbuk National University)
  • Published: 2004.07.01


Selecting the split variable is a critical step in constructing a classification tree. The efficiency of a classification tree algorithm can be evaluated by its variable selection bias and its variable selection power. C4.5 is strongly biased toward variables with many distinct values, while QUEST loses variable selection power when a continuous predictor deviates from the normal distribution. In this paper, we propose the SRT algorithm, which overcomes these drawbacks of C4.5 and QUEST. Simulations comparing SRT with C4.5 and QUEST show that SRT exhibits low variable selection bias and robust variable selection power.
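The bias the abstract attributes to C4.5 is easy to reproduce: multiway-split information gain (the quantity C4.5 computes before its gain-ratio correction) is systematically larger for predictors with many distinct values, even when every predictor is pure noise. The sketch below is illustrative only and is not the paper's SRT algorithm; all variable names are hypothetical.

```python
import random, math
from collections import Counter

def entropy(labels):
    """Empirical Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(x, y):
    """Multiway-split information gain of predictor x for labels y."""
    n = len(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(y) - cond

random.seed(0)
wins = Counter()
for _ in range(500):
    # Labels and both predictors are independent noise, so an unbiased
    # criterion should pick each predictor about half the time.
    y   = [random.randint(0, 1)  for _ in range(100)]
    x2  = [random.randint(0, 1)  for _ in range(100)]  # 2 distinct values
    x32 = [random.randint(0, 31) for _ in range(100)]  # 32 distinct values
    wins['x32' if info_gain(x32, y) > info_gain(x2, y) else 'x2'] += 1

print(wins)  # the 32-valued noise variable wins almost every round
```

Fragmenting 100 cases into 32 small groups drives the empirical conditional entropy far below its true value, so the many-valued noise variable is selected nearly always; this is the selection bias that gain ratio, and methods such as QUEST and SRT, are designed to reduce.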


  1. 이윤모 (Lee, Y.). A study on bias problems in constructing classification trees. Ph.D. thesis, Seoul National University.
  2. Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. Classification and Regression Trees.
  3. Kass, G. V. An exploratory technique for investigating large quantities of categorical data. Applied Statistics, v.29.
  4. Kim, H. Multiway Split Classification Trees. Ph.D. thesis, University of Wisconsin.
  5. Kim, H. and Loh, W. Y. Classification trees with unbiased multiway splits. Journal of the American Statistical Association, v.96.
  6. Loh, W. Y. and Shih, Y. S. Split selection methods for classification trees. Statistica Sinica, v.7.
  7. Loh, W. Y. and Vanichsetakul, N. Tree-structured classification via generalized discriminant analysis (with discussion). Journal of the American Statistical Association, v.83.
  8. Quinlan, J. R. C4.5: Programs for Machine Learning.
  9. Quinlan, J. R. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, v.4.
  10. Witten, I. H. and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations.