- Volume 17 Issue 2
Selecting a split variable is a crucial step in constructing a classification tree. The efficiency of a classification tree algorithm can be evaluated by its variable selection bias and its variable selection power. C4.5 exhibits strongly biased variable selection because variables with many distinct values are favored during split selection, while QUEST has low variable selection power when a continuous predictor variable does not deviate from the normal distribution. In this thesis, we propose the SRT algorithm, which overcomes these drawbacks of C4.5 and QUEST. Simulations were performed to compare SRT with C4.5 and QUEST. The results show that SRT has low variable selection bias and robust variable selection power.
- Ph.D. Thesis, Seoul National University A study on bias problems in constructing classification trees 이윤모
- Classification and Regression Trees Breiman, L.;Friedman, J. H.;Olshen, R. A.;Stone, C. J.
- Applied Statistics v.29 An exploratory technique for investigating large quantities of categorical data Kass, G. V. https://doi.org/10.2307/2986296
- Ph.D. Thesis, University of Wisconsin Multiway Split Classification Trees Kim, H.
- Journal of the American Statistical Association v.96 Classification trees with unbiased multiway splits Kim, H.;Loh, W. Y. https://doi.org/10.1198/016214501753168271
- Statistica Sinica v.7 Split selection method for classification trees Loh, W. Y.;Shih, Y. S.
- Journal of the American Statistical Association v.83 Tree-structured classification via generalized discriminant analysis (with discussion) Loh, W. Y.;Vanichsetakul, N. https://doi.org/10.2307/2289295
- C4.5 : Programs for Machine Learning Quinlan, J. R.
- Journal of Artificial Intelligence Research v.4 Improved use of continuous attributes in C4.5 Quinlan, J. R.
- Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations Witten, I. H.;Frank, E.