Variable Selection with Regression Trees
 Title & Authors
Variable Selection with Regression Trees
Chang, Young-Jae
Many tree algorithms have been developed for regression problems. Although they are regarded as good algorithms, most of them suffer a loss of prediction accuracy when many noise variables are present. To handle this problem, we propose the multi-step GUIDE, a regression tree algorithm with a variable selection process. The multi-step GUIDE performs better than well-known algorithms such as Random Forest and MARS: results from a simulation study show that it outperforms these algorithms in terms of both variable selection and prediction accuracy. It generally selects the important variables correctly, admits relatively few noise variables, and consequently achieves good prediction accuracy.
Keywords: regression tree; random forest; variable selection; bagging
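The GUIDE source itself is not reproduced in this paper, but the general pattern the abstract describes, screening out noise variables before growing a regression tree, can be illustrated with a toy sketch. The example below is purely illustrative and not the authors' algorithm: it ranks candidate predictors by absolute correlation with the response, keeps the top-ranked ones, and then fits a one-split regression stump on a surviving predictor. All function names and thresholds are hypothetical.

```python
import random
import statistics

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def screen_variables(X, y, keep=1):
    """Step 1 (screening): rank columns of X by |corr(x_j, y)|, keep the top `keep`."""
    p = len(X[0])
    scores = [abs(pearson_r([row[j] for row in X], y)) for j in range(p)]
    return sorted(range(p), key=lambda j: -scores[j])[:keep]

def fit_stump(x, y):
    """Step 2 (tree growing): one-split regression stump on a single predictor.
    Returns (split point, left-node mean, right-node mean) minimizing SSE."""
    best = None
    for s in sorted(set(x))[:-1]:
        left = [b for a, b in zip(x, y) if a <= s]
        right = [b for a, b in zip(x, y) if a > s]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((b - ml) ** 2 for b in left) + sum((b - mr) ** 2 for b in right)
        if best is None or sse < best[0]:
            best = (sse, s, ml, mr)
    return best[1], best[2], best[3]

random.seed(0)
# Simulated data: y depends on x0 only; x1 and x2 are pure noise variables.
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [2.0 * row[0] + random.gauss(0, 0.1) for row in X]

selected = screen_variables(X, y, keep=1)
# x0 carries a strong signal, so it should be ranked first by the screen.
split, ml, mr = fit_stump([row[selected[0]] for row in X], y)
print("selected:", selected, "split:", round(split, 3))
```

In this toy setup the screening step plays the role that the variable selection process plays in the multi-step GUIDE: the noise predictors never reach the tree-growing step, which is the mechanism the abstract credits for the accuracy gain.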
 Cited by
Multi-Step Classification Trees, Communications in Statistics - Simulation and Computation, 2012, 41(9), 1728.
Belsley, D. A. (1980). On the efficient computation of the nonlinear full-information maximum-likelihood estimator, Journal of Econometrics, 14, 203-225.

Breiman, L. (2001). Random forests, Machine Learning, 45, 5-32.

Chattopadhyay, S. (2003). Divergence between the Hicksian welfare measures: The case of revealed preference for public amenities, Journal of Applied Econometrics, 17, 641-66.

Cook, D. and Weisberg, S. (1994). An Introduction to Regression Graphics, Wiley, New York.

Denman, N. and Gregory, D. (1998). Analysis of sugar cane yields in the Mulgrave area, for the 1997 sugar cane season, Technical report, MS305 Data Analysis Project, Department of Mathematics, University of Queensland.

Doksum, K., Tang, S. and Tsui, K. W. (2006). Nonparametric variable selection: The EARTH algorithm, Journal of the American Statistical Association, 103, 1609-1620.

Friedman, J. H. (1991). Multivariate adaptive regression splines, Annals of Statistics, 19, 1-67.

Kenkel, D. and Terza, J. (2001). The effect of physician advice on alcohol consumption: Count regression with an endogenous treatment effect, Journal of Applied Econometrics, 16, 165-184.

Liu, Z. and Stengos, T. (1999). Non-linearities in cross country growth regressions: A semiparametric approach, Journal of Applied Econometrics, 14, 527-538.

Loh, W. Y. (2002). Regression trees with unbiased variable selection and interaction detection, Statistica Sinica, 12, 361-386.

Onoyama, K., Ohsumi, N., Mitsumochi, N. and Kishihara, T. (1998). Data analysis of deer-train collisions in eastern Hokkaido, In Data Science, Classification, and Related Methods (ed. by Hayashi, C., Ohsumi, N., Yajima, K., Tanaka, Y., Bock, H.-H., Baba, Y.), 746-751, Japan.

Svetnik, V., Liaw, A., Tong, C. and Culberson, J. C. (2003). Random forest: A classification and regression tool for compound classification and QSAR modeling, Journal of Chemical Information and Computer Sciences, 43, 1947-1958.