Tree size determination for classification ensemble
 Title & Authors
Tree size determination for classification ensemble
Choi, Sung Hoon; Kim, Hyunjoong
 Abstract
Classification is predictive modeling for a categorical target variable. Classification ensemble methods, which achieve better accuracy by combining the predictions of multiple classifiers, have become a powerful machine learning and data mining paradigm; well-known methodologies include boosting, bagging and random forest. In this article, we assume that decision trees are used as the classifiers in the ensemble, and we hypothesize that tree size affects classification accuracy. To study how tree size influences accuracy, we performed experiments on twenty-eight data sets, comparing the performance of the ensemble algorithms bagging, double-bagging, boosting and random forest at different tree sizes.
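The abstract describes varying the size of the base decision trees inside each ensemble and comparing accuracies. As a minimal illustration of that kind of experiment (not the authors' code, which presumably used R packages such as rpart and randomForest cited below), the following Python sketch uses scikit-learn, with max_depth as a stand-in for tree size and a single built-in data set in place of the paper's twenty-eight; double-bagging has no off-the-shelf scikit-learn implementation and is omitted. The estimator keyword assumes scikit-learn 1.2 or later.

```python
# Hypothetical sketch: how ensemble accuracy changes with base-tree size.
# max_depth is used as a proxy for tree size; None lets trees grow fully.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)   # stand-in for the paper's data sets

for depth in (1, 2, 4, 8, None):
    base = DecisionTreeClassifier(max_depth=depth)
    models = {
        "bagging": BaggingClassifier(estimator=base, n_estimators=100, random_state=0),
        "boosting": AdaBoostClassifier(estimator=base, n_estimators=100, random_state=0),
        "random forest": RandomForestClassifier(max_depth=depth, n_estimators=100,
                                                random_state=0),
    }
    for name, model in models.items():
        # 10-fold cross-validated accuracy for this ensemble at this tree size
        acc = cross_val_score(model, X, y, cv=10).mean()
        print(f"max_depth={depth}, {name}: {acc:.3f}")
```

Looping the same comparison over many data sets and averaging ranks would approximate the study design sketched in the abstract.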
 Keywords
Bagging; boosting; classification; decision tree; double-bagging; ensemble; random forest
 Language
English
 Cited by
1.
A simple diagnostic statistic for determining the size of random forest. Journal of the Korean Data and Information Science Society, 2016, 27, 4, 855.
 References
1.
Asuncion, A. and Newman, D. J. (2007). UCI machine learning repository. University of California, Irvine, School of Information and Computer Science, http://archive.ics.uci.edu/ml.

2.
Bauer, E. and Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting and variants. Machine Learning, 36, 105-139.

3.
Breiman, L. (1996a). Bagging predictors. Machine Learning, 24, 123-140.

4.
Breiman, L. (1996b). Out-of-bag estimation, Technical Report, Statistics Department, University of California Berkeley, Berkeley, California 94708, https://www.stat.berkeley.edu/~breiman/OOBestimation.pdf.

5.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.

6.
Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and regression trees, Chapman and Hall, New York.

7.
Dietterich, T. (2000). Ensemble methods in machine learning, Springer, Berlin.

8.
Freund, Y. and Schapire, R. (1996). Game theory, on-line prediction and boosting. Proceedings of the Ninth Annual Conference on Computational Learning Theory, 325-332.

9.
Hansen, L. K. and Salamon, P. (1990). Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12, 993-1001.

10.
Heinz, G., Peterson, L. J., Johnson, R. W. and Kerk, C. J. (2003). Exploring relationships in body dimensions. Journal of Statistics Education, 11, http://www.amstat.org/publications/jse/v11n2/datasets.heinz.html.

11.
Hothorn, T. and Lausen, B. (2003). Double-bagging: Combining classifiers by bootstrap aggregation. Pattern Recognition, 36, 1303-1309.

12.
Kim, A., Kim, J. and Kim, H. (2012). The guideline for choosing the right-size of tree for boosting algorithm. Journal of the Korean Data and Information Science Society, 23, 949-959.

13.
Kim, H. and Loh, W. Y. (2001). Classification trees with unbiased multiway splits. Journal of the American Statistical Association, 96, 589-604.

14.
Kim, H. and Loh, W. Y. (2003). Classification trees with bivariate linear discriminant node models. Journal of Computational and Graphical Statistics, 12, 512-530.

15.
Kwak, S. and Kim, H. (2014). Comparison of ensemble pruning methods using Lasso-bagging and WAVE-bagging. Journal of the Korean Data and Information Science Society, 25, 1371-1383.

16.
Liaw, A. and Wiener, M. (2002). Classification and regression by random forests. R News, 2, 18-22.

17.
Loh, W. Y. (2009). Improving the precision of classification trees. The Annals of Applied Statistics, 3, 1710-1737.

18.
Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5, 197-227.

19.
Schapire, R. E. and Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37, 297-336.

20.
Shim, J. and Hwang, C. H. (2014). Support vector quantile regression ensemble with bagging. Journal of the Korean Data and Information Science Society, 25, 677-684.

21.
Statlib. (2010). Datasets archive. Carnegie Mellon University, Department of Statistics, http://lib.stat.cmu.edu.

22.
Terhune, J. M. (1994). Geographical variation of harp seal underwater vocalizations. Canadian Journal of Zoology, 72, 892-897.

23.
Therneau, T. and Atkinson, E. (1997). An introduction to recursive partitioning using the RPART routines, Mayo Foundation, Rochester, Minnesota, http://eric.univ-lyon2.fr/~ricco/cours/didacticiels/r/longdocrpart.pdf.

24.
Zhu, J., Zou, H., Rosset, S. and Hastie, T. (2009). Multi-class AdaBoost. Statistics and Its Interface, 2, 349-360.