A simple diagnostic statistic for determining the size of random forest
 Title & Authors
Park, Cheolyong
 Abstract
This study proposes a simple diagnostic statistic for determining the size of a random forest. The method is based on the margin of victory (MV), a scaled difference in the votes at the infinite forest between the first and second most popular categories of the current random forest. A negative MV indicates a discrepancy between the current and infinite forests. More precisely, the method is based on the proportion of cases in which -MV exceeds a small fixed positive number (say, 0.03). We derive an appropriate diagnostic statistic for this method and calculate its distribution. A simulation study compares our method with a recently proposed diagnostic statistic.
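The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's statistic. It assumes a simulation setting in which the infinite-forest vote proportions are known for each case, and because the abstract does not give the scaling of MV, the raw difference in vote proportions stands in for it. The function names `margin_of_victory` and `discrepancy_proportion` are hypothetical.

```python
import numpy as np

def margin_of_victory(current_votes, infinite_probs):
    """Unscaled stand-in for MV (assumption: the paper's scaling is not given).

    current_votes:  (n_cases, K) vote counts from the current forest.
    infinite_probs: (n_cases, K) vote proportions of the infinite forest,
                    assumed known (e.g. in a simulation).
    """
    current_votes = np.asarray(current_votes)
    infinite_probs = np.asarray(infinite_probs, dtype=float)
    # Rank categories by current-forest votes, most popular first.
    order = np.argsort(-current_votes, axis=1, kind="stable")
    first, second = order[:, 0], order[:, 1]
    rows = np.arange(current_votes.shape[0])
    # Difference in infinite-forest votes between those two categories.
    return infinite_probs[rows, first] - infinite_probs[rows, second]

def discrepancy_proportion(mv, eps=0.03):
    """Proportion of cases in which -MV exceeds the small positive number eps."""
    return float(np.mean(-np.asarray(mv) > eps))

# Three cases, two categories, 100 trees in the current forest.
votes = [[60, 40], [45, 55], [52, 48]]
probs = [[0.70, 0.30], [0.50, 0.50], [0.45, 0.55]]
mv = margin_of_victory(votes, probs)
prop = discrepancy_proportion(mv)
```

In this toy input only the third case has a negative MV: the current forest favours the first category while the infinite forest favours the second, so one case in three is counted as discrepant.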
 Keywords
Diagnostic statistic; margin of victory; random forest; size determination
 Language
Korean
 References
1.
Banfield, R. E., Hall, L. O., Bowyer, K. W. and Kegelmeyer, W. P. (2007). A comparison of decision tree ensemble creation techniques. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 173-180.

2.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123-140.

3.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.

4.
Choi, S. H. and Kim, H. (2016). Tree size determination for classification ensemble. Journal of the Korean Data & Information Science Society, 27, 255-264.

5.
Dudoit, S., Fridlyand, J. and Speed, T. P. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97, 77-87.

6.
Hamza, M. and Larocque, D. (2005). An empirical comparison of ensemble methods based on classification trees. Journal of Statistical Computation and Simulation, 75, 629-643.

7.
Hernandez-Lobato, D., Martinez-Munoz, G. and Suarez, A. (2011). Inference on the prediction of ensembles of infinite size. Pattern Recognition, 44, 1426-1434.

8.
Hernandez-Lobato, D., Martinez-Munoz, G. and Suarez, A. (2013). How large should ensembles of classifiers be? Pattern Recognition, 46, 1323-1336.

9.
Park, C. (2010). Simple hypotheses testing for the number of trees in a random forest. Journal of the Korean Data & Information Science Society, 21, 371-377.

10.
Schapire, R., Freund, Y., Bartlett, P. and Lee, W. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics, 26, 1651-1686.