Boosting Algorithms for Large-Scale Data and Data Batch Stream
 Title & Authors
Yoon, Young-Joo;
 Abstract
In this paper, we propose boosting algorithms for data that are very large or that arrive sequentially in batches over time. In this situation, the ordinary boosting algorithm may be inappropriate because it requires the entire training set to be available at once. To handle large-scale data and data batch streams, we modify AdaBoost and Arc-x4. The modified algorithms perform well on both large-scale data and data batch streams, with or without concept drift, on simulated and real data sets.
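For context, the single-batch algorithm the paper starts from can be sketched as follows. This is a minimal illustration of plain AdaBoost (Freund and Schapire, 1997) with one-dimensional threshold stumps, not the authors' batch-stream modification; all function names here are illustrative.

```python
import math

def stump_predict(x, threshold, polarity):
    """Predict +1/-1 with a one-dimensional threshold stump."""
    return polarity if x > threshold else -polarity

def train_stump(X, y, w):
    """Pick the (threshold, polarity) pair with the lowest weighted error."""
    best = (None, 1, float("inf"))  # (threshold, polarity, weighted error)
    for t in sorted(set(X)):
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(xi, t, pol) != yi)
            if err < best[2]:
                best = (t, pol, err)
    return best

def adaboost(X, y, rounds=10):
    """Plain single-batch AdaBoost: requires the full training set at once,
    which is exactly the limitation the paper's abstract points out."""
    n = len(X)
    w = [1.0 / n] * n              # uniform initial weights
    ensemble = []                  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        t, pol, err = train_stump(X, y, w)
        err = max(err, 1e-10)      # avoid log(0) for a perfect stump
        if err >= 0.5:             # weak learner no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # Re-weight: misclassified points gain weight for the next round.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, t, pol))
             for xi, yi, wi in zip(X, y, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of the stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1
```

The batch-stream variants proposed in the paper would instead update the ensemble as each new data batch arrives rather than looping over one fixed training set.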
 Keywords
AdaBoost; Arc-x4; concept drift; data stream; ensemble method; large scale data
 Language
Korean
 Cited by
1.
Yoon, Y. J. (2014). Classification of large-scale data and data batch stream with forward stagewise algorithm, Journal of the Korean Data and Information Science Society, 25, 1283-1291.
 References
1.
Asuncion, A. and Newman, D. J. (2007). UCI Machine Learning Repository [http://www.ics.uci.edu/mlearn/MLRepository.html]. Irvine, CA: University of California, School of Information and Computer Science.

2.
Breiman, L. (1998). Arcing classifiers (with discussion), Annals of Statistics, 26, 801-849.

3.
Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984). Classification and Regression Trees, Chapman & Hall, New York.

4.
Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, 55, 119-139.

5.
Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning, Springer-Verlag, New York.

6.
Kohavi, R. (1996). Scaling up the accuracy of naive-Bayes classifiers: A decision-tree hybrid, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 202-207.

7.
Kuncheva, L. I. (2004). Classification ensemble for changing environments, Proceedings of 5th International Workshop on Multiple Classifier Systems, 1-15.

8.
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA.

9.
Rudin, C., Daubechies, I. and Schapire, R. E. (2004). The dynamics of AdaBoost: cyclic behavior and convergence of margins, Journal of Machine Learning Research, 5, 1557-1595.

10.
Street, W. N. and Kim, Y. S. (2001). A streaming ensemble algorithm (SEA) for large scale classification, Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 377-382.

11.
Wang, H., Fan, W., Yu, P. S. and Han, J. (2003). Mining concept drifting data streams using ensemble classifiers, Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 226-235.

12.
Yeon, K., Choi, H., Yoon, Y. J. and Song, M. S. (2005). Model based ensemble learning for tracking concept drift, Proceedings of 55th Session of the International Statistical Institute.