Automatic Document Classification Using Multiple Classifier Systems

  • Published : 2004.08.01

Abstract

Combining multiple classifiers to obtain better performance than any individual classifier is a widely used technique. Constructing a multiple classifier system (MCS) involves two distinct issues: how to generate a diverse set of base-level classifiers, and how to combine their predictions. In this paper, we review the characteristics of existing multiple classifier systems: Bagging, Boosting, and Stacking. For document classification, we propose new MCSs: Stacked Bagging, Stacked Boosting, Bagged Stacking, and Boosted Stacking. These are hybrid MCSs that combine the advantages of existing MCSs such as Bagging, Boosting, and Stacking. To evaluate the proposed schemes, we conducted document classification experiments on MEDLINE, Usenet news, and Web document collections. The experimental results show that the proposed hybrid MCSs generally outperform the existing ones.
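
The two issues named above can be made concrete with Bagging. What follows is a minimal, generic sketch of Breiman-style Bagging, not the implementation used in this paper: diversity comes from training each base classifier on a bootstrap replicate of the training documents, and the predictions are combined by an unweighted majority vote. The nearest-centroid base learner, the three-term frequency vectors, the class labels, and the function names (`train_centroids`, `bagging_fit`, `bagging_predict`) are hypothetical stand-ins chosen for brevity.

```python
import random
from collections import Counter

def train_centroids(X, y):
    """Hypothetical base learner: mean term-frequency vector per class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict_centroids(model, x):
    # Pick the class whose centroid is nearest in squared Euclidean distance.
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))

def bagging_fit(X, y, n_models=11, seed=0):
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Issue 1 (diversity): a bootstrap replicate of n documents drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, x):
    # Issue 2 (combination): unweighted majority vote over the base classifiers.
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Toy term-frequency vectors (hypothetical) for two document classes.
X = [[3, 0, 1], [4, 1, 0], [2, 0, 0], [0, 3, 2], [1, 4, 3], [0, 2, 4]]
y = ["med", "med", "med", "news", "news", "news"]
models = bagging_fit(X, y)
print(bagging_predict(models, [5, 0, 0]), bagging_predict(models, [0, 3, 3]))
```

Because each bootstrap replicate leaves out about 36.8% (roughly 1/e) of the distinct training documents on average, the base models differ, and voting smooths out their individual errors. Boosting addresses the first issue differently: instead of resampling uniformly, it reweights the training documents that earlier models misclassified.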

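Stacking treats the second issue, combining predictions, as a learning problem in its own right. The following is a minimal sketch of Wolpert-style stacked generalization under illustrative assumptions, not the authors' code: two hypothetical level-0 learners (nearest centroid and 1-nearest neighbour) produce out-of-fold predictions via k-fold cross-validation, and a simple level-1 learner maps each pattern of level-0 predictions to the label most often observed with it. The toy vectors, `k=3`, the lookup-table meta-learner, and all function names are invented for illustration.

```python
from collections import Counter, defaultdict

def centroid_learner(X, y):
    """Hypothetical level-0 learner A: nearest class centroid."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    cents = {c: [sum(col) / len(rows) for col in zip(*rows)] for c, rows in groups.items()}
    return lambda x: min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(cents[c], x)))

def one_nn_learner(X, y):
    """Hypothetical level-0 learner B: 1-nearest neighbour."""
    data = list(zip(X, y))
    return lambda x: min(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], x)))[1]

LEVEL0 = [centroid_learner, one_nn_learner]

def stacking_fit(X, y, k=3):
    n, meta_rows = len(X), []
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        models = [learn([X[i] for i in train], [y[i] for i in train]) for learn in LEVEL0]
        for i in range(fold, n, k):
            # Out-of-fold level-0 predictions become the level-1 features of document i.
            meta_rows.append((tuple(m(X[i]) for m in models), y[i]))
    # Level-1 learner: most frequent true label for each pattern of level-0 predictions.
    table = defaultdict(Counter)
    for pattern, label in meta_rows:
        table[pattern][label] += 1
    meta = {p: cnt.most_common(1)[0][0] for p, cnt in table.items()}
    # Refit the level-0 learners on all training documents for use at prediction time.
    base = [learn(X, y) for learn in LEVEL0]
    return base, meta

def stacking_predict(model, x):
    base, meta = model
    pattern = tuple(m(x) for m in base)
    # A pattern never seen during training falls back to the first level-0 learner.
    return meta.get(pattern, pattern[0])

X = [[3, 0, 1], [4, 1, 0], [2, 0, 0], [0, 3, 2], [1, 4, 3], [0, 2, 4]]
y = ["med", "med", "med", "news", "news", "news"]
model = stacking_fit(X, y)
print(stacking_predict(model, [5, 0, 0]))  # med
print(stacking_predict(model, [0, 3, 3]))  # news
```

Going by their names, the proposed hybrids (Stacked Bagging, Bagged Stacking, and so on) presumably nest these ingredients, for instance bagged or boosted models under a stacked combiner, or a whole stacked ensemble as the unit that is bagged or boosted. The abstract does not spell out the exact construction, so the sketch stops at plain Stacking.
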