Text Filtering by Boosting Linear Perceptrons

  • O, Jang-Min (Artificial Intelligence Lab (SCAI), School of Computer Science & Engineering, Seoul National University)
  • Zhang, Byoung-Tak (Artificial Intelligence Lab (SCAI), School of Computer Science & Engineering, Seoul National University)
  • Published : 2000.08.01

Abstract

In information retrieval, a lack of positive examples is a main cause of poor performance. In this case, most learning algorithms may fail to capture the characteristics of the data, which leads to low recall. To solve the problem of unbalanced data, we propose a boosting method that uses linear perceptrons as weak learners. The perceptrons are trained on local data sets. The proposed algorithm is applied to a text filtering problem for which only a small portion of positive examples is available. In an experiment on the category crude of the Reuters-21578 document set, the boosting method achieved a recall of 80.8%, a 37.2% improvement over multilayer perceptrons with comparable precision.
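The general idea of boosting linear perceptrons on unbalanced data can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard AdaBoost reweighting scheme, scales each perceptron update by the example weight, and uses a synthetic two-dimensional data set with 10% positives in place of Reuters-21578. It does not reproduce the training on local data sets mentioned in the abstract. All function names (`train_perceptron`, `adaboost`, `make_toy_data`, and so on) are hypothetical.

```python
import math
import random


def make_toy_data(seed=0, n_pos=10, n_neg=90):
    """Synthetic unbalanced data (an assumption, not the Reuters-21578 set)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_pos):  # rare positive class centred at (2, 2)
        X.append([rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)])
        y.append(1)
    for _ in range(n_neg):  # frequent negative class centred at (0, 0)
        X.append([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
        y.append(-1)
    return X, y


def predict_one(theta, xi):
    """Linear perceptron output in {-1, +1}; theta[-1] is the bias."""
    score = theta[-1] + sum(t * v for t, v in zip(theta, xi))
    return 1 if score > 0 else -1


def train_perceptron(X, y, w, epochs=20, lr=0.1):
    """Perceptron rule where each mistake update is scaled by the example weight."""
    n = len(X)
    theta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi, wi in zip(X, y, w):
            score = theta[-1] + sum(t * v for t, v in zip(theta, xi))
            if yi * score <= 0:  # misclassified (or on the boundary)
                step = lr * wi * n * yi
                for j, v in enumerate(xi):
                    theta[j] += step * v
                theta[-1] += step
    return theta


def adaboost(X, y, rounds=10):
    """Standard AdaBoost loop with linear perceptrons as weak learners."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        theta = train_perceptron(X, y, w)
        preds = [predict_one(theta, xi) for xi in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0)
        if err >= 0.5:  # weak learner no better than chance: stop
            break
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, theta))
        # Increase the weight of misclassified examples, then renormalise.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def ensemble_predict(ensemble, xi):
    """Weighted vote of the weak learners."""
    vote = sum(alpha * predict_one(theta, xi) for alpha, theta in ensemble)
    return 1 if vote > 0 else -1


def recall_precision(y_true, y_pred):
    """Recall and precision for the positive class (+1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


if __name__ == "__main__":
    X, y = make_toy_data()
    model = adaboost(X, y, rounds=10)
    preds = [ensemble_predict(model, xi) for xi in X]
    print("recall, precision:", recall_precision(y, preds))
```

Because misclassified examples gain weight in each round, later perceptrons are pushed toward the rare positive class, which is the mechanism that can raise recall on unbalanced data. The numbers this sketch prints come from the toy data and are not comparable to the 80.8% recall reported above.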
