A Three-Step Preprocessing Algorithm for Enhanced Classification of E-Mail Recommendation System

이메일 추천 시스템의 분류 향상을 위한 3단계 전처리 알고리즘

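  • Dong-Sub Cho (Department of Computer Science, College of Engineering, Ewha Womans University);
  • Ok-Ran Jeong (Department of Computer Science, College of Engineering, Ewha Womans University)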
  • Published : 2005.04.01

Abstract

The results of automatic document classification can vary significantly with the characteristics of the documents being classified, as well as with the performance of the classifier. This study identifies the characteristics of e-mail documents and applies a three-step preprocessing algorithm that minimizes the effect of their atypical characteristics. In the first stage, an uncertainty-based sampling algorithm that uses Mean Absolute Deviation (MAD) addresses the question of which training documents to select for rule generation at classification time. In the second stage, an attribute-based weighting method is applied to increase the discriminating power of terms that appear in the title of an e-mail, reflecting the characteristics of e-mail documents. In the third and final stage, the accuracy of classification for each category is increased by using the dynamic threshold of the Naive Bayesian algorithm. Finally, we implemented an E-Mail Recommendation System based on the three-step preprocessing algorithm; when a message arrives, the system recommends the applicable category so that users can classify it directly and optimally.
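
The abstract names the three stages but gives no formulas, so the Python sketch below is only one plausible reading of them, not the authors' implementation. It assumes that (1) a document is "uncertain" when the MAD of its class posteriors is small, (2) title terms are simply counted with a larger weight than body terms, and (3) the dynamic threshold is a per-category cutoff on the Naive Bayes posterior, supplied by the caller. All function names, the `title_weight` value, and the threshold values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


def select_uncertain(posteriors, k):
    """Stage 1 (assumed reading): pick the k documents whose class
    posteriors are flattest, i.e. have the lowest MAD."""
    order = sorted(range(len(posteriors)),
                   key=lambda i: mean_absolute_deviation(posteriors[i]))
    return order[:k]


def weighted_counts(title, body, title_weight=2.0):
    """Stage 2 (assumed reading): term counts in which each title
    occurrence counts title_weight times, each body occurrence once."""
    counts = Counter()
    for token in body.lower().split():
        counts[token] += 1.0
    for token in title.lower().split():
        counts[token] += title_weight
    return counts


class NaiveBayes:
    """Multinomial Naive Bayes over weighted counts, Laplace smoothing."""

    def __init__(self):
        self.term_totals = defaultdict(Counter)  # category -> term -> weight
        self.doc_counts = Counter()              # category -> number of docs
        self.vocab = set()

    def train(self, counts, category):
        self.term_totals[category].update(counts)
        self.doc_counts[category] += 1
        self.vocab.update(counts)

    def posteriors(self, counts):
        n_docs = sum(self.doc_counts.values())
        log_scores = {}
        for cat in self.doc_counts:
            total = sum(self.term_totals[cat].values())
            score = math.log(self.doc_counts[cat] / n_docs)
            for term, weight in counts.items():
                p = (self.term_totals[cat][term] + 1.0) / (total + len(self.vocab))
                score += weight * math.log(p)
            log_scores[cat] = score
        # Normalize in log space to avoid underflow on long messages.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}


def recommend(posteriors, thresholds, default_threshold=0.5):
    """Stage 3 (assumed reading): recommend the top category only if its
    posterior reaches that category's own threshold; otherwise None."""
    category = max(posteriors, key=posteriors.get)
    if posteriors[category] >= thresholds.get(category, default_threshold):
        return category
    return None


# Toy usage with made-up messages and categories.
nb = NaiveBayes()
nb.train(weighted_counts("project meeting", "agenda for the project meeting"), "work")
nb.train(weighted_counts("sale offer", "cheap offer buy now"), "promo")
incoming = nb.posteriors(weighted_counts("meeting", "project agenda"))
suggestion = recommend(incoming, {"work": 0.6, "promo": 0.8})
```

In this toy run `suggestion` is the category the system would offer to the user; returning `None` when no threshold is met leaves the decision entirely to the user, which matches the recommendation-rather-than-auto-filing role the abstract describes.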
