Evaluation of Attribute Selection Methods and Prior Discretization in Supervised Learning

  • Cha, Woon Ock (Division of Computer Engineering, Hansung University)
  • Huh, Moon Yul (Department of Statistics, Sungkyunkwan University)
  • Published : 2003.12.01

Abstract

We evaluated the effectiveness of applying attribute selection methods and prior discretization in supervised learning, using C4.5 and Naive Bayes as the learning models. Three data sets were obtained from the UCI data archive; in each, all attributes were continuous except for the single decision (class) attribute. Four attribute selection methods were used: MDI, ReliefF, Gain Ratio and a consistency-based method. MDI and ReliefF can be applied to both continuous and discrete attributes, whereas the other two methods can be applied only to discrete attributes. Discretization was performed with the Fayyad and Irani method. To investigate the effect of noise in the data, noise was introduced into the data sets at levels of 10% or 20%, and both the noisy and the noise-free data were then processed through attribute selection, discretization and classification. The results indicate that classifying the data on the selected attributes yields higher accuracy than classifying the full data set, and that prior discretization does not lower the accuracy.
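Because Gain Ratio and the consistency-based method accept only discrete attributes, a continuous attribute has to be discretized before it can be scored by them. The following is a minimal Python sketch of those two steps, not the authors' code. It assumes the recursive minimum-entropy binary split with the MDL stopping rule as described by Fayyad and Irani, and the C4.5 gain ratio (information gain divided by split information). The names `mdlp_cut_points`, `discretize` and `gain_ratio`, and the toy data, are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def mdlp_cut_points(values, labels):
    """Cut points for one continuous attribute: recursive minimum-entropy
    binary splits, each kept only if it passes the MDL criterion."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [c for _, c in pairs]
    cuts = []

    def split(lo, hi):  # operates on the sorted slice [lo, hi)
        n = hi - lo
        seg = ys[lo:hi]
        best = None
        for i in range(lo + 1, hi):
            if xs[i] == xs[i - 1]:
                continue  # a cut must fall between two distinct values
            left, right = ys[lo:i], ys[i:hi]
            e = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if best is None or e < best[0]:
                best = (e, i, left, right)
        if best is None:
            return
        e, i, left, right = best
        ent = entropy(seg)
        gain = ent - e
        k, k1, k2 = len(set(seg)), len(set(left)), len(set(right))
        delta = math.log2(3 ** k - 2) - (k * ent - k1 * entropy(left) - k2 * entropy(right))
        if gain <= (math.log2(n - 1) + delta) / n:
            return  # MDL criterion rejects this cut, so stop recursing here
        cuts.append((xs[i - 1] + xs[i]) / 2)
        split(lo, i)
        split(i, hi)

    split(0, len(xs))
    return sorted(cuts)


def discretize(values, cuts):
    """Replace each value by the index of the interval it falls into."""
    return [sum(v > c for c in cuts) for v in values]


def gain_ratio(codes, labels):
    """C4.5 gain ratio: information gain divided by split information."""
    n = len(labels)
    cond = 0.0
    for code, count in Counter(codes).items():
        subset = [c for x, c in zip(codes, labels) if x == code]
        cond += count / n * entropy(subset)
    split_info = entropy(codes)
    return 0.0 if split_info == 0 else (entropy(labels) - cond) / split_info


if __name__ == "__main__":
    y = ["a"] * 4 + ["b"] * 4
    attrs = {
        "separating": [1.0, 1.2, 1.4, 1.6, 5.0, 5.2, 5.4, 5.6],
        "unrelated": [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0],
    }
    for name, x in attrs.items():
        cuts = mdlp_cut_points(x, y)
        print(name, cuts, gain_ratio(discretize(x, cuts), y))
```

On this toy data the class-separating attribute receives a single cut point between 1.6 and 5.0 and a gain ratio of 1, while the MDL criterion rejects every cut on the unrelated attribute. That attribute stays a single interval and scores 0, which is how discretization alone can already act as a coarse filter.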
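ReliefF, by contrast, can score continuous attributes directly, without prior discretization. Below is a minimal Python sketch of ReliefF as described by Kononenko, not the authors' implementation. It assumes numeric attributes, range-normalised absolute differences, Manhattan distance over those differences, and m instances sampled at random with a fixed seed. When a class has fewer than k other members it divides by the number of neighbours actually found. The function `relieff` and its parameters are hypothetical names, and MDI is not sketched here.

```python
import random
from collections import Counter


def relieff(X, y, m=None, k=3, seed=0):
    """ReliefF weights for numeric attributes (higher means more relevant).

    X: list of rows of floats; y: class labels; m: number of sampled
    instances (n by default); k: nearest neighbours taken from each class.
    """
    n, d = len(X), len(X[0])
    lows = [min(row[j] for row in X) for j in range(d)]
    highs = [max(row[j] for row in X) for j in range(d)]
    # Range normalisation keeps every per-attribute difference in [0, 1].
    ranges = [hi - lo if hi > lo else 1.0 for lo, hi in zip(lows, highs)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / ranges[j]

    def distance(a, b):
        return sum(diff(j, a, b) for j in range(d))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    m = n if m is None else m
    rng = random.Random(seed)
    w = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)
        r, cls = X[i], y[i]
        for c in prior:
            # k nearest neighbours of r within class c, excluding r itself
            near = sorted(
                (distance(r, X[t]), t) for t in range(n) if t != i and y[t] == c
            )[:k]
            if not near:
                continue
            if c == cls:  # nearest hits: differences lower the weight
                scale = -1.0 / (m * len(near))
            else:  # nearest misses: differences raise it, weighted by class prior
                scale = prior[c] / (1.0 - prior[cls]) / (m * len(near))
            for _, t in near:
                for j in range(d):
                    w[j] += scale * diff(j, r, X[t])
    return w


if __name__ == "__main__":
    X = [[1.0, 3.0], [1.2, 1.0], [1.4, 4.0], [1.6, 1.5],
         [5.0, 5.0], [5.2, 9.0], [5.4, 2.0], [5.6, 6.0]]
    y = ["a"] * 4 + ["b"] * 4
    print(relieff(X, y))
```

In the toy example the first attribute separates the two classes and receives a clearly larger weight than the second, unrelated attribute. Ranking attributes by these weights and keeping the top ones is one way such a score can drive attribute selection.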

References

  1. Breiman, L.; Friedman, J. H.; Olshen, R. A.; Stone, C. J. Classification and regression trees.
  2. Dash, M.; Liu, H. Feature selection for classification. Intelligent Data Analysis.
  3. Devijver, P. A.; Kittler, J. Pattern Recognition: A Statistical Approach.
  4. Fayyad, U. M.; Irani, K. B. On the Handling of Continuous-valued Attributes in Decision Tree Generation. Machine Learning, v.8.
  5. Hall, M. A.; Holmes, G. Benchmarking Attribute Selection Techniques for Data Mining.
  6. Ihaka, R.; Gentleman, R. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, v.5, no.3. https://doi.org/10.2307/1390807
  7. Kira, K.; Rendell, L. A. The feature selection problem: Traditional methods and a new algorithm. Proceedings of the National Conference on Artificial Intelligence.
  8. Kononenko, I. Estimating attributes: Analysis and extensions of RELIEF. Proceedings of the European Conference on Machine Learning.
  9. Lee, S. C.; Huh, M. Y. A Measure of Association for Complex Data. Computational Statistics and Data Analysis, v.44, no.1-2.
  10. Liu, H.; Setiono, R. A Probabilistic Approach to Feature Selection: A Filter Solution. Proceedings of the 13th International Conference on Machine Learning.
  11. Liu, H.; Motoda, H. Feature Selection for Knowledge Discovery and Data Mining.
  12. Merz, C. J.; Murphy, P. M. UCI Repository of Machine Learning Databases.
  13. Quinlan, J. R. Induction of decision trees. Machine Learning, v.1.
  14. Quinlan, J. R. C4.5: Programs for machine learning.
  15. Witten, I.; Frank, E. Data Mining.