Comparison of Multiway Discretization Algorithms for Data Mining

  • Kim, Jeong-Suk (Information & Communication Dept., HIRA) ;
  • Jang, Young-Mi (Dept. of Information and Statistics, Chungbuk National University) ;
  • Na, Jong-Hwa (Dept. of Information and Statistics & Institute for Basic Science Research, Chungbuk National University)
  • Published : 2005.11.30

Abstract

Discretization algorithms for continuous data have been actively studied in data mining. Discretization is an important step in data analysis, particularly for efficient model selection. In this paper, we introduce the principles of several multiway discretization algorithms, including the KEX, 1R, and CN4 algorithms, and investigate their efficiency through a numerical study. We compare the algorithms in terms of misclassification rate under various underlying distributions.
