A Comparative Study on Discretization Algorithms for Data Mining
 Authors
Choi, Byong-Su; Kim, Hyun-Ji; Cha, Woon-Ock
 Abstract
The discretization process, which converts continuous attributes into discrete ones, is a preprocessing step in data mining tasks such as classification, because some classification algorithms can handle only discrete attributes. The purpose of discretization is to obtain discretized data without losing the information contained in the original data, and to achieve high predictive accuracy when the discretized data are used for classification. Many discretization algorithms have been developed. This paper presents the results of our comparative study of recently proposed representative discretization algorithms from two viewpoints: splitting versus merging, and supervised versus unsupervised. We implemented the discretization algorithms in R and have made the code publicly available.
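The two axes used above to classify discretization algorithms can be illustrated with a minimal Python sketch. This is not the authors' R code, and the function names (`equal_width_cuts`, `chimerge_cuts`, `discretize`) are hypothetical. `equal_width_cuts` stands for the unsupervised side: it fixes intervals from the value range alone and never looks at class labels. `chimerge_cuts` is a simplified supervised, bottom-up merging procedure in the spirit of ChiMerge: it starts with one interval per distinct value and merges the adjacent pair whose class distributions are most similar (lowest chi-square) until every adjacent pair differs by at least a threshold. A top-down splitting method would work in the opposite direction, starting from a single interval and adding cut points. The threshold 2.706 used below is an assumed choice (the 90% quantile of the chi-square distribution with one degree of freedom, which fits a two-class problem).

```python
import bisect
from collections import Counter
from typing import Hashable, List, Sequence


def equal_width_cuts(values: Sequence[float], k: int) -> List[float]:
    """Unsupervised: divide the observed range into k equal-width intervals.

    Returns the k - 1 interior cut points; class labels are never consulted.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def _chi2(a: Counter, b: Counter) -> float:
    """Pearson chi-square statistic for the 2 x C table of two adjacent intervals."""
    classes = set(a) | set(b)  # only classes present, so expected counts are > 0
    n_a, n_b = sum(a.values()), sum(b.values())
    n = n_a + n_b
    stat = 0.0
    for row, n_row in ((a, n_a), (b, n_b)):
        for c in classes:
            expected = n_row * (a[c] + b[c]) / n
            stat += (row[c] - expected) ** 2 / expected
    return stat


def chimerge_cuts(values: Sequence[float], labels: Sequence[Hashable],
                  threshold: float) -> List[float]:
    """Supervised, bottom-up merging in the spirit of ChiMerge (simplified).

    Returns the lower bound of every surviving interval except the first.
    """
    # One interval per distinct value: [lower bound, class counts].
    intervals: list = []
    for v, y in sorted(zip(values, labels), key=lambda p: p[0]):
        if intervals and intervals[-1][0] == v:
            intervals[-1][1][y] += 1
        else:
            intervals.append([v, Counter({y: 1})])

    while len(intervals) > 1:
        scores = [_chi2(intervals[i][1], intervals[i + 1][1])
                  for i in range(len(intervals) - 1)]
        i = min(range(len(scores)), key=scores.__getitem__)
        if scores[i] >= threshold:
            break  # every adjacent pair is now significantly different
        # Merge interval i + 1 into interval i, keeping the lower bound of i.
        intervals[i][1].update(intervals[i + 1][1])
        del intervals[i + 1]
    return [lower for lower, _ in intervals[1:]]


def discretize(values: Sequence[float], cuts: Sequence[float]) -> List[int]:
    """Map each value to the index (0 .. len(cuts)) of the interval containing it."""
    return [bisect.bisect_right(cuts, v) for v in values]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(equal_width_cuts(x, 2))              # [4.5]
    cuts = chimerge_cuts(x, y, threshold=2.706)
    print(cuts)                                # [5]
    print(discretize(x, cuts))                 # [0, 0, 0, 0, 1, 1, 1, 1]
```

On this toy data both methods produce a single cut, but only the merging method places it exactly at the class boundary, because it uses the labels. With noisier labels, a higher threshold keeps fewer intervals.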
 Keywords
Discretization; classification efficiency; R
 Language
Korean