Discretization Method Based on Quantiles for Variable Selection Using Mutual Information



CHa, Woon-Ock;Huh, Moon-Yul

  • Published: 2005.12.01


This paper evaluates the discretization of continuous variables for selecting relevant variables in supervised learning using mutual information. Three discretization methods are considered: MDL, Histogram, and 4-Intervals. The combined process of discretization and variable subset selection is evaluated by classification accuracy on six real data sets from the UCI databases. Results show that the 4-Intervals discretization method, which is based on quantiles, is robust and efficient for the variable selection process. We also visually evaluate the appropriateness of the selected subsets of variables.


variable selection; discretization; mutual information; data visualization
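The two building blocks described in the abstract, quantile-based discretization into four intervals and ranking variables by their mutual information with the class label, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names and the equal-frequency binning via quartiles are assumptions based on the abstract's description.

```python
import numpy as np

def quantile_discretize(x, k=4):
    """Discretize a continuous array into k equal-frequency bins.

    For k=4 the cut points are the quartiles (25%, 50%, 75%), which is
    the assumed reading of the paper's quantile-based 4-Intervals method.
    """
    # interior quantile cut points; np.linspace(0, 1, k+1)[1:-1] drops 0 and 1
    edges = np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])
    return np.digitize(x, edges)  # bin labels 0 .. k-1

def mutual_information(x_disc, y):
    """Empirical mutual information I(X; Y) between two discrete arrays, in nats."""
    x_vals, x_idx = np.unique(x_disc, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(x_vals), len(y_vals)))
    for i, j in zip(x_idx, y_idx):
        joint[i, j] += 1.0
    joint /= joint.sum()                        # joint distribution p(x, y)
    px = joint.sum(axis=1, keepdims=True)       # marginal p(x), shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)       # marginal p(y), shape (1, ny)
    nz = joint > 0                              # restrict to nonzero cells
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())
```

Variable selection would then amount to discretizing each continuous predictor with `quantile_discretize`, computing `mutual_information` between each discretized predictor and the class label, and keeping the highest-ranked subset.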


  1. Bonnlander, B. V. and Weigend, A. S.(1994). Selecting Input Variables Using Mutual Information and Nonparametric Density Estimation, Proceedings of the International Symposium on Artificial Neural Networks(ISANN), Taiwan, 42-50
  2. Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J.(1984). Classification and Regression Trees, Wadsworth, Belmont, CA
  3. Cha, W. and Huh, M.(2003). Evaluation of Attribute Selection Methods and Prior Discretization in Supervised Learning, The Korean Communications in Statistics, Vol. 10, No. 3, 879-894 https://doi.org/10.5351/CKSS.2003.10.3.879
  4. Cover, T. M. and Thomas, J. A.(1991). Elements of Information Theory, Wiley, New York
  5. Dash, M. and Liu, H.(1997). Feature selection for classification, Intelligent Data Analysis, Elsevier Science Inc
  6. Devijver, P. A. and Kittler, J.(1982). Pattern Recognition: A Statistical Approach, Prentice Hall International
  7. Fayyad, U. M. and Irani, K. B.(1992). On the Handling of Continuous-valued Attributes in Decision Tree Generation, Machine Learning, Vol. 8, 87-102
  8. Huh, M. Y.(2005). DAVIS(http://stat.skku.ac.kr/myhuh/DAVIS.html)
  9. Ihaka, R. and Gentleman, R.(1996). R: A language for data analysis and graphics, Journal of Computational and Graphical Statistics, 5(3), 299-314. (http://www.r-project.org) https://doi.org/10.2307/1390807
  10. Liu, H. and Motoda, H.(1998). Feature Selection for Knowledge Discovery and Data Mining, Kluwer Academic Publishers
  11. Merz, C. J. and Murphy, P. M.(1996). UCI Repository of Machine Learning Databases, Department of Information and Computer Science, University of California, Irvine, CA (http://www.ics.uci.edu/~mlearn/MLRepository.html)
  12. Parzen, E.(1962). On the estimation of probability density function and mode, Annals of Mathematical Statistics, 33(3), 1065-1076 https://doi.org/10.1214/aoms/1177704472
  13. Venables, W. N. and Ripley, B. D.(1994). Modern Applied Statistics with S-Plus, Springer, New York
  14. Witten, I. and Frank, E.(1999). Data Mining, Morgan Kaufmann. (http://www.cs.waikato.ac.nz/ml/weka)
  15. Battiti, R.(1994). Using mutual information for selecting features in supervised neural net learning, IEEE Transactions on Neural Networks, 5, 537-550 https://doi.org/10.1109/72.298224
  16. Tourassi, G. D., Frederick, E. D., Markey, M. K. and Floyd, C. E., Jr.(2001). Application of the mutual information criterion for feature selection in computer-aided diagnosis, Medical Physics, 28(12), 2394-2402 https://doi.org/10.1118/1.1418724