Discretization Method Based on Quantiles for Variable Selection Using Mutual Information



Cha, Woon-Ock; Huh, Moon-Yul

  • Published: 2005.12.01


This paper evaluates methods for discretizing continuous variables when mutual information is used to select relevant variables for supervised learning. Three discretization methods are considered: MDL, Histogram, and 4-Intervals. Discretization and variable subset selection are evaluated by classification accuracy on six real data sets from the UCI databases. The results show that the 4-Intervals discretization method, which is based on quantiles, is robust and efficient for variable selection. We also visually evaluate the appropriateness of the selected subsets of variables.
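As an illustration only, the following minimal sketch shows one way quantile-based 4-interval discretization can be combined with mutual information to rank variables. The abstract does not give the paper's exact procedure, so several details here are assumptions: the cut points are the sample quartiles as computed by Python's `statistics.quantiles`, mutual information is estimated from empirical frequencies, and the function names `discretize_quartiles`, `mutual_information`, and `rank_variables` are hypothetical.

```python
import math
from bisect import bisect_right
from collections import Counter
from statistics import quantiles


def discretize_quartiles(values):
    """Map each continuous value to an interval code 0..3.

    Assumption: the three cut points are the sample quartiles returned by
    statistics.quantiles (default 'exclusive' method); the paper may use a
    different quantile convention.
    """
    cuts = quantiles(values, n=4)  # three cut points -> four intervals
    # A value equal to a cut point is placed in the upper interval.
    return [bisect_right(cuts, v) for v in values]


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def rank_variables(columns, labels):
    """Rank variables by mutual information with the class labels.

    `columns` maps a variable name to its list of continuous values.
    Returns (name, score) pairs sorted from most to least informative.
    """
    scores = {
        name: mutual_information(discretize_quartiles(col), labels)
        for name, col in columns.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    columns = {
        "informative": [1, 2, 3, 4, 5, 6, 7, 8],  # separates the classes
        "noise": [1, 3, 5, 7, 2, 4, 6, 8],        # unrelated to the class
    }
    for name, score in rank_variables(columns, labels):
        print(f"{name}: {score:.4f}")
```

This sketch ranks single variables only; evaluating a selected subset by classification accuracy, as the paper does, would require a separate classifier and is not shown.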


variable selection; discretization; mutual information; data visualization


