Variable Selection Based on Mutual Information
Huh, Moon-Y.; Choi, Byong-Su
A best subset selection procedure based on the mutual information (MI) between a set of explanatory variables and a dependent class variable is proposed. The multivariate MI is derived under normal mixture models, and several types of normal mixtures are considered. A best subset selection algorithm is also proposed. Four real data sets are used to demonstrate the efficiency of the proposals.
Keywords: best subset selection; feature selection; mutual information; normal mixture
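
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the mixture types or the search strategy, so the code below assumes the simplest case, one normal component per class, so that the marginal density of the explanatory variables is a normal mixture weighted by the class proportions. MI is estimated by averaging log f(x|c) - log f(x) over the sample, and the search is an exhaustive scan over all subsets of a fixed size k. The function names (gaussian_logpdf, mi_normal_mixture, best_subset) and the ridge parameter are hypothetical and introduced only for this illustration.

```python
import itertools

import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def mi_normal_mixture(x, y, ridge=1e-6):
    """Sample estimate of I(X; C), assuming one normal component per class.

    Assumption for this sketch: f(x | c) is a single multivariate normal,
    so f(x) = sum_c p_c f(x | c) is a normal mixture.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)
    d = x.shape[1]

    # log f(x_i | c) for every observation i (rows) and class c (columns).
    # A small ridge keeps the covariance estimates invertible.
    log_cond = np.column_stack([
        gaussian_logpdf(
            x,
            x[y == c].mean(axis=0),
            np.atleast_2d(np.cov(x[y == c], rowvar=False)) + ridge * np.eye(d),
        )
        for c in classes
    ])

    # log f(x_i) = log sum_c p_c f(x_i | c), computed with log-sum-exp.
    joint = log_cond + np.log(priors)
    top = joint.max(axis=1, keepdims=True)
    log_marg = (top + np.log(np.exp(joint - top).sum(axis=1, keepdims=True))).ravel()

    # Pick log f(x_i | c_i) for the class each observation belongs to.
    own = log_cond[np.arange(len(y)), np.searchsorted(classes, y)]
    return float(np.mean(own - log_marg))


def best_subset(x, y, k):
    """Exhaustively score every size-k subset of columns by estimated MI."""
    x = np.asarray(x, dtype=float)
    scores = {
        subset: mi_normal_mixture(x[:, list(subset)], y)
        for subset in itertools.combinations(range(x.shape[1]), k)
    }
    winner = max(scores, key=scores.get)
    return winner, scores[winner]
```

Calling best_subset(x, y, k) on an n-by-p array x and a length-n label vector y returns the tuple of column indices with the largest estimated MI, together with that estimate in nats. Exhaustive search grows combinatorially with p, so a stepwise or branch-and-bound search would be needed for larger problems.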