Variable Selection Based on Mutual Information

  • Published : 2009.01.31

Abstract

A best subset selection procedure is proposed that is based on the mutual information (MI) between a set of explanatory variables and a dependent class variable. The multivariate MI is derived from normal mixtures, and several types of normal mixtures are considered. An algorithm for best subset selection is also proposed. Four real data sets are used to demonstrate the efficiency of the proposals.
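As an illustration only, one possible reading of the procedure can be sketched in Python. The sketch assumes a normal mixture with one Gaussian component per class, a resubstitution (plug-in) estimate of MI, and an exhaustive search over subsets of a fixed size. These choices, the small `ridge` term, and the names `gaussian_logpdf`, `mixture_mi`, and `best_subset` are assumptions made for this example and are not taken from the paper.

```python
from itertools import combinations

import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def mixture_mi(x, y, ridge=1e-6):
    """Plug-in estimate of MI(X; Y) in nats.

    Assumption of this sketch: p(x | y = c) is a single Gaussian per class,
    so p(x) is a normal mixture weighted by the class proportions.
    Each class needs at least two observations.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    y = np.asarray(y)
    n, d = x.shape
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / n

    # log p(x_i | class k) for every observation i and class k
    log_cond = np.empty((n, len(classes)))
    for k, c in enumerate(classes):
        xc = x[y == c]
        cov = np.atleast_2d(np.cov(xc, rowvar=False)) + ridge * np.eye(d)
        log_cond[:, k] = gaussian_logpdf(x, xc.mean(axis=0), cov)

    # log p(x_i) = log sum_k prior_k * p(x_i | k), computed stably
    joint = log_cond + np.log(priors)
    peak = joint.max(axis=1, keepdims=True)
    log_marg = peak[:, 0] + np.log(np.exp(joint - peak).sum(axis=1))

    # Average of log p(x_i | y_i) - log p(x_i) over the sample
    own = log_cond[np.arange(n), np.searchsorted(classes, y)]
    return float(np.mean(own - log_marg))


def best_subset(x, y, size):
    """Exhaustively score all column subsets of the given size by MI."""
    x = np.asarray(x, dtype=float)
    scored = [
        (mixture_mi(x[:, list(cols)], y), cols)
        for cols in combinations(range(x.shape[1]), size)
    ]
    return max(scored)  # (MI estimate, tuple of column indices)
```

Calling `best_subset(x, y, size)` returns the highest-scoring subset of that size together with its MI estimate. An exhaustive search grows combinatorially with the number of variables, so this form is only practical for small problems.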
