Conditional Mutual Information-Based Feature Selection Analyzing for Synergy and Redundancy

  • Cheng, Hongrong (Department of Computer Science, University of Electronic Science and Technology) ;
  • Qin, Zhiguang (Department of Computer Science, University of Electronic Science and Technology) ;
  • Feng, Chaosheng (Department of Computer Science, University of Electronic Science and Technology) ;
  • Wang, Yong (Department of Computer Science, University of Electronic Science and Technology) ;
  • Li, Fagen (Department of Computer Science, University of Electronic Science and Technology)
  • Received : 2010.04.20
  • Accepted : 2010.06.28
  • Published : 2011.04.30

Abstract

Battiti's mutual information feature selector (MIFS) and its variant algorithms are used in many classification applications. Because they ignore feature synergy, MIFS and its variants can be strongly biased when features are only informative in combination. They also estimate feature redundancy without reference to the corresponding classification task. In this paper, we propose an automated greedy feature selection algorithm called conditional mutual information-based feature selection (CMIFS). Exploiting the link between interaction information and conditional mutual information, CMIFS accounts for both redundant and synergistic feature interactions and identifies discriminative features. In addition, CMIFS couples the evaluation of feature redundancy to the classification task, which reduces the probability of mistaking important features for redundant ones during the search. Experimental results show that CMIFS achieves higher best classification accuracy than MIFS and its variants while using the same number of features or fewer, in some cases nearly 50% fewer.
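To make the selection idea concrete, below is a minimal sketch (in Python, with hypothetical function names) of a greedy conditional-mutual-information selector for discrete features. It rests on the relationship the abstract alludes to: the interaction information among a candidate feature f, a selected feature g, and the class C is the gap between I(f; C | g) and I(f; C), so conditioning on g lowers a candidate's score when f and g are redundant and raises it when they are synergistic. This is an illustration under assumed conventions, not the paper's exact CMIFS criterion, which is not reproduced in this record; in particular, conditioning only on the most recently selected feature is a simplification.

```python
import numpy as np
from collections import Counter

def entropy(seq):
    """Shannon entropy in bits of a sequence of discrete symbols."""
    counts = Counter(seq)
    total = sum(counts.values())
    probs = np.array([c / total for c in counts.values()])
    return float(-(probs * np.log2(probs)).sum())

def mi(x, y):
    """Mutual information I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def cmi(x, y, z):
    """Conditional mutual information
    I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)."""
    return (entropy(list(zip(x, z))) + entropy(list(zip(y, z)))
            - entropy(list(zip(x, y, z))) - entropy(z))

def greedy_cmi_select(X, y, k):
    """Greedy forward selection. The first feature maximizes I(f; C);
    each subsequent feature maximizes I(f; C | g), where g is the
    previously selected feature, so candidates redundant with g are
    penalized and candidates synergistic with g are rewarded."""
    remaining = set(range(X.shape[1]))
    selected = [max(remaining, key=lambda f: mi(X[:, f], y))]
    remaining.discard(selected[0])
    while remaining and len(selected) < k:
        g = X[:, selected[-1]]
        best = max(remaining, key=lambda f: cmi(X[:, f], y, g))
        selected.append(best)
        remaining.discard(best)
    return selected

if __name__ == "__main__":
    # Toy data: the class is the XOR of f0 and a biased partner f1;
    # f2 is a noisy copy of f0 (redundant), f3 is pure noise.
    rng = np.random.default_rng(0)
    n = 4000
    f0 = rng.integers(0, 2, n)
    f1 = (rng.random(n) < 0.2).astype(int)        # biased, so f0 alone is informative
    y = f0 ^ f1
    f2 = f0 ^ (rng.random(n) < 0.1).astype(int)   # redundant with f0
    f3 = rng.integers(0, 2, n)                    # irrelevant
    X = np.column_stack([f0, f1, f2, f3])
    print(greedy_cmi_select(X, y, 2))             # expected: [0, 1]
```

On the toy data the selector returns [0, 1]: it pairs f0 with its synergy partner f1 and skips the redundant noisy copy f2. A MIFS-style score of the form I(f; C) - b * sum(I(f; s)) tends to get this wrong, because I(f1; C) alone is near zero.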

References

  1. D. Koller and M. Sahami, "Toward Optimal Feature Selection," Proc. 13th Int. Conf. Machine Learning, 1996, pp. 284-292.
  2. M. Dash and H. Liu, "Feature Selection for Classification," Intelligent Data Analysis, vol. 1, 1997, pp. 131-156. https://doi.org/10.1016/S1088-467X(97)00008-5
  3. E. Amaldi and V. Kann, "On the Approximation of Minimizing Nonzero Variables or Unsatisfied Relations in Linear Systems," Theoretical Computer Sci., vol. 209, 1998, pp. 237-260. https://doi.org/10.1016/S0304-3975(97)00115-1
  4. R. Kohavi and G.H. John, "Wrappers for Feature Subset Selection," Artificial Intell., vol. 97, no. 1-2, 1997, pp. 273-324. https://doi.org/10.1016/S0004-3702(97)00043-X
  5. R. Battiti, "Using Mutual Information for Selecting Features in Supervised Neural Net Learning," IEEE Trans. Neural Netw., vol. 5, no. 4, 1994, pp. 537-550. https://doi.org/10.1109/72.298224
  6. N. Kwak and C.H. Choi, "Input Feature Selection for Classification Problems," IEEE Trans. Neural Netw., vol. 13, no. 1, 2002, pp. 143-159. https://doi.org/10.1109/72.977291
  7. J.J. Huang et al., "Feature Selection for Classificatory Analysis Based on Information-Theoretic Criteria," Acta Automatica Sinica, vol. 34, no. 3, 2008, pp. 383-392. https://doi.org/10.3724/SP.J.1004.2008.00383
  8. H. Peng, F. Long, and C. Ding, "Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy," IEEE Trans. Pattern Anal. Machine Intell., vol. 27, no. 8, 2005, pp. 1226-1238. https://doi.org/10.1109/TPAMI.2005.159
  9. P.A. Estevez et al., "Normalized Mutual Information Feature Selection," IEEE Trans. Neural Netw., vol. 20, no. 2, 2009, pp. 189-201. https://doi.org/10.1109/TNN.2008.2005601
  10. J. Novovicova, "Conditional Mutual Information Based Feature Selection for Classification Task," Progress in Pattern Recog., Image Anal. Appl., LNCS, vol. 4756, Springer, 2007, pp. 417-426.
  11. R.M. Fano, Transmission of Information: A Statistical Theory of Communications, New York, USA: Wiley Press, 1961.
  12. C.E. Shannon and W. Weaver, The Mathematical Theory of Communication, Urbana, IL, USA: University of Illinois Press, 1949.
  13. T.M. Cover and J.A. Thomas, Elements of Information Theory, New York, USA: Wiley-Interscience Press, 1991.
  14. U.M. Fayyad and K.B. Irani, "Multi-interval Discretization of Continuous-Valued Attributes for Classification Learning," Proc. 13th Int. Joint Conf. Artificial Intell., 1993, pp. 1022-1027.
  15. W.J. McGill, "Multivariate Information Transmission," Psychometrika, vol. 19, no. 2, 1954, pp. 97-116. https://doi.org/10.1007/BF02289159
  16. A. Jakulin and I. Bratko, "Quantifying and Visualizing Attribute Interactions: An Approach Based on Entropy." Available: http://arxiv.org/abs/cs.AI/0308002v3, 2004.
  17. C.J. Merz and P.M. Murphy, "UCI Repository of Machine Learning Databases [Online]." Available: http://www.ics.uci.edu/~mlearn/MLRepository.html.
  18. H. Peng, "mRMR Sample Data Sets [Online]." Available: http://penglab.janelia.org/proj/mRMR/test_colon_s3.csv.
  19. I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2nd ed., Morgan Kaufmann, 2005.

Cited by

  1. Design of Prototype-Based Emotion Recognizer Using Physiological Signals vol.35, pp.5, 2011, https://doi.org/10.4218/etrij.13.0112.0751
  2. Benefiting feature selection by the discovery of false irrelevant attributes vol.13, pp.4, 2011, https://doi.org/10.1142/s021969131550023x
  3. Automatic Recognition of Atypical Lymphoid Cells From Peripheral Blood by Digital Image Analysis vol.143, pp.2, 2011, https://doi.org/10.1309/ajcp78ifstogzzjn
  4. Deep sparse feature selection for computer aided endoscopy diagnosis vol.48, pp.3, 2011, https://doi.org/10.1016/j.patcog.2014.09.010
  5. Effective feature selection using feature vector graph for classification vol.151, pp.1, 2011, https://doi.org/10.1016/j.neucom.2014.09.027
  6. Supervised feature selection method via potential value estimation vol.19, pp.4, 2016, https://doi.org/10.1007/s10586-016-0635-0
  7. Big data analytics for forecasting cycle time in semiconductor wafer fabrication system vol.54, pp.23, 2011, https://doi.org/10.1080/00207543.2016.1174789
  8. K- local maximum margin feature extraction algorithm for churn prediction in telecom vol.20, pp.2, 2011, https://doi.org/10.1007/s10586-017-0843-2
  9. Feature Selection by Maximizing Independent Classification Information vol.29, pp.4, 2011, https://doi.org/10.1109/tkde.2017.2650906
  10. Using Feature-Based Models with Complexity Penalization for Selecting Features vol.90, pp.2, 2011, https://doi.org/10.1007/s11265-016-1152-3
  11. Identifying Health Status of Wind Turbines by Using Self Organizing Maps and Interpretation-Oriented Post-Processing Tools vol.11, pp.4, 2018, https://doi.org/10.3390/en11040723
  12. Feature Selection Algorithms in Intrusion Detection System: A Survey vol.12, pp.10, 2011, https://doi.org/10.3837/tiis.2018.10.024
  13. Feature selection considering weighted relevancy vol.48, pp.12, 2011, https://doi.org/10.1007/s10489-018-1239-6
  14. Feature Selection Algorithms for Wind Turbine Failure Prediction vol.12, pp.3, 2011, https://doi.org/10.3390/en12030453
  15. Feature Selection with Conditional Mutual Information Considering Feature Interaction vol.11, pp.7, 2011, https://doi.org/10.3390/sym11070858
  16. Feature selection for intrusion detection using new multi-objective estimation of distribution algorithms vol.49, pp.12, 2011, https://doi.org/10.1007/s10489-019-01503-7
  17. On some aspects of minimum redundancy maximum relevance feature selection vol.63, pp.1, 2011, https://doi.org/10.1007/s11432-019-2633-y
  18. Wind Turbine Prognosis Models Based on SCADA Data and Extreme Learning Machines vol.11, pp.2, 2011, https://doi.org/10.3390/app11020590
  19. Redundancy Is Not Necessarily Detrimental in Classification Problems vol.9, pp.22, 2011, https://doi.org/10.3390/math9222899