A comparative study of filter methods based on information entropy
Kim, Jung-Tae; Kum, Ho-Yeun; Kim, Jae-Hwan;
Feature selection has become an essential technique for reducing the dimensionality of data sets. Many features are irrelevant or redundant for a given classification task, and the purpose of feature selection is to keep the relevant features while removing the irrelevant and redundant ones. Applications of feature selection range from text processing, face recognition, bioinformatics, speaker verification, and medical diagnosis to finance. In this study, we focus on filter methods based on information entropy: IG (Information Gain), FCBF (Fast Correlation Based Filter), and mRMR (minimum Redundancy Maximum Relevance). FCBF has the advantage of reducing the computational burden by eliminating redundant features that satisfy the approximate Markov blanket condition. However, FCBF considers only the relevance between each feature and the class when selecting the best features, and thus fails to take the interaction between features into consideration. In this paper, we propose an improved FCBF to overcome this shortcoming. We also perform a comparative study to evaluate the performance of the proposed method.
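As a rough illustration of the entropy-based quantities that these filters build on, the following sketch computes information gain and symmetrical uncertainty (the normalized measure FCBF uses to score feature-class relevance) for discrete features, then ranks features by that score. It is a minimal sketch under stated assumptions, not the improved FCBF proposed in the paper: the function names, the toy data, and the restriction to already-discretized features are all assumptions made for the example, and the redundancy-elimination step is not shown.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_entropy(x, y):
    """H(X | Y) for two equal-length sequences of discrete values."""
    n = len(x)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    # Weighted average of the entropy of X within each value of Y.
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(x, y):
    """IG(X; Y) = H(X) - H(X | Y)."""
    return entropy(x) - conditional_entropy(x, y)


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    denom = entropy(x) + entropy(y)
    if denom == 0:
        return 0.0  # both variables are constant; treat as uninformative
    return 2.0 * information_gain(x, y) / denom


def rank_by_su(features, labels):
    """Rank feature names by SU with the class label, highest first."""
    scores = {
        name: symmetrical_uncertainty(column, labels)
        for name, column in features.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: one feature copies the class, one is independent of it.
labels = [0, 0, 1, 1]
features = {
    "f_copy": [0, 0, 1, 1],
    "f_noise": [0, 1, 0, 1],
}
ranking = rank_by_su(features, labels)
print(ranking)
```

On this toy data the feature that copies the class label scores 1.0 and the independent feature scores 0.0. A relevance-only ranking of this kind treats each feature in isolation, which is the limitation with respect to feature interaction noted in the abstract.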
Metaheuristics; Improved tabu search; Subset selection problem
 Cited by
Performance evaluation of principal component analysis for clustering problems, Journal of the Korean Society of Marine Engineering, vol. 40, no. 8, pp. 726-732, 2016.