Hybrid Feature Selection Method Based on Genetic Algorithm for the Diagnosis of Coronary Heart Disease

  • Received: 2021.10.07
  • Accepted: 2022.02.04
  • Published: 2022.03.31

Abstract

Coronary heart disease (CHD) is a comorbidity of COVID-19; therefore, routine early diagnosis is crucial. Diagnosing CHD requires a large number of examination attributes, which becomes an obstacle during the pandemic, when health services are under heavy demand. A precise machine learning model that diagnoses CHD from a minimal set of examination attributes would allow examinations and subsequent healthcare actions to be carried out quickly. This study proposes a CHD diagnosis model based on feature selection, data balancing, and ensemble-based classification. In the feature selection stage, a hybrid of a support vector machine and a genetic algorithm (SVM-GA) is combined with the fast correlation-based filter (FCBF). Tested on the Z-Alizadeh Sani dataset, the proposed system achieved an accuracy of 94.60% and an area under the curve (AUC) of 97.5% while using only 8 of the 54 examination attributes. This performance places the proposed model in the very good category.
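
The abstract names FCBF as one part of the feature selection stage but does not describe its mechanics. As a rough illustration only, the sketch below shows how an FCBF-style filter is commonly formulated: features are ranked by their symmetric uncertainty (SU) with the class label, and a feature is dropped when an already-kept, more relevant feature is at least as strongly associated with it as the feature is with the class. The function names, the `threshold` parameter, and the toy data are assumptions made for this example; this is not the authors' implementation, and it does not cover the SVM-GA wrapper, the data balancing, or the ensemble classifier.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    info_gain = hx + hy - hxy        # mutual information between X and Y
    return 2.0 * info_gain / (hx + hy)


def fcbf(features, target, threshold=0.0):
    """Return the feature names kept by an FCBF-style filter.

    features: dict mapping a feature name to a list of discrete values
    target:   list of class labels, same length as each feature column
    """
    # Relevance step: keep features whose SU with the class exceeds the
    # threshold, ordered from most to least relevant.
    relevance = {name: symmetric_uncertainty(col, target)
                 for name, col in features.items()}
    ranked = sorted((n for n in relevance if relevance[n] > threshold),
                    key=lambda n: relevance[n], reverse=True)

    # Redundancy step: drop a feature if some already-kept feature is at
    # least as correlated with it as it is with the class.
    selected = []
    for name in ranked:
        redundant = any(
            symmetric_uncertainty(features[kept], features[name]) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Hypothetical toy data (not taken from the Z-Alizadeh Sani dataset).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy = {
    "f_strong":      [1, 1, 1, 0, 0, 0, 0, 0],  # close to the label
    "f_strong_copy": [1, 1, 1, 0, 0, 0, 0, 0],  # exact duplicate of f_strong
    "f_weak":        [1, 1, 0, 1, 1, 0, 0, 0],  # weakly related to the label
    "f_noise":       [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
}
selected = fcbf(toy, labels)
print(selected)
```

On this toy input the duplicate column is discarded as redundant and the label-independent column is discarded as irrelevant. In a hybrid scheme such as the one the abstract describes, a filter of this kind would typically be paired with a wrapper search (here, SVM-GA) over candidate feature subsets; how the two are combined in the paper is not stated in the abstract.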

Acknowledgement

We would like to thank the National Research and Innovation Agency of the Republic of Indonesia for providing research funding under the Basic Research Grant scheme (Contract Number: 221.1/UN27.22/HK.07.00/2021).
