Trend of Utilization of Machine Learning Technology for Digital Healthcare Data Analysis

  • Published : 2019.02.01

Abstract

Machine learning has been applied to medical imaging and has shown an excellent recognition rate. Recently, there has been much interest in preventive medicine. If data are accessible, machine learning packages can be used easily in digital healthcare fields. However, the data must be prepared in advance, and model evaluation and tuning are required to construct a reliable model. On average, these processes take more than 80% of the total effort required. In this study, we describe the basic concepts of machine learning, pre-processing and visualization of datasets, feature engineering for reliable models, model evaluation and tuning, and the latest trends in popular machine learning frameworks. Finally, we survey an explainable machine learning analysis tool and discuss the future direction of machine learning.
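
The model evaluation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a binary classifier that outputs a score per sample. The labels, scores, the 0.5 threshold, and the function names below are all hypothetical, chosen only for illustration. They are not taken from the paper or from the Breast Cancer Wisconsin data, and the code does not reproduce the paper's XGBoost analysis. The sketch computes confusion-matrix metrics and the area under the ROC curve in plain Python.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

# Turn scores into hard predictions with an assumed threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print(auc)
```

The pairwise AUC used here is quadratic in the number of samples, which is fine for a sketch. For real datasets a library routine would normally be preferable.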

(Figure 1) Digital smart healthcare paradigm

(Figure 2) Machine learning analysis flow

(Figure 3) (a) Attributes of the Breast Cancer data and (b) their visualization

(Figure 4) (a) Classification (confusion) matrix and (b) example analysis of the breast cancer data (XGBoost applied)

(Figure 5) Model evaluation metrics for the breast cancer data

(Figure 6) ROC curve of the model evaluation results for the breast cancer data

(Figure 7) XGBoost analysis of the Breast Cancer data [12] and feature importance (composed from [29])

(Figure 8) SHAP explanation of the prediction factors in the XGBoost analysis of the breast cancer data (reconstructed from [33])

<Table 1> Data visualization tools

References

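  1. 서경원 et al., "Smart Healthcare Medical Device Technology," Standards Strategy Report, National Institute of Food and Drug Safety Evaluation, Aug. 2018.
  2. 송영준, "The Fourth Industrial Revolution and Digital Healthcare Policy," Weekly Technology Trends, Feb. 2018.
  3. 정성원, "Use of Big Data in Healthcare," 5th Clinical Research Methodology Workshop, Institute of Biomedical Industry, Catholic University of Korea College of Medicine, Seoul, Nov. 5, 2016, pp. 18-29.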
  4. IBM, "Bigdata in Healthcare: Tapping New Insight to Save Lives," IBM Big Data & Analytics Hub, 2014. https://www.ibmbigdatahub.com/infographic/big-data-healthcare-tapping-new-insight-save-lives
  5. Wikipedia, "Machine Learning," https://en.wikipedia.org/wiki/Machine_learning
  6. 정일영 and 구원모, "Data Integration Approaches for Building a Healthcare Ecosystem," Trends and Issues, no. 46, Jan. 2018, pp. 1-38.
  7. MIT Critical Data, Secondary Analysis of Electronic Health Records, Springer International Publishing: NY, USA, 2016.
  8. G. Press, "Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says," Forbes, Mar. 23, 2016.
  9. S. Christa, V. Suma, and L. Maduri, "An Effective Data Preprocessing Technique for Improved Data Management in a Distributed Environment," ACCTHPCA, vol. 3, July 2012, pp. 25-29.
  10. SAS, "Data Visualization Techniques: From Basics to Big Data with SAS(R) Visual Analytics," SAS White Paper, 2018.
  11. P. van der Laken, "Facets," Google, June 2017. https://github.com/PAIR-code/facets
  12. William H. Wolberg (physician), University of Wisconsin Hospitals, Madison, Wisconsin, USA, "Breast Cancer Wisconsin (Original) Data Set," https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original)
  13. Tutorials Point, "Seaborn," TutorialsPoint, 2017. https://www.tutorialspoint.com/seaborn/seaborn_tutorial.pdf
  14. A. Bilogur, "Missingno: A Missing Data Visualization Suite," J. Open Source Softw., Feb. 27, 2018, doi: 10.21105/joss.00547
  15. Continuum Analytics, "Blaze Documentation," 2018. https://blaze.readthedocs.io/en/latest/index.html
  16. G. Csardi and T. Nepusz, igraph Reference Manual, Harvard University: Cambridge, MA, USA, 2013.
  17. Wikipedia, "Feature Engineering," https://en.wikipedia.org/wiki/Feature_engineering
  18. A. Zheng, Evaluating Machine Learning Models, O'Reilly: Sebastopol, CA, USA, 2015.
  19. Medcalc, "ROC Curve Analysis," https://www.medcalc.org/manual/roc-curves.php
  20. F.Y. Osisanwo et al., "Supervised Machine Learning Algorithms: Classification and Comparison," Int. J. Comput. Trends Technol., vol. 48, no. 3, June 2017, pp. 128-138. https://doi.org/10.14445/22312803/IJCTT-V48P126
  21. P. Harrington, Machine Learning in Action, Manning Publications Co.: Shelter Island, NY, USA, 2012, pp. 83-100.
  22. M. Namratha and T.R. Prajwala, "A Comprehensive Overview of Clustering Algorithms in Pattern Recognition," IOSR J. Comput. Eng., vol. 4, no. 6, 2012, pp. 23-30. https://doi.org/10.9790/0661-0462330
  23. L. Arnold et al., "An Introduction to Deep Learning," in Proc. Eur. Symp. Artif. Neural Netw., Bruges, Belgium, Apr. 27-29, 2011, pp. 477-488.
  24. Wikipedia, "Random Forest," https://en.wikipedia.org/wiki/Random_forest
  25. Wikipedia, "Boosting," https://en.wikipedia.org/wiki/Boosting_(machine_learning)
  26. R.E. Schapire, "The Boosting Approach to Machine Learning, An Overview," in MSRI Workshop on Nonlinear Estimation and Classification, Springer: Heidelberg, Germany, 2002, pp. 3-4.
  27. A. Natekin and A. Knoll, "Gradient Boosting Machines, a Tutorial," Front. Neurorobot., July 21, 2013, doi: 10.3389/fnbot.2013.00021.
  28. G. Biau, B. Cadre, and L. Rouviere, "Accelerated Gradient Boosting," arXiv:1803.02042, May 2018.
  29. J. Brownlee, "XGBoost with Python, Gradient Boosted Trees with XGBoost and Scikit-learn," Machine Learning Mastery, Sept. 19, 2016.
  30. G. Ke et al., "LightGBM: A Highly Efficient Gradient Boosting Decision Tree," Conf. Neural Inform. Process. Syst., Long Beach, CA, USA, 2017, pp. 1-9.
  31. A.V. Dorogush, V. Ershov, and A. Gulin, "CatBoost: Gradient Boosting with Categorical Features Support," Yandex, 2017. https://catboost.ai/
  32. M. Du, N. Liu, and X. Hu, "Techniques for Interpretable Machine Learning," arXiv:1808.00033, July 2018.
  33. M.T. Ribeiro, S. Singh, and C. Guestrin, "Why Should I Trust You?" Proc. ACM SIGKDD Int. Conf. Knowled. Discovery Data Mining, San Francisco, CA, USA, Aug. 13-17, 2016, pp. 1135-1144.
  34. S.M. Lundberg and S.-I. Lee, "A Unified Approach to Interpreting Model Predictions," Conf. Neural Inform. Process. Syst., Long Beach, CA, USA, 2017, pp. 1-10.
  35. A. Saabas, "treeinterpreter," 2015. https://github.com/andosa/treeinterpreter
  36. D. Foster, "xgboostExplainer," 2017. https://github.com/AppliedDataSciencePartners/xgboostExplainer