Application of Data Mining for Biomedical Data Processing

Korean title: 바이오메디컬 데이터 처리를 위한 데이터마이닝 활용 (Use of Data Mining for Biomedical Data Processing)

  • Shon, Ho-Sun (Medical Research Institute, School of Medicine, Chungbuk National University)
  • Kim, Kyoung-Ok (Dept. of Nursing, Woosong College)
  • Cha, Eun-Jong (Dept. of Biomedical Engineering, School of Medicine, Chungbuk National University)
  • Kim, Kyung-Ah (Medical Research Institute, School of Medicine, Chungbuk National University)
  • Received: 2016.06.08
  • Accepted: 2016.06.15
  • Published: 2016.07.01

Abstract

Cancer is the most frequent disease in Korea, and its pathogenesis and progression are known to occur through various causes and stages. Recently, research on chromosomal and genetic disorders, and on prognostic factors that predict their occurrence, recurrence, and progression, has been actively pursued. In this paper, we analyze DNA methylation data downloaded from The Cancer Genome Atlas (TCGA), an open database, to study bladder cancer, the most frequent cancer of the urinary system. Using level 3 methylation data, the most heavily preprocessed level, we extracted 59 candidate CpG islands from 480,000 CpG islands and then analyzed them with data mining techniques. As a result, the CpG island cg12840719 was found to be significant, and in Cox regression it showed a high relative risk compared with the other CpG islands. Classification analysis showed that the CpG islands most strongly correlated with bladder cancer are cg03146993, cg07323648, cg12840719, and cg14676825, with a classification accuracy of about 76%. We also found that the positive predictive value, i.e., the probability that a case predicted as cancer is truly cancer, was 72.4%. Once the candidate CpG islands from these results are verified, this method could be used in diagnosing and treating cancer.
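The classification accuracy and positive predictive value reported above are both derived from a confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' code: the function names and the ten-sample label vectors are hypothetical and are not taken from the TCGA data. Sensitivity is computed alongside the positive predictive value because the two are easily confused.

```python
# Minimal sketch: accuracy, positive predictive value (PPV), and sensitivity
# from binary labels (1 = cancer, 0 = normal). All data here are hypothetical.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

def positive_predictive_value(tp, fp):
    """Of the samples predicted as cancer, the fraction that truly are."""
    return tp / (tp + fp) if (tp + fp) else float("nan")

def sensitivity(tp, fn):
    """Of the true cancer samples, the fraction predicted as cancer."""
    return tp / (tp + fn) if (tp + fn) else float("nan")

# Hypothetical example labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(tp, fp, tn, fn))
print("PPV:", positive_predictive_value(tp, fp))
print("sensitivity:", sensitivity(tp, fn))
```

In this toy example the positive predictive value and the sensitivity differ, which is why it matters which of the two a reported percentage refers to.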
