Symbolic tree based model for HCC using SNP data

A study of a symbolic tree-structured model for genomic data of patients with hepatocellular carcinoma

  • Lee, Tae Rim (이태림) (Department of Information Statistics, Korea National Open University)
  • Received : 2014.07.21
  • Accepted : 2014.09.12
  • Published : 2014.09.30

Abstract

Symbolic data analysis (SDA) extends data mining and exploratory data analysis to knowledge mining. Using this approach, we propose an SDA tree model for clinical and genomic data. Applying SDA to large-scale genomic SNP data yields correlations that help reveal the hidden structure of hepatocellular carcinoma (HCC) data. The results support the validity of applying SDA to a tree-structured progression model and of quantifying clinical laboratory data and SNP data for the early diagnosis of HCC. The proposed model serves as a representative model of HCC survival time and its causal association with patients' SNP data. The goal is to fit a simple, easily interpretable tree-structured survival model, reduced from large clinical and genomic data, within the SDA framework of knowledge mining.
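The core step of symbolic data analysis, moving from individual records to descriptions of higher-level concepts, can be illustrated with a minimal sketch. It assumes the concepts are disease groups and that each SNP is summarized as a modal-valued variable (a relative-frequency distribution over genotypes). The records, genotype labels, and the function name `to_symbolic` are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict

# Hypothetical individual-level records: (disease group, genotype at one SNP).
# These values are invented for illustration only.
records = [
    ("normal", "AA"), ("normal", "AG"), ("normal", "AA"), ("normal", "AA"),
    ("chronic hepatitis", "AG"), ("chronic hepatitis", "GG"),
    ("HCC", "GG"), ("HCC", "GG"), ("HCC", "AG"), ("HCC", "GG"),
]


def to_symbolic(records):
    """Aggregate individuals into one modal-valued description per concept.

    Each concept (here, a disease group) is described by the relative
    frequency of each genotype among its members.
    """
    counts = defaultdict(Counter)
    for group, genotype in records:
        counts[group][genotype] += 1

    symbolic = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        symbolic[group] = {g: n / total for g, n in counter.items()}
    return symbolic


symbolic = to_symbolic(records)
for group, distribution in symbolic.items():
    print(group, distribution)
```

Each row of the resulting symbolic table describes a group rather than a patient, which is what allows a large individual-level data set to be reduced to a small number of concepts before modelling.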

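This study aims to identify the factors that affect the survival time of patients with hepatocellular carcinoma (HCC). Taking patient survival as the response variable, we applied an easily interpretable tree-structured survival model and symbolic data analysis to data that integrate clinical variables and SNP genetic factors, in order to obtain not only the significant factors but also their threshold values, and thereby clinically useful results that can be applied in practice. By quantifying the clinical data of HCC patients and fitting a statistical prognostic model, we clarified hidden relationships among the clinical variables, derived a predictive classification model by survival-time group, and explicitly identified the important clinical and genomic variables that affect prognosis after diagnosis, together with their thresholds, providing an important basis for clinical treatment planning. The symbolic data analysis covered 1,840 subjects in four groups (normal, chronic hepatitis, hepatitis, and HCC), and 20 SNPs in 5 genes were identified. Specifically, IL10-ht2 was very strongly associated with the onset of HCC, and TGFB L10P-Prosms was found to be a genetic factor that reduces the risk of developing HCC among chronic hepatitis patients. By representing the degree of correlation between the SNP variables and the disease-group concept variable as the radius of a circle, the most discriminating symbolic variables could be compared on a relative scale. Integrating the clinical and genomic data, we fitted a symbolic tree-structured survival model and obtained a tree model for survival-time groups together with its significant variables and cutoff values.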
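How a tree-structured model arrives at a threshold for a variable can be sketched as follows. This is only an illustration under stated assumptions: the survival-time groups are reduced to two classes, and Gini impurity is used as the splitting criterion in place of whatever criterion the study applied, which the abstract does not specify. The laboratory values, group labels, and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Return (cutoff, weighted impurity) of the best single binary split.

    Candidate cutoffs are midpoints between consecutive distinct values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cutoff can separate identical values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


# Hypothetical laboratory values and survival-time groups
# (1 = short-survival group, 0 = long-survival group).
lab_values = [12, 15, 18, 22, 40, 45, 52, 60]
short_survival = [0, 0, 0, 0, 1, 1, 1, 1]

cutoff, impurity = best_threshold(lab_values, short_survival)
print(cutoff, impurity)
```

A full tree repeats this search over every candidate variable, clinical or genomic, and then recurses within each resulting subgroup, which is how each significant variable ends up paired with its own cutoff value.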
