Discovery-Driven Exploration Method in Lung Cancer 2-DE Gel Images Using the Data Cube

Korean title (translated): Exception Exploration in Lung Cancer 2-DE Gel Images Using a Data Cube

  • 심정은 (Department of Computer Science, Yonsei University) ;
  • 이원석 (Department of Computer Science, Yonsei University)
  • Published : 2008.10.31

Abstract

In proteomics research, identifying proteins that are differentially expressed under specific conditions is one of the key issues. Changes in a specific protein's expression level can be detected in several ways, such as statistical analysis and graphical visualization. However, it is quite difficult to handle the spot information of individual proteins manually with these methods, because a tissue sample contains a considerable number of proteins. In this paper, we propose a way to apply the OLAP data cube and Discovery-driven exploration, using database and data mining techniques. With data cubes, it is possible to analyze the relationship between proteins and relevant clinical information, as well as the proteins that are differentially expressed by disease. We propose a measure and exception indicators suited to analyzing changes in protein expression level. In addition, we propose a method for reducing the cost of calculating InExp in Discovery-driven exploration. We also evaluate the utility and effectiveness of the data cube and Discovery-driven exploration on lung cancer 2-DE gel images.

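In proteomics, determining whether proteins show functional abnormalities or structural modifications under specific conditions, and tracking the course of a disease, are important research topics. Statistical methods are commonly used to analyze changes in protein expression level, and commercial protein image analysis software also offers graphical methods, but with these approaches the many proteins present in many tissues must be compared manually. This paper proposes a way to apply the OLAP data cube and Discovery-driven exploration using database and data mining techniques. By exploiting the properties of the data cube, it is possible to analyze not only the proteins whose expression level changes with disease but also the relationships between clinical characteristics and proteins. We propose a data cube measure suited to analyzing changes in protein expression level and exception indicators for Discovery-driven exploration, and in particular present a way to reduce the amount of computation needed to calculate the In-exception. Experiments show that the data cube and the Discovery-driven method are useful on lung cancer 2-DE data.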
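To make the idea of exception indicators on a data cube more concrete, the following is a minimal sketch rather than the measure or indicators proposed in the paper, which the abstract does not specify. It assumes a two-dimensional cube (spot by tissue group) whose measure is the mean log spot volume, an additive model for the expected value of each cell, and a self-exception score defined as the absolute residual scaled by the residual standard deviation. The in-exception of an aggregated spot is taken as the largest self-exception among the cells beneath it. The spot names, volumes, and function names are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical records: (spot_id, tissue_group, normalized_spot_volume).
RECORDS = [
    ("spot_A", "normal", 10.0), ("spot_A", "normal", 11.0),
    ("spot_A", "tumor", 10.5), ("spot_A", "tumor", 9.5),
    ("spot_B", "normal", 20.0), ("spot_B", "normal", 22.0),
    ("spot_B", "tumor", 21.0), ("spot_B", "tumor", 19.0),
    ("spot_C", "normal", 5.0), ("spot_C", "normal", 5.5),
    ("spot_C", "tumor", 20.0), ("spot_C", "tumor", 22.0),
]


def build_cube(records):
    """Aggregate records into cells keyed by (spot, group).

    The measure assumed here is the mean of the log volumes in each cell.
    """
    cells = defaultdict(list)
    for spot, group, volume in records:
        cells[(spot, group)].append(math.log(volume))
    return {key: mean(values) for key, values in cells.items()}


def self_exceptions(cube):
    """Score each cell by how far it deviates from an additive model.

    Expected value = grand mean + spot effect + group effect.
    Assumes every (spot, group) combination is present in the cube.
    """
    spots = sorted({spot for spot, _ in cube})
    groups = sorted({group for _, group in cube})
    grand = mean(cube.values())
    spot_effect = {s: mean(cube[(s, g)] for g in groups) - grand for s in spots}
    group_effect = {g: mean(cube[(s, g)] for s in spots) - grand for g in groups}
    residuals = {
        (s, g): cube[(s, g)] - (grand + spot_effect[s] + group_effect[g])
        for s in spots
        for g in groups
    }
    # Fall back to 1.0 when all residuals are zero to avoid division by zero.
    sigma = pstdev(residuals.values()) or 1.0
    return {key: abs(res) / sigma for key, res in residuals.items()}


def in_exceptions(self_exp):
    """For each spot, keep the largest self-exception among its cells."""
    in_exp = defaultdict(float)
    for (spot, _group), score in self_exp.items():
        in_exp[spot] = max(in_exp[spot], score)
    return dict(in_exp)


if __name__ == "__main__":
    cube = build_cube(RECORDS)
    self_exp = self_exceptions(cube)
    in_exp = in_exceptions(self_exp)
    # Rank spots so that an analyst drills into the most surprising one first.
    for spot, score in sorted(in_exp.items(), key=lambda kv: -kv[1]):
        print(f"{spot}: in-exception = {score:.2f}")
```

In this toy data only spot_C changes between groups, so it receives the highest in-exception score and would be the first candidate for drill-down. A real cube would add clinical dimensions (for example, stage or smoking history) and would need a model that tolerates empty cells.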
