A Management Technique for Protein Version Information Based on Local Sequence Alignment and Trigger

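  • Kwang Su Jung (Database/Bioinformatics Laboratory, Chungbuk National University);
  • Sung-Hee Park (Database/Bioinformatics Laboratory, Chungbuk National University);
  • Keun Ho Ryu (School of Electrical, Electronics and Computer Engineering, Chungbuk National University)
  • Published: 2005.02.01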

Abstract

Once the function of an amino acid sequence has been determined, the function of other sequences with a similar composition can be inferred. It is also possible to alter a protein of known function into a useful protein by genetic engineering. In this process, a single original protein sequence gives rise to many version sequences with different compositions. A systematic technique is therefore needed to store and manage these protein version sequences together with their annotation (reference) data. In this paper we propose a technique for managing protein version sequences based on local sequence alignment, and a technique for managing the history of protein annotation data using triggers. The method automatically measures the similarity between the original sequence and each version sequence as the version sequences are stored in the database, and it also reduces the storage space needed for protein sequences. By storing the history of protein information and analyzing how the sequences change, we expect that mutation studies based on version sequences can support the development of useful new proteins and drugs.
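The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the scoring scheme or the database system, so the match/mismatch/gap values, the normalisation in `similarity`, the use of SQLite, and all table, column, and trigger names below are assumptions chosen only for illustration. The first part computes a Smith-Waterman local alignment score between an original sequence and a version sequence; the second part uses a database trigger to record the previous annotation whenever it is updated.

```python
import sqlite3

MATCH, MISMATCH, GAP = 2, -1, -2  # assumed scores; a substitution matrix could be used instead


def local_alignment_score(a: str, b: str) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(original: str, version: str) -> float:
    """Score divided by the best score the shorter sequence could reach (0.0 to 1.0)."""
    shorter = min(len(original), len(version))
    if shorter == 0:
        return 0.0
    return local_alignment_score(original, version) / (MATCH * shorter)


# Hypothetical schema: one current annotation per protein plus a history table
# that the trigger fills in automatically on every change.
SCHEMA = """
CREATE TABLE protein_annotation (
    protein_id TEXT PRIMARY KEY,
    annotation TEXT NOT NULL
);
CREATE TABLE annotation_history (
    protein_id     TEXT NOT NULL,
    old_annotation TEXT NOT NULL,
    changed_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER keep_annotation_history
AFTER UPDATE OF annotation ON protein_annotation
FOR EACH ROW WHEN OLD.annotation <> NEW.annotation
BEGIN
    INSERT INTO annotation_history (protein_id, old_annotation)
    VALUES (OLD.protein_id, OLD.annotation);
END;
"""


def demo_history() -> list:
    """Update one annotation and return what the trigger wrote to the history table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO protein_annotation VALUES ('P1', 'unknown function')")
    conn.execute(
        "UPDATE protein_annotation SET annotation = 'protective antigen' "
        "WHERE protein_id = 'P1'"
    )
    rows = conn.execute(
        "SELECT protein_id, old_annotation FROM annotation_history"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(similarity("MKTAYIAKQR", "MKTAYLAKQR"))
    print(demo_history())
```

In this sketch the application never writes to the history table itself; the trigger does so inside the database, which is the sense in which history management is automatic.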
