Reinterpretation of the protein identification process for proteomics data

Kwon, Kyung-Hoon;Lee, Sang-Kwang;Cho, Kun;Park, Gun-Wook;Kang, Byeong-Soo;Park, Young-Mok;

doi:10.4051/ibc.2009.3.0009

Interdisciplinary Bio Central

Volume 1, Issue 3 / Pages 9.1-9.6 / 2009 / 2005-8543 (eISSN)

Korean Society for Bioinformatics

Kwon, Kyung-Hoon (Division of Mass Spectrometry Research, Korea Basic Science Institute, Ochang, Chungbuk, Republic of Korea) ;
Lee, Sang-Kwang (Division of Mass Spectrometry Research, Korea Basic Science Institute, Ochang, Chungbuk, Republic of Korea) ;
Cho, Kun (Division of Mass Spectrometry Research, Korea Basic Science Institute, Ochang, Chungbuk, Republic of Korea) ;
Park, Gun-Wook (Division of Mass Spectrometry Research, Korea Basic Science Institute, Ochang, Chungbuk, Republic of Korea) ;
Kang, Byeong-Soo (The I-BIO graduate program and National Core Research Center for Systems Bio-Dynamics, POSTECH) ;
Park, Young-Mok (Division of Mass Spectrometry Research, Korea Basic Science Institute, Ochang, Chungbuk, Republic of Korea)

Published: 2009.09.30

https://doi.org/10.4051/ibc.2009.3.0009

Abstract

Introduction: In mass spectrometry-based proteomics, biological samples are analyzed by a mass spectrometer and proteins are identified by database search. Database search selects, from an amino acid sequence database, the best matches to the experimental mass spectra, and the protein is identified as the matched sequence. A match score is defined to rank the candidates from the database, and the highest-scoring hit is declared the most probable protein. The search result therefore varies with the score definition. In this study, the differences among the search results of different search engines and different databases were investigated in order to suggest a better way to identify more proteins with higher reliability.
Materials and Methods: The protein extract of human mesenchymal stem cells was separated into several bands by one-dimensional electrophoresis. The bands were excised from the one-dimensional gel one by one, digested with trypsin, and analyzed by a mass spectrometer, FT LTQ. The tandem mass (MS/MS) spectra of peptide ions were searched with the X!Tandem, Mascot, and Sequest search engines against the IPI human database and the SwissProt database. The search results were filtered at several threshold probability values using the Trans-Proteomic Pipeline (TPP) of the Institute for Systems Biology, and the output generated by TPP was analyzed.
Results and Discussion: For each MS/MS spectrum, the peptide sequences identified under different conditions (search engine, threshold probability, and sequence database) were compared. At high threshold probability, the main difference in peptide identification was caused not by the difference in sequence database but by the difference in score. As the threshold probability decreased, missed peptides appeared. Conversely, at extremely high threshold levels, many true assignments were missed.
Conclusion and Prospects: The different identification results of the search engines were mainly caused by their different scoring algorithms. In proteomics, high-scored peptides are usually selected and low-scored peptides are discarded. Many of the discarded peptides are true negatives. By integrating the search results from different parameters and different search engines, the protein identification process can be improved.
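
The per-spectrum comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the spectrum IDs, peptide sequences, and probabilities below are made-up placeholders, and the function names (filter_by_probability, compare_engines) are hypothetical. It only shows the idea of filtering each engine's hits at a threshold probability and then grouping spectra by whether the surviving engines agree.

```python
from collections import defaultdict

# Hypothetical data: engine -> spectrum id -> (peptide sequence, probability).
# These values are invented for illustration and are not from the study.
results = {
    "X!Tandem": {"s1": ("ELVISK", 0.99), "s2": ("LIVESK", 0.95), "s3": ("PEPTIDER", 0.40)},
    "Mascot":   {"s1": ("ELVISK", 0.97), "s2": ("LLVESK", 0.92), "s3": ("PEPTIDER", 0.55)},
    "Sequest":  {"s1": ("ELVISK", 0.91), "s3": ("PEPTIDER", 0.45), "s4": ("SAMPLEK", 0.93)},
}


def filter_by_probability(hits, threshold):
    """Keep only the assignments whose probability reaches the threshold."""
    return {sid: pep for sid, (pep, prob) in hits.items() if prob >= threshold}


def compare_engines(results, threshold):
    """Split spectra into agreeing, conflicting, and single-engine assignments."""
    by_spectrum = defaultdict(dict)
    for engine, hits in results.items():
        for sid, pep in filter_by_probability(hits, threshold).items():
            by_spectrum[sid][engine] = pep

    agree, conflict, single = {}, {}, {}
    for sid, calls in by_spectrum.items():
        peptides = set(calls.values())
        if len(calls) == 1:
            single[sid] = calls            # only one engine passed the threshold
        elif len(peptides) == 1:
            agree[sid] = peptides.pop()    # every passing engine reports the same peptide
        else:
            conflict[sid] = calls          # passing engines disagree on the sequence
    return agree, conflict, single


if __name__ == "__main__":
    for threshold in (0.9, 0.3):
        agree, conflict, single = compare_engines(results, threshold)
        print(threshold, sorted(agree), sorted(conflict), sorted(single))
```

In this toy data, spectrum s3 is discarded at the high threshold even though all three engines report the same low-scoring peptide, which is the kind of case where combining results across engines and thresholds might recover an assignment.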
