Evaluation of the Redundancy in Decoy Database Generation for Tandem Mass Analysis
 Title & Authors
Li, Honglan; Liu, Duanhui; Lee, Kiwook; Hwang, Kyu-Baek
 
 Abstract
Peptide identification in tandem mass spectrometry is usually performed by searching the spectra against target databases consisting of reference protein sequences. To control false discovery rates for high-confidence peptide identification, the spectra are also searched against decoy databases constructed by permuting the reference protein sequences. With this construction, the same peptide sequence can appear in both the target and the decoy databases, and multiple entries of the same peptide can exist within the decoy database. Such redundant peptides complicate protein identification, so it is important to minimize their number for accurate protein identification. In this regard, we examined two popular methods for decoy database generation: 'pseudo-shuffling' and 'pseudo-reversing'. We experimented with target databases of varying sizes and investigated the effect of the maximum number of missed cleavage sites allowed in a peptide (MC), which is one of the parameters for target and decoy database generation. In our experiments, the level of redundancy in the decoy databases was proportional to the target database size and the value of MC, owing to the increase in the number of short peptides (7 to 10 amino acids, AA). Moreover, 'pseudo-reversing' always generated decoy databases with lower levels of redundancy than 'pseudo-shuffling'.
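The two decoy generation methods and the redundancy measure can be illustrated with a minimal Python sketch. The abstract does not define the methods in detail, so the sketch rests on assumptions: 'pseudo-reversing' is taken to reverse each tryptic peptide while keeping its C-terminal K or R in place, 'pseudo-shuffling' is taken to shuffle the same residues randomly, cleavage is simplified to "after every K or R", and the minimum peptide length of 7 is inferred from the "7 to 10 AA" range. All function names (`cleave`, `digest`, `pseudo_reverse`, `pseudo_shuffle`, `make_decoy`, `redundancy`) are hypothetical and are not taken from the paper.

```python
import random
import re

def cleave(protein):
    """Split after every K or R (simplified trypsin rule; an assumption)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+", protein)

def digest(protein, missed_cleavages=0, min_len=7):
    """List peptides with up to `missed_cleavages` (MC) internal cleavage sites."""
    pieces = cleave(protein)
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            pep = "".join(pieces[i:j + 1])
            if len(pep) >= min_len:
                peptides.append(pep)
    return peptides

def _split_tail(peptide):
    """Separate a C-terminal K/R (kept fixed) from the rest of the peptide."""
    if peptide and peptide[-1] in "KR":
        return peptide[:-1], peptide[-1]
    return peptide, ""

def pseudo_reverse(peptide):
    """Assumed definition: reverse the peptide, keep the C-terminal K/R."""
    body, tail = _split_tail(peptide)
    return body[::-1] + tail

def pseudo_shuffle(peptide, rng):
    """Assumed definition: shuffle the peptide, keep the C-terminal K/R."""
    body, tail = _split_tail(peptide)
    residues = list(body)
    rng.shuffle(residues)
    return "".join(residues) + tail

def make_decoy(protein, transform):
    """Build a decoy protein by transforming each fully cleaved peptide."""
    return "".join(transform(piece) for piece in cleave(protein))

def redundancy(target_proteins, transform, missed_cleavages=0, min_len=7):
    """Count decoy peptides shared with the target and duplicated decoys."""
    target, decoy = [], []
    for protein in target_proteins:
        target += digest(protein, missed_cleavages, min_len)
        decoy += digest(make_decoy(protein, transform), missed_cleavages, min_len)
    target_set, decoy_set = set(target), set(decoy)
    return {
        "decoy_peptides": len(decoy),
        "shared_with_target": len(decoy_set & target_set),
        "duplicate_decoys": len(decoy) - len(decoy_set),
    }

if __name__ == "__main__":
    proteins = ["MASTQNLVPEKGGSAAGGSKLLDEFR", "MKTAYIAKQRQISFVK"]  # toy sequences
    rng = random.Random(0)
    for mc in (0, 1, 2):
        rev = redundancy(proteins, pseudo_reverse, mc)
        shuf = redundancy(proteins, lambda p: pseudo_shuffle(p, rng), mc)
        print(mc, rev, shuf)
```

In this sketch, a peptide whose residues before the C-terminal K or R read the same in both directions is unchanged by pseudo-reversing, which is one way a decoy peptide can coincide with a target peptide. Running the loop on a real protein database with different values of MC would be needed to reproduce the kind of comparison the abstract reports; the toy sequences above are placeholders only.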
 Keywords
tandem mass spectrometry; peptide identification; protein identification; target databases; decoy databases; redundant peptides
 Language
English