Toward High Utilization of Heterogeneous Computing Resources in SNP Detection

  • Lim, Myungeun (IT Convergence Technology Research Laboratory, ETRI)
  • Kim, Minho (IT Convergence Technology Research Laboratory, ETRI)
  • Jung, Ho-Youl (IT Convergence Technology Research Laboratory, ETRI)
  • Kim, Dae-Hee (IT Convergence Technology Research Laboratory, ETRI)
  • Choi, Jae-Hun (IT Convergence Technology Research Laboratory, ETRI)
  • Choi, Wan (IT Convergence Technology Research Laboratory, ETRI)
  • Lee, Kyu-Chul (Department of Computer Engineering, Chungnam National University)
  • Received : 2014.08.25
  • Accepted : 2015.02.03
  • Published : 2015.04.01

Abstract

As the volume of re-sequencing genome data grows, minimizing the execution time of analysis becomes essential. To this end, recent computing systems combine host processors with high-performance coprocessors. However, few applications utilize these heterogeneous computing resources efficiently. This problem applies equally to single nucleotide polymorphism (SNP) detection, which is one of the bottlenecks in genome data processing. In this paper, we propose a method for speeding up SNP detection by improving the utilization of the heterogeneous computing resources common in recent high-performance computing systems. By measuring the workload of each step in the detection procedure, we divide SNP detection into several task groups, each suited to a particular computing resource. These task groups are scheduled using a window overlapping method. As a result, we improve on the speedup achieved by previous open source applications by a factor of 10.
