A New Algorithm of Reducing Candidate Haplotypes for Haplotype Inference

A Candidate Set Reduction Algorithm for Haplotype Inference

  • Choi, Mun-Ho (Department School of Electronics & Computer Engineering, Chonnam National University) ;
  • Kang, Seung-Ho (Division of Fusion Convergence of Mathematical Sciences, National Institute for Mathematical Sciences, KT Daeduk 2 Research Center) ;
  • Lim, Hyeong-Seok (Department School of Electronics & Computer Engineering, Chonnam National University)
  • Received : 2013.05.21
  • Accepted : 2013.06.25
  • Published : 2013.07.31

Abstract

Identifying haplotypes, the sequences of SNPs on a single chromosome, makes haplotype-based disease association tests possible. Given a set of genotypes from a population, haplotype inference is the process of recovering the haplotypes that explain those genotypes. We propose a new preprocessing algorithm for haplotype inference by pure parsimony (HIPP). The algorithm excludes a large number of redundant candidate haplotypes by detecting groups of haplotypes that are dispensable for optimal solutions. On well-known synthetic and biological data sets, experiments show that our method runs much faster than other preprocessing methods. When HIPP solvers are run on the preprocessed candidates, the number of haplotypes in their solutions is equal to or slightly larger than that of the optimal solutions.

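Identifying haplotypes, the sequences of SNPs that appear on one of a person's chromosomes, enables effective association testing for genetic diseases. The haplotype inference problem is to derive, from the set of genotypes of a given population, a set of haplotypes that can explain the genotype of each individual in that population. For the parsimony-based haplotype inference problem, this paper presents a preprocessing algorithm that reduces the number of candidate haplotypes to be searched during inference by excluding from the candidate set those haplotypes that do not contribute to the final result. Experiments show that the proposed algorithm runs much faster than existing preprocessing algorithms, that haplotype inference using its output yields an optimal solution in most cases, and that even when it does not, the number of haplotypes differs little from that of the optimal solution.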
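To make the problem statement concrete, the following is a minimal Python sketch of the HIPP problem itself, not of the preprocessing algorithm proposed in the paper, whose details are not given in this abstract. It assumes the common encoding in which each genotype site is 0 or 1 when homozygous and 2 when heterozygous; the function names `explaining_pairs`, `candidate_haplotypes`, and `hipp_brute_force` are hypothetical and chosen only for illustration.

```python
from itertools import combinations, product


def explaining_pairs(genotype):
    """Return all unordered haplotype pairs that explain one genotype.

    Assumed encoding: 0 or 1 = homozygous site, 2 = heterozygous site.
    At a heterozygous site the two haplotypes must carry different alleles.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 2]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(genotype), list(genotype)
        for site, bit in zip(het_sites, bits):
            h1[site] = bit
            h2[site] = 1 - bit
        # Sort inside the pair so (a, b) and (b, a) collapse to one entry.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def candidate_haplotypes(genotypes):
    """Collect every haplotype that appears in some explaining pair."""
    return {h for g in genotypes for pair in explaining_pairs(g) for h in pair}


def hipp_brute_force(genotypes):
    """Find a smallest haplotype set explaining all genotypes.

    Exhaustive search over candidate subsets; exponential, so it is only
    suitable for tiny illustrative inputs.
    """
    candidates = sorted(candidate_haplotypes(genotypes))
    pairs_per_genotype = [explaining_pairs(g) for g in genotypes]
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            chosen = set(subset)
            if all(
                any(a in chosen and b in chosen for a, b in pairs)
                for pairs in pairs_per_genotype
            ):
                return subset
    return ()


genotypes = [(2, 2), (0, 2), (0, 0)]
solution = hipp_brute_force(genotypes)
print(len(candidate_haplotypes(genotypes)), len(solution))
```

The number of candidates grows exponentially with the number of heterozygous sites, which is why a preprocessing step that shrinks the candidate set before the solver runs can matter so much in practice.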
