Thoroughbred Horse Single Nucleotide Polymorphism and Expression Database: HSDB

  • Lee, Joon-Ho (Genomic Informatics Center, Hankyong National University) ;
  • Lee, Taeheon (Department of Agricultural Biotechnology and Research Institute for Agriculture and Life Sciences, Seoul National University) ;
  • Lee, Hak-Kyo (Genomic Informatics Center, Hankyong National University) ;
  • Cho, Byung-Wook (Department of Animal Science, College of Life Sciences, Pusan National University) ;
  • Shin, Dong-Hyun (Department of Agricultural Biotechnology and Research Institute for Agriculture and Life Sciences, Seoul National University) ;
  • Do, Kyoung-Tag (Department of Equine Sciences, Sorabol College) ;
  • Sung, Samsun (C&K Genomics, Seoul National University Research) ;
  • Kwak, Woori (C&K Genomics, Seoul National University Research) ;
  • Kim, Hyeon Jeong (C&K Genomics, Seoul National University Research) ;
  • Kim, Heebal (Department of Agricultural Biotechnology and Research Institute for Agriculture and Life Sciences, Seoul National University) ;
  • Cho, Seoae (C&K Genomics, Seoul National University Research) ;
  • Park, Kyung-Do (Genomic Informatics Center, Hankyong National University)
  • Received : 2013.11.04
  • Accepted : 2014.06.21
  • Published : 2014.09.01


Genetics is important for the breeding and selection of horses, but well-established horse-related browsers and databases are lacking. To better understand horses, more variants and other integrated information are needed. We therefore constructed a horse genomic variants database that includes expression and other information. The Horse Single Nucleotide Polymorphism and Expression Database (HSDB) provides genomic variants of the horse genome that had remained unexplored, including rare variants, identified using population genome sequences of eighteen horses and RNA-seq data from four horses. The identified single nucleotide polymorphisms (SNPs) were confirmed by comparing them with SNP chip data and with variants called from RNA-seq, which showed concordance levels of 99.02% and 96.6%, respectively. Moreover, the database provides the genomic variants together with their corresponding transcriptional profiles from the same individuals, to help in understanding the functional aspects of these variants. The database will contribute to the genetic improvement and breeding strategies of Thoroughbreds.
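The abstract does not state how the concordance levels were computed. As an illustration only, the sketch below assumes one common definition: the percentage of sites genotyped in both call sets whose genotypes agree. The function names, the `(chromosome, position)` keys, and the toy genotypes are hypothetical and are not taken from HSDB.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position); hypothetical key layout


def normalize(alleles: str) -> Tuple[str, ...]:
    """Sort alleles so that 'AG' and 'GA' count as the same genotype."""
    return tuple(sorted(alleles))


def concordance(calls_a: Dict[Site, str], calls_b: Dict[Site, str]) -> float:
    """Percentage of sites present in both call sets with matching genotypes.

    Assumed definition: sites called in only one of the two sets are ignored.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no overlapping sites to compare")
    matches = sum(
        1 for site in shared
        if normalize(calls_a[site]) == normalize(calls_b[site])
    )
    return 100.0 * matches / len(shared)


# Toy data, invented for illustration: sequencing calls vs. SNP chip calls.
sequencing_calls = {
    ("chr1", 100): "AG",
    ("chr1", 200): "CC",
    ("chr2", 50): "GT",
    ("chr2", 75): "AA",   # not on the chip, so ignored
}
chip_calls = {
    ("chr1", 100): "GA",  # same genotype, alleles in a different order
    ("chr1", 200): "CC",
    ("chr2", 50): "GG",   # discordant call
    ("chr3", 10): "TT",   # not in the sequencing calls, so ignored
}

toy_concordance = concordance(sequencing_calls, chip_calls)
```

With these toy inputs, three sites are shared and two of them agree. A real comparison would also need to settle strand orientation, missing calls, and which sites count toward the denominator, none of which the abstract specifies.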




Supported by : Rural Development Administration


