A Scheme for Filtering SNPs Imputed in 8,842 Korean Individuals Based on the International HapMap Project Data

  • Lee, Ki-Chan (Department of Bioinformatics & Life Science, Soongsil University)
  • Kim, Sang-Soo (Department of Bioinformatics & Life Science, Soongsil University)
  • Published: 2009.06.30


Genome-wide association (GWA) studies may benefit from including imputed SNPs in their datasets. Because imputation is predictive, it is typically imperfect, so a scheme is needed for filtering imputed SNPs in a way that maximizes their concordance with observed genotypes. We report such a scheme, based on a combination of several parameters calculated by PLINK, a popular GWA analysis program. We imputed the genotypes of 8,842 Korean individuals from approximately 2 million SNP genotypes of the CHB+JPT panel in the International HapMap Project Phase II data, complementing the 352k SNPs in the original Affymetrix 5.0 dataset. A total of 333,418 SNPs were found in both datasets, with a median concordance rate of 98.7%. Concordance rates were calculated over different ranges of three parameters: the number of proxy SNPs (NPRX), the fraction of successfully imputed individuals (IMPUTED), and the information content (INFO). The poor concordance observed at low values of these parameters allowed us to develop an optimal combination of cutoffs (IMPUTED ≥ 0.9 and INFO ≥ 0.9). A total of 1,026,596 SNPs passed the cutoffs, of which 94,364 were found in both datasets and had a median concordance of 99.4%. This study illustrates a conservative scheme for filtering imputed SNPs that would be useful in GWA studies.
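The filtering rule stated in the abstract can be illustrated with a minimal Python sketch. It assumes that the per-SNP imputation summary has already been parsed into records. The field names (`nprx`, `imputed`, `info`, `concord`), the SNP identifiers, and all values are hypothetical and invented for illustration; they are not the study's data. SNPs that were not genotyped on the array have no observed genotypes to compare against, so their concordance is represented as `None`.

```python
from statistics import median

# Hypothetical per-SNP summary records. Field names mirror the parameters
# named in the abstract; every value here is invented for illustration.
snps = [
    {"snp": "rs_a", "nprx": 5, "imputed": 0.98, "info": 0.95, "concord": 0.995},
    {"snp": "rs_b", "nprx": 2, "imputed": 0.85, "info": 0.97, "concord": 0.950},
    {"snp": "rs_c", "nprx": 4, "imputed": 0.93, "info": 0.70, "concord": 0.910},
    {"snp": "rs_d", "nprx": 3, "imputed": 0.91, "info": 0.92, "concord": None},
    {"snp": "rs_e", "nprx": 5, "imputed": 0.99, "info": 0.99, "concord": 0.991},
]


def passes_cutoffs(rec, min_imputed=0.9, min_info=0.9):
    """Apply the combined cutoff from the abstract: IMPUTED >= 0.9 and INFO >= 0.9."""
    return rec["imputed"] >= min_imputed and rec["info"] >= min_info


def median_concordance(records):
    """Median concordance over SNPs that also have observed genotypes.

    Returns None when no record carries a concordance value.
    """
    values = [r["concord"] for r in records if r["concord"] is not None]
    return median(values) if values else None


passed = [r for r in snps if passes_cutoffs(r)]

print("SNPs passing cutoffs:", [r["snp"] for r in passed])
print("Median concordance, all overlapping SNPs:", median_concordance(snps))
print("Median concordance, passing SNPs:", median_concordance(passed))
```

In this toy input, the SNP failing the IMPUTED cutoff and the SNP failing the INFO cutoff are both dropped, and the median concordance of the remaining overlapping SNPs is higher than that of the unfiltered set, which is the pattern the abstract reports for the real data.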


genome-wide association;HapMap;PLINK;SNP imputation


