Linear Mixed Models in Genetic Epidemiological Studies and Applications

The Role and Applications of Linear Mixed Models: A Focus on Genetic Epidemiological Analysis

  • Received : 2015.03.23
  • Accepted : 2015.03.30
  • Published : 2015.04.30

Abstract

Genotyping technology has improved substantially and its cost has dropped, which enables genetic epidemiological studies with large-scale genetic data. Genome-wide association studies have identified more than ten thousand causal variants. Many statistical methods based on linear mixed models have been developed for goals such as estimating heritability and identifying disease susceptibility loci, and empirical results repeatedly stress the importance of these models. We therefore review the statistical methods related to linear mixed models and illustrate the meaning of their estimates.

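Over the past several decades, advances in genotyping technology have lowered the cost of obtaining individual-level genetic information, and many genetic epidemiological studies have been carried out to identify the causal genes of various human diseases. For example, genome-wide association studies have identified thousands of causal genes for hundreds of phenotypes. The flood of genomic data has led to the development of a variety of algorithms capable of analyzing large-scale genomic data. Linear mixed models in particular have been used widely in genetic epidemiology, from heritability estimation to association studies. In this paper we list applications of linear mixed models that are frequently used in genetic epidemiological studies and discuss the biological meaning of the estimates from each model.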
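For orientation, the linear mixed model behind heritability estimation and association testing is commonly written as below. This is the standard textbook formulation, not notation taken from the paper; the symbols $y$, $X$, $\beta$, $g$, $K$, $\sigma_g^2$, and $\sigma_e^2$ are assumptions introduced here for illustration.

```latex
\begin{aligned}
y &= X\beta + g + \varepsilon, \\
g &\sim N\!\left(0,\ \sigma_g^2 K\right), \qquad
\varepsilon \sim N\!\left(0,\ \sigma_e^2 I_n\right), \\
\operatorname{Var}(y) &= \sigma_g^2 K + \sigma_e^2 I_n, \\
h^2 &= \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}.
\end{aligned}
```

Here $y$ is the vector of phenotypes for $n$ individuals, $X$ holds the fixed-effect covariates, and $K$ is a kinship or genetic relationship matrix. Under this sketch, $h^2$ is the share of phenotypic variance attributed to the random genetic effect $g$. An association test would typically add a variant's genotype as a column of $X$ and test its coefficient.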
