• Title/Summary/Keyword: GWAS

Search Result 188

MPI-GWAS: a supercomputing-aided permutation approach for genome-wide association studies

  • Paik, Hyojung;Cho, Yongseong;Cho, Seong Beom;Kwon, Oh-Kyoung
    • Genomics & Informatics / v.20 no.1 / pp.14.1-14.4 / 2022
  • Permutation testing is a robust and popular approach for significance testing in genomic research that has the advantage of reducing inflated type 1 error rates; however, its computational cost is notoriously high in genome-wide association studies (GWAS). Here, we developed a supercomputing-aided approach to accelerate permutation testing for GWAS, based on the message-passing interface (MPI) on a parallel computing architecture. Our application, called MPI-GWAS, conducts MPI-based permutation testing in parallel on our supercomputing system, Nurion (8,305 compute nodes and 563,740 central processing units [CPUs]). In MPI-GWAS, $10^7$ permutations of one locus were computed in 600 s using 2,720 CPU cores. For $10^7$ permutations of ~30,000-50,000 loci in over 7,000 subjects, the total elapsed time was ~4 days on the Nurion supercomputer. Thus, MPI-GWAS makes permutation-based GWAS feasible within a reasonable time by harnessing the power of parallel computing resources.
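As a rough illustration of what a label-permutation test for a single locus involves, and of how the permutations could be divided among parallel workers, here is a minimal single-process Python sketch. It is not the MPI-GWAS implementation: the test statistic (absolute difference in mean allele count between cases and controls), the add-one p-value estimate, and the function names `permutation_p_value` and `split_permutations` are all assumptions made for illustration.

```python
import random


def permutation_p_value(genotypes, is_case, n_perm=10_000, seed=0):
    """Empirical p-value for one locus, obtained by shuffling case/control labels."""
    rng = random.Random(seed)
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    total = sum(genotypes)

    def stat(labels):
        # Assumed statistic: absolute difference in mean allele count.
        case_sum = sum(g for g, c in zip(genotypes, labels) if c)
        return abs(case_sum / n_cases - (total - case_sum) / n_controls)

    observed = stat(is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if stat(labels) >= observed:
            hits += 1
    # The add-one correction avoids reporting an impossible p-value of 0.
    return (hits + 1) / (n_perm + 1)


def split_permutations(n_perm, n_workers):
    """Divide n_perm permutations as evenly as possible among workers."""
    base, extra = divmod(n_perm, n_workers)
    return [base + (1 if rank < extra else 0) for rank in range(n_workers)]
```

In a parallel setting, each worker would run its own share from `split_permutations(n_perm, n_workers)` with a distinct seed, and the per-worker hit counts would be summed before computing the p-value.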

Performance Comparison of Two Gene Set Analysis Methods for Genome-wide Association Study Results: GSA-SNP vs i-GSEA4GWAS

  • Kwon, Ji-Sun;Kim, Ji-Hye;Nam, Doug-U;Kim, Sang-Soo
    • Genomics & Informatics / v.10 no.2 / pp.123-127 / 2012
  • Gene set analysis (GSA) is useful in interpreting a genome-wide association study (GWAS) result in terms of biological mechanism. We compared the performance of two different GSA implementations that accept GWAS p-values of single nucleotide polymorphisms (SNPs) or gene-by-gene summaries thereof, GSA-SNP and i-GSEA4GWAS, under the same settings of inputs and parameters. GSA runs were made with two sets of p-values from a Korean type 2 diabetes mellitus GWAS: 259,188 and 1,152,947 SNPs of the original and imputed genotype datasets, respectively. When Gene Ontology terms were used as gene sets, i-GSEA4GWAS produced 283 and 1,070 hits for the unimputed and imputed datasets, respectively. On the other hand, GSA-SNP reported 94 and 38 hits for the unimputed and imputed datasets, respectively. Similar trends, but to a lesser degree, were observed with Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets as well. The huge number of hits by i-GSEA4GWAS for the imputed dataset was probably an artifact due to the scaling step in the algorithm. The decrease in hits by GSA-SNP for the imputed dataset may be due to the fact that it relies on Z-statistics, which are sensitive to variations in the background level of associations. Judicious evaluation of the GSA outcomes, perhaps based on multiple programs, is recommended.
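To make the remark about background sensitivity concrete, the following Python sketch computes a generic gene-set Z-statistic: the set's mean gene score compared with the mean and standard deviation of all scored genes. This is an assumed textbook form, not necessarily the exact computation in GSA-SNP, and the gene names and scores are hypothetical.

```python
import math


def gene_set_z(gene_scores, gene_set):
    """Z-score of a gene set's mean score against the background of all genes."""
    all_scores = list(gene_scores.values())
    mu = sum(all_scores) / len(all_scores)
    var = sum((s - mu) ** 2 for s in all_scores) / len(all_scores)
    members = [gene_scores[g] for g in gene_set if g in gene_scores]
    m = len(members)
    set_mean = sum(members) / m
    # Standard error of the mean of m genes drawn from the background.
    return (set_mean - mu) / (math.sqrt(var) / math.sqrt(m))


# Hypothetical gene scores (e.g. -log10 of each gene's representative p-value).
low_background = {"A": 6.0, "B": 5.0, "C": 1.0, "D": 1.0, "E": 1.0, "F": 1.0}
high_background = {"A": 6.0, "B": 5.0, "C": 4.5, "D": 4.5, "E": 4.5, "F": 4.5}

z_low = gene_set_z(low_background, {"A", "B"})
z_high = gene_set_z(high_background, {"A", "B"})
```

The scores of genes A and B are identical in both cases, yet `z_high` is smaller than `z_low` because the background mean has risen, which is the kind of sensitivity the abstract describes.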

Genome-wide Association Study (GWAS) and Its Application for Improving the Genomic Estimated Breeding Values (GEBV) of the Berkshire Pork Quality Traits

  • Lee, Young-Sup;Jeong, Hyeonsoo;Taye, Mengistie;Kim, Hyeon Jeong;Ka, Sojeong;Ryu, Youn-Chul;Cho, Seoae
    • Asian-Australasian Journal of Animal Sciences / v.28 no.11 / pp.1551-1557 / 2015
  • Missing heritability has been a major problem in analyses based on best linear unbiased prediction (BLUP). We introduced the traditional genome-wide association study (GWAS) into BLUP to improve heritability estimation. We analyzed eight pork quality traits of the Berkshire breed using GWAS and BLUP. GWAS detects putative quantitative trait loci regions for given traits. Single nucleotide polymorphisms (SNPs) were selected from the GWAS results with p value <0.01. BLUP using the significant SNPs was much more accurate than BLUP using all genotyped SNPs in terms of narrow-sense heritability. This implies that genomic estimated breeding values (GEBVs) of pork quality traits can be calculated by BLUP via GWAS. The GWAS model was linear regression using PLINK, and the BLUP models were G-BLUP and SNP-GBLUP. SNP-GBLUP uses an SNP-SNP relationship matrix. BLUP analysis with GWAS preprocessing can be one possible way of addressing the missing heritability problem, and it provides an alternative BLUP method that can yield more accurate GEBVs.
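The preprocessing idea, keeping only SNPs with GWAS p value <0.01 before building the relationship matrix used by G-BLUP, can be sketched in Python as follows. The abstract does not state how the genomic relationship matrix was constructed; the VanRaden-style form G = ZZ' / (2 Σ p(1 − p)) used here is an assumption, and the function names and toy data are hypothetical.

```python
def select_snps(p_values, threshold=0.01):
    """Indices of SNPs whose GWAS p-value is below the threshold."""
    return [i for i, p in enumerate(p_values) if p < threshold]


def genomic_relationship(genotypes, snp_idx):
    """Assumed VanRaden-style G matrix built from the selected SNP columns.

    genotypes: one row per animal, entries are allele counts 0, 1, or 2.
    The selected SNPs must include at least one polymorphic SNP.
    """
    n = len(genotypes)
    # Allele frequency of each selected SNP.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in snp_idx]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    # Center each genotype by twice the allele frequency.
    z = [[row[j] - 2 * p for j, p in zip(snp_idx, freqs)] for row in genotypes]
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in z] for zi in z]


# Toy data: 3 animals x 4 SNPs, with hypothetical GWAS p-values per SNP.
genotypes = [[0, 1, 2, 0], [1, 1, 0, 2], [2, 0, 1, 1]]
p_values = [0.001, 0.2, 0.005, 0.5]
selected = select_snps(p_values)
G = genomic_relationship(genotypes, selected)
```

The resulting `G` would then replace the all-SNP relationship matrix in the mixed model; fitting that model is outside the scope of this sketch.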

BioSMACK: a linux live CD for genome-wide association analyses

  • Hong, Chang-Bum;Kim, Young-Jin;Moon, Sang-Hoon;Shin, Young-Ah;Go, Min-Jin;Kim, Dong-Joon;Lee, Jong-Young;Cho, Yoon-Shin
    • BMB Reports / v.45 no.1 / pp.44-46 / 2012
  • Recent advances in high-throughput genotyping technologies have enabled us to conduct a genome-wide association study (GWAS) on a large cohort. However, analyzing millions of single nucleotide polymorphisms (SNPs) is still a difficult task for researchers conducting a GWAS. Researchers often encounter difficulties such as compatibility and dependency problems when installing analytical tools. This is a huge obstacle for any research institute without computing facilities and specialists. Therefore, a proper research environment is urgently needed by researchers working on GWAS. We developed BioSMACK to provide a research environment for GWAS that requires no configuration and is easy to use. BioSMACK is based on the Ubuntu Live CD, which offers a complete Linux-based operating system environment without installation. Moreover, we provide users with a GWAS manual consisting of a series of guidelines for GWAS and useful examples. BioSMACK is freely available at http://ksnp.cdc.go.kr/biosmack.

A Differential Privacy Approach to Preserve GWAS Data Sharing based on A Game Theoretic Perspective

  • Yan, Jun;Han, Ziwei;Zhou, Yihui;Lu, Laifeng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.3 / pp.1028-1046 / 2022
  • Genome-wide association studies (GWAS) aim to find the significant genetic variants for common complex diseases. However, genotype data contain private information such as disease status and identity, which makes data sharing and research difficult. Differential privacy is widely used for privacy protection in data sharing. The current differential privacy approach in GWAS addresses statistical data rather than raw data and does not achieve an equilibrium between utility and privacy, which hinders data sharing and hampers the development of genomics. To share data more securely, we propose a differential-privacy-preserving approach to data sharing for GWAS that achieves an equilibrium between privacy and data utility. First, a reasonable disturbance interval for the genotype is calculated based on the expected utility. Second, based on the interval, we obtain the Nash equilibrium point between utility and privacy. Finally, based on the equilibrium point, the original genotype matrix is perturbed with differential privacy, and the corresponding random genotype matrix is obtained. We show theoretically and experimentally that the method satisfies the expected privacy protection and utility. This method provides engineering guidance for protecting GWAS data privacy.
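For readers unfamiliar with perturbing a raw genotype matrix under differential privacy, here is a minimal Python sketch using generalized randomized response over the three genotype values {0, 1, 2}. This is a standard mechanism swapped in for illustration; it is not the authors' game-theoretic method, and it does not compute a disturbance interval or a Nash equilibrium. The privacy budget `epsilon` is taken as a given input, and the function names are hypothetical.

```python
import math
import random


def keep_probability(epsilon, k=3):
    """Probability of reporting the true value under k-ary randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


def perturb_genotypes(matrix, epsilon, seed=0):
    """Perturb each genotype (0, 1, or 2) independently.

    Each entry is kept with probability e^eps / (e^eps + 2) and otherwise
    replaced by one of the other two values chosen uniformly, which gives
    epsilon-local differential privacy per entry.
    """
    rng = random.Random(seed)
    p_keep = keep_probability(epsilon)
    perturbed = []
    for row in matrix:
        new_row = []
        for g in row:
            if rng.random() < p_keep:
                new_row.append(g)
            else:
                new_row.append(rng.choice([v for v in (0, 1, 2) if v != g]))
        perturbed.append(new_row)
    return perturbed
```

A smaller `epsilon` flips more entries (more privacy, less utility); choosing that trade-off point is what the abstract's equilibrium analysis addresses.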

GWAS of Salt Tolerance and Drought Tolerance in Korean Wheat Core Collection

  • Ji Yu Jeong;Kyeong Do Min;Jae Toon Kim
    • Proceedings of the Korean Society of Crop Science Conference / 2022.10a / pp.195-195 / 2022
  • Abiotic stress is a major problem in global agriculture as it negatively affects crop growth, yield, and quality. Wheat (Triticum aestivum) is the world's second most-produced food resource, so mitigating damage caused by abiotic stress is increasingly important. In this study, we performed GWAS to search for SNPs associated with salt tolerance and drought tolerance. NaCl (200 mM) treatment was applied at the seedling stage to 613 wheat varieties in the Korean wheat core collection. Root length, root surface area, root average diameter, and root volume were measured. Drought stress was also applied at the seedling stage, and the same phenotypes were measured. GWAS was performed on each phenotype using the MLM, MLMM, and FarmCPU models. The best salt-tolerant wheat varieties were 'MK2402', 'Gyeongnam Geochang-1985-3698', and 'Milyang 13', which showed superior root growth. The significant SNP AX-94704125 (BA00756838) was identified in all models. Genes located near the significant SNP were searched within ±250 kb of the SNP. A total of 11 genes were identified within the region. Among the 11 genes were NB-ARC, involved in the defense response; FKS1, involved in cell wall biosynthesis; and putative BPM1, involved in abiotic stress responses. The best drought-tolerant wheat varieties were 'PI 534284', 'Moro of Sind', and 'CM92354-33M-0Y-0M-6Y-0B-0BGD', which showed superior root growth. This study discovered SNPs associated with salt tolerance in the Korean wheat core collection through GWAS. GWAS of drought tolerance is now in progress, and the GWAS results will be presented on a poster. The SNPs identified by GWAS can be useful for studying the molecular mechanisms of salt tolerance and drought tolerance in wheat.
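The candidate-gene search within 250 kb on either side of a significant SNP amounts to an interval-overlap query. A minimal Python sketch follows; the gene names, chromosome labels, and coordinates are hypothetical, as is the function name `genes_in_window`, and the abstract does not say which annotation or tool was used.

```python
def genes_in_window(snp_chrom, snp_pos, genes, flank=250_000):
    """Names of genes overlapping [snp_pos - flank, snp_pos + flank].

    genes: iterable of (name, chrom, start, end) tuples in base pairs.
    """
    lo, hi = snp_pos - flank, snp_pos + flank
    return [
        name
        for name, chrom, start, end in genes
        if chrom == snp_chrom and start <= hi and end >= lo
    ]


# Hypothetical annotation records, for illustration only.
annotation = [
    ("gene1", "1A", 100_000, 120_000),
    ("gene2", "1A", 700_000, 710_000),
    ("gene3", "2B", 100_000, 120_000),
    ("gene4", "1A", 540_000, 560_000),
]
nearby = genes_in_window("1A", 300_000, annotation)
```

Here a gene counts as a hit if any part of it overlaps the window; requiring the whole gene to lie inside the window would be a stricter alternative.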


Comparison of genome-wide association and genomic prediction methods for milk production traits in Korean Holstein cattle

  • Lee, SeokHyun;Dang, ChangGwon;Choy, YunHo;Do, ChangHee;Cho, Kwanghyun;Kim, Jongjoo;Kim, Yousam;Lee, Jungjae
    • Asian-Australasian Journal of Animal Sciences / v.32 no.7 / pp.913-921 / 2019
  • Objective: The objectives of this study were to compare identified informative regions through two genome-wide association study (GWAS) approaches and determine the accuracy and bias of the direct genomic value (DGV) for milk production traits in Korean Holstein cattle, using two genomic prediction approaches: single-step genomic best linear unbiased prediction (ss-GBLUP) and Bayesian Bayes-B. Methods: Records on production traits such as adjusted 305-day milk (MY305), fat (FY305), and protein (PY305) yields were collected from 265,271 first parity cows. After quality control, 50,765 single-nucleotide polymorphic genotypes were available for analysis. In GWAS for ss-GBLUP (ssGWAS) and Bayes-B (BayesGWAS), the proportion of genetic variance for each 1-Mb genomic window was calculated and used to identify informative genomic regions. Accuracy of the DGV was estimated by a five-fold cross-validation with random clustering. As a measure of accuracy for DGV, we also assessed the correlation between DGV and deregressed-estimated breeding value (DEBV). The bias of DGV for each method was obtained by determining regression coefficients. Results: A total of nine and five significant windows (1 Mb) were identified for MY305 using ssGWAS and BayesGWAS, respectively. Using ssGWAS and BayesGWAS, we also detected multiple significant regions for FY305 (12 and 7) and PY305 (14 and 2), respectively. Both single-step DGV and Bayes DGV also showed somewhat moderate accuracy ranges for MY305 (0.32 to 0.34), FY305 (0.37 to 0.39), and PY305 (0.35 to 0.36) traits, respectively. The mean biases of DGVs determined using the single-step and Bayesian methods were $1.50{\pm}0.21$ and $1.18{\pm}0.26$ for MY305, $1.75{\pm}0.33$ and $1.14{\pm}0.20$ for FY305, and $1.59{\pm}0.20$ and $1.14{\pm}0.15$ for PY305, respectively. 
Conclusion: From the bias perspective, we believe that genomic selection based on the application of Bayesian approaches would be more suitable than application of ss-GBLUP in Korean Holstein populations.
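The two evaluation quantities in this abstract, accuracy as the correlation between DGV and DEBV and bias as a regression coefficient, can be sketched in Python as below. The abstract does not state the direction of the regression; regressing DEBV on DGV (so that a slope of 1 indicates no bias) is an assumption here, and the function name and toy values are hypothetical.

```python
import math


def accuracy_and_bias(dgv, debv):
    """Return (Pearson correlation, slope of DEBV regressed on DGV)."""
    n = len(dgv)
    mean_x = sum(dgv) / n
    mean_y = sum(debv) / n
    sxx = sum((x - mean_x) ** 2 for x in dgv)
    syy = sum((y - mean_y) ** 2 for y in debv)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dgv, debv))
    correlation = sxy / math.sqrt(sxx * syy)
    # Assumed direction: DEBV = intercept + slope * DGV.
    slope = sxy / sxx
    return correlation, slope


# Toy values for one validation fold, for illustration only.
dgv = [0.2, -0.1, 0.5, 0.0, -0.4]
debv = [1.5 * x + 2.0 for x in dgv]
r, b = accuracy_and_bias(dgv, debv)
```

In a five-fold cross-validation, these two values would be computed on each held-out fold and then averaged across folds.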

Single-trait GWAS of Leaf Rolling Index with the Korean Rice Germplasm

  • ByeongYong Jeong;Muhyun Kim;Tae-Ho Ham;Seong-Gyu Jang;Ah-Rim Lee;Min young Song;Soon-Wook Kwon;Joohyun Lee
    • Proceedings of the Korean Society of Crop Science Conference / 2022.10a / pp.17-17 / 2022
  • Leaves are an important organ for photosynthesis and transpiration. Leaf shape is a crucial factor affecting plant architecture. V-shaped leaf rolling enhances canopy photosynthesis by increasing CO2 penetration and light capture through reduced shading between leaves. Therefore, moderate leaf rolling is thought to give a higher grain yield per unit area than flat leaves. We measured the adaxial leaf rolling index (LRI) of 278 KRICE_CORE accessions at first heading using the following equation. For each accession, genomic DNA was used for sequencing. We sequenced the genomes with ~8X coverage to detect SNPs. Raw reads were aligned against the rice reference (IRGSP 1.0) for SNP identification and genotype calling. To generate genotype data for GWAS, SNPs were filtered at a minor allele frequency of 0.05. Finally, 841,134 high-quality SNPs were used for our GWAS. The significance threshold was $-\log_{10}(P) > 7.23$. From the results, two significant SNPs were detected. Considering an LD block of 250 kbp, 60 candidate genes were selected, including hypothetical and conserved genes. In this poster, we analyzed candidate genes affecting adaxial leaf rolling through single-trait GWAS.
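The stated threshold of 7.23 is numerically consistent with a Bonferroni correction at a family-wise level of 0.05 over the 841,134 SNPs tested. The abstract does not name the correction method, so treating it as Bonferroni is an assumption; the Python sketch below only checks the arithmetic.

```python
import math


def bonferroni_neg_log10(alpha, n_tests):
    """-log10 of the per-test p-value cutoff under a Bonferroni correction."""
    return -math.log10(alpha / n_tests)


# Assumed alpha = 0.05; the SNP count is taken from the abstract.
threshold = bonferroni_neg_log10(0.05, 841_134)
print(round(threshold, 2))  # prints 7.23
```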


Single-trait GWAS of Leaf Rolling Index with the Korean Rice Germplasm

  • ByeongYong Jeong;Muhyun Kim;Tae-Ho Ham;Seong-Gyu Jang;Ah-Rim Lee;Min young Song;Soon-Wook Kwon;Joohyun Lee
    • Proceedings of the Korean Society of Crop Science Conference / 2022.10a / pp.243-243 / 2022
  • Leaves are an important organ for photosynthesis and transpiration. Leaf shape is a crucial factor affecting plant architecture. V-shaped leaf rolling enhances canopy photosynthesis by increasing CO2 penetration and light capture through reduced shading between leaves. Therefore, moderate leaf rolling is thought to give a higher grain yield per unit area than flat leaves. We measured the adaxial leaf rolling index (LRI) of 278 KRICE_CORE accessions at first heading using the following equation. For each accession, genomic DNA was used for sequencing. We sequenced the genomes with ~8X coverage to detect SNPs. Raw reads were aligned against the rice reference (IRGSP 1.0) for SNP identification and genotype calling. To generate genotype data for GWAS, SNPs were filtered at a minor allele frequency of 0.05. Finally, 841,134 high-quality SNPs were used for our GWAS. The significance threshold was $-\log_{10}(P) > 7.23$. From the results, two significant SNPs were detected. Considering an LD block of 250 kbp, 60 candidate genes were selected, including hypothetical and conserved genes. In this poster, we analyzed candidate genes affecting adaxial leaf rolling through single-trait GWAS.
