Korean cattle exhibit three different color phenotypes: brown, brindle, and black (1). Brindle and black Hanwoo are the most closely related within breeds and among Hanwoo populations based on microsatellite markers analysis. However, brindle and black Hanwoo still have particular genetic characteristics that were highly significantly different (P < 0.001) based on FST pairwise test and principal component analysis (PCA) (2). In particular, brown Hanwoo has become highly specialized for meat traits after undergoing strong artificial selection (3). Since the Hanwoo breeding program was started in the 1930s, brown Hanwoo has been intensively selected based on carcass traits such as carcass weight, eye muscle area, and marbling (intramuscular fat) through progeny tests. Hanwoo progeny tests facilitated annual genetic gains in carcass traits that increased dramatically in Hanwoo populations (4), and may have affected genomic regions associated with these carcass traits in brown Hanwoo. Understanding the genetic mechanisms that lead to phenotypic changes requires identification of the genomic regions that have been under long-term artificial selection.
Strong artificial selection increases the frequency of favorable alleles at the loci that affect meat quality traits in meat production breeds. In this process, a small region of the genome surrounding the mutations is also selected, resulting in a small genomic region that shows reduced variation. This region of reduced variation is referred to as a selection signature, which is identified by nucleotide distribution around favorable mutations that statistically differ from those expected purely by chance (5). By identifying these signatures of past selection and then identifying the functional genes and mutations involved, it is possible to identify the major genetic and metabolic pathways that control important agricultural characteristics of modern breeds. Searches for selection signatures have successfully revealed many genes that have been important in cattle selection. Qanbari et al. performed a whole genome scan for selection signatures in the Holstein genome and then detected a set of 10 candidate regions, including diacylglycerol O-acyltransferase 1, casein cluster, growth hormone receptor, somatostatin, and leptin receptor (6), based on extended haplotype homozygosities (EHHs). More recently, based on EHHs, Schwarzenbacher et al. (2012) demonstrated a combination of association analyses and selection signatures in dairy cattle (7); Gautier et al. (2011) looked for footprints of selection within a creole cattle breed (8); MacEachern et al. reported the results of a comparison of allelic frequencies between Australian Angus and Holstein cattle (9); Flori et al. (2009) (10) and Hayes et al. (2009) (11) also reported that the growth hormone recepeor (GHR) gene on Bos taurus autosome (BTA) 20 has selection evidence by comparing Holstein breeds, Angus and Holstein; and Gautier et al. (2009) performed a whole genome Bayesian scan in West African cattle (12). The aim of this study is to compare EHHs of brown and brindle Hanwoo. The candidate selection regions might have been shaped by intensive artificial selection in the brown Hanwoo, and therefore could contain the genes that affect the traits emphasized in the Korean breeding program.
RESULTS AND DISCUSSION
Population structure and diversity
PCA separated and grouped individuals based on their origins (Supplementary data 1). The 10 breeds (Brahman, five European, and four Northeast Asian breeds) formed tight clusters, which is evidence of high relatedness based on the genetic background of individuals that belong to the same breed. The distribution of animals in the first and second principal component (PC1 × PC2) was similar to the distribution observed in previous studies (Porto-Neto et al. 2013). The Hanwoo cattle clustered tightly with the Northeast Asian samples (e.g., Jeju black, Brindle, and Yanbian), whereas the Yanbian cattle (from the Chinese Yanbian region) clustered closer to the European samples by approximately one-third of the distance along the PC2 vector.
Selection signal identification in brown Hanwoo
The EHH approach is useful when we consider brindle Hanwoo as an ancestral breed to brown Hanwoo: recent selection in brown Hanwoo yielded divergence from brindle Hanwoo, which should be reflected by genetic analysis. Fig. 1 shows the iES plot for positively selected regions identified by our genome-wide scan. From a total of 37,770 single nucleotide polymorphisms (SNPs), we removed 74 SNPs as outliers and then searched for selection signals in brown Hanwoo. A total of 306 SNPs were identified as having significantly different EHH values between the two populations. Of these, we deduced 17 core regions (more than three SNPs per region) in both directions up to 1.5 Mb from the most significant SNP and annotated a subset of genes in the core region. Table 1 shows the core regions among candidate regions with strong signals in brown Hanwoo. We also found the overlapping region in the Animal quantitative trait loci (QTL) database (http://www.animalgenome.org/cgi-bin/QTLdb/BT/index) (Supplementary data 2). The Animal QTL database revealed the presence of QTLs for meat traits (longissimus muscle area) and production traits (pre-weaning average daily gain) in cattle that overlapped with our candidate regions on chromosomes 5 (118.1 to 119.1 Mb) and 12 (36.1 to 37.1Mb).
Fig. 1.Genome-wide map of EHHi = iESi,Brown/iESi,Brindle. The x-axis indicates chromosome position (Mb). The y-axis is the standardized transformed z value. We described the genes in the candidate regions that have been reported for meat traits.
Table 1.The Genes column list annotated genes within the corresponding regions. Candidate genes in each region are indicated by bold type.
Resequencing of candidate regions
To further investigate the observed selection signatures, we selected four putative candidate regions among the selection signatures for brown Hanwoo from the results of iES analysis using the bovine SNP chip. First, we selected the core region (BTA12: 36.1-37 M) that was the most significant SNP in the candidate regions. We also selected the core regions (BTA5: 118.1-119.1 Mb and 86.3-87.8 Mb) that included the candidate genes (adenylosuccinatelyase, ADSL; and fatty acyl-CoA reductase 2, FAR2) for meat quality traits, as previously reported (13,14). Finally, the candidate region (BTA14: 34.1-35.6 Mb) was selected, although it only had one significant SNP. However, BTA14 is well known to play a critical role that affects meat traits and thus will provide bare information from next-generation sequencing (NGS) data (15,16). The detailed gene list is provided in Supplementary data 3.
Sequence data analysis showed a total of 250,800,964 sequence reads across the bovine genome. Of these, 196,378,008 reads (78.3%) were mapped to the bovine genome. The target enrichment was designed for selection signals totaling 4.86 Mb, and 55% of mapped reads were mapped to the target region for enrichment efficiency. Supplementary data 4 shows the summary statistics of sequence generation and mapping to the reference genome. The average depth over all samples was approximately 120×. On average, ~50% of all targets covered a depth of greater than 120×, except for one sample (brown 4). We also measured mean homozygosity (mean frequency of each SNP) in the brown and brindle Hanwoo genomes (Table 2). The mean homozygosity of the brown Hanwoo genome was slightly higher than that of the brindle Hanwoo genome (0.99 versus 0.96). Many SNPs might have been fixed in the current brown Hanwoo genome during its history. In contrast, many genomic regions might still be under genetic drift for brindle Hanwoo. This supports the conclusion that brown Hanwoo have had many genomic regions selected during their history.
Table 2.Overall SNPs (0.2 ≤ frequency) were identified using vcftoolsand were classified into homo SNPs (0.9 ≤ frequency) and floating SNPs (0.2 ≤ frequency ≤ 0.9). *indicates significant difference of mean homozygosity between brown and brindle Hanwoo (Welch t-test, P value = 2.8089E-92). The homozygosity is a mean value of allele frequency of homo SNPs
Candidate gene mutation effects
We performed SNP annotation based on the functional type in the target region. We identified a total of 10,314 SNPs in the candidate regions. Among 32 genes, functional impact of 29 nonsynonymous SNPs (nsSNPs) were analyzed using PolyPhen (Supplementary data 5), which predicts if the SNP affects the phenotype of the protein encoded by the gene carrying the SNP. PolyPhen analysis revealed 10 damaging nsSNPs and 19 benign nsSNPs. Interestingly, three nsSNPs (Lys88Met, Leu189His, and Arg302Gln) in the ADSL gene have the potential to have produced alterations in protein structure or function. Recently, ADSL was determined to be significantly expressed in longissimus muscle of high-fat groups of pigs that were 100 Kg in body weight (13). Additionally, an ADSL polymorphism is associated with meat quality in chickens because it affects inosine 5'-monophosphate contents (14). Therefore, we focused on the effect of three ADSL nsSNPs and conducted a study to predict conformational changes of single residue substitutions and their impact on enzyme stability and ligand-binding affinity in the bovine ADSL/reaction products interaction system, which consists of only monomers, from the wild-type complex model. Separate subunit of the whole interaction system as a tetramer might found not to be equally effective of residue mutations as result in Table 3. Three ADSL variants showed differences between the folding free energy of mutated structures and binding free energy with those molecular partners in the mutated complex structures. The substitutions were not directly changed in binding affinity by the two ADSL products. However, they transmuted enzyme stability, which might affect the formation of the stabilizing tetramer directly related with its activity. Interestingly, the destabilizing effect of two mutations (L189H, R302Q) displayed relatively smaller structural changes in the monomeric than dimeric enzyme, which is conjugated with the stability of the active cleft by immediately adjacent variants. If the destabilizing extent disproportionality varies from monomeric to dimeric enzymes, it may reflect differences in how its tetramer nature and three subunits that form each active site are affected by the mutations. Thus, it might produce a nonparallel reduction in the enzyme’s catalytic activity. The decreasing stability of bovine ADSL by L189H and R302Q mutations in the dimeric form could also be regulated by multiple effects between the subunits to remove any unfavorable contacts that occurred within their neighboring interfaces. We described the detailed information in Supplementary data 6.
Table 3.†The structure stability of themselves is evaluated as the differences between the folding free energy of mutated structures and the wild-type enzyme for the single-point mutations. ‡The binding free energy is defined as the difference between the free energy of the complex and unbound state. *Mutation effect is defined as follows that stabilizing is indicate mutation energy is less than -0.5 kcal/mol and Neutral is that mutation energy is between -0.5 to 0.5 kcal/mol.
Over the past 30 years, the body weight at 18 months of age increased from 331 to 574 kg. The annual genetic gain also increased from 0.02 to 0.82 kg/year. Consequently, it was assumed that Hanwoo cattle might have dramatically increased genetic improvement. These results indicate that, although brown Hanwoo have experienced recent selective pressure with short divergence time, selection signatures have been observed with a fitness advantage during domestication. In this study, we compared EHHs between brown and brindle Hanwoo populations. We identified four candidate regions that may have been under recent selection in brown Hanwoo. These regions contained candidate genes associated with meat traits. We subsequently resequenced the candidate regions and focused on the ADSL gene, which was previously reported to be associated with meat quality. Three nsSNPs of this gene that affected the protein structure were identified though sequencing and protein structure modeling.
MATERIALS AND METHODS
Animals and genotype assays
Hanwoo data were collected from 140 steers in two populations of brown and brindle Hanwoo: a) steers of candidate bulls for progeny testing in the Hanwoo Improvement Center of National Agricultural Cooperative Federation in Seosan (n = 120); and b) steers raised in Hankyoung Univ. and Gyeonbuk Province (n = 20). Genomic DNA for genotyping assays was extracted from a blood sample, and SNP genotyping was performed using the Illumina Bovine SNP 50K Bead chip (Illumina, San Diego, USA). The protocol was approved by the Committee on the Ethics of Animal Experiments of the National Institute of Animal Science.
SNP statistics analysis
Stringent filtering criteria were applied to the genotype data. Briefly, SNPs were excluded from the analysis if they were absent from over 5% of the genotypes, had median GC scores below 0.6, had GC scores under 0.6 in less than 90% of the samples, deviated in heterozygosity more than three standard deviations from the other SNPs, and were out of Hardy–Weinberg equilibrium (HWE) at a cutoff P value of 1-15. Unmapped SNPs and SNPs on sex chromosomes were also excluded. Individual genotypes with GC scores under 0.6 were treated as absent. After this - SNP filtering, genotypes were tested for HWE to identify possible typing errors using a chi-square test in R/SNPassoc Package (R Development Core Team). SNPs not in HWE (P < 0.05), monomorphic SNPs, and those with a minor allele frequency < 1% were removed from this study. Finally, from a total of 55,704 SNPs, genotype data were received for 39,404 SNPs, and all of those SNPs were physically mapped to a chromosome (in bp) using the bovine genome sequence (Btau4.2).
Hanwoo population structure analyses
To characterize population structure for Hanwoo cattle, fixation indices (FST) were estimated using the method described by Weir and Cockerham (1984) between East Asian (n = 5 breeds), European taurine breeds (n = 4 breeds), and an indicine breed (Brahman) with a reduced dataset of 20 randomly selected animals from each breed. Results were inferred based on a symmetrical distance matrix for the unrooted neighbor-joining tree estimation using R/APE package (Paradis et al, 2004). The genotype relationship matrix was estimated using VanRaden’s algorithm (VanRaden, 2007) to infer similarities within and between breeds for principal component analyses.
Extended Haplotype Homozygosity (EHH)
The counting algorithm of Tang et al (2007) was implemented to identify differential EHH regions within brown Hanwoo compared with brindle Hanwoo. To infer the proportion of homozygous individuals, EHHSi,j, at the ith and jth SNP were calculated using two steps. First, for each SNPi, EHHSi,j between SNPi and incrementally distant flanking SNPj were calculated until EHHSi, k < 0; this was performed for both j > i and j < I.
Second, the EHH of SNPi was calculated: iESi = (EHHSi,j) for i_j_k for the 3' region of I (or i_j_k for the 5' region of i).
Differential regions of EHH between brown and brindle Hanwoo were plotted based on the standardized of iESi ratio between the two populations and compared with the EHHS decay of a single site. We calculated a standardized integrated EHH SNP [iES(z)] value to identify significant regions of positive selection (P = 0.001, z = 3.5) (17). We also discarded top and bottom 0.1% as outliers for incomplete signals across genomes. The remaining SNPs were defined as test SNPs. Candidate regions with positive selection footprints were defined as containing at least three SNPs (P < 0.001) with a 1.5-Mb window and 0.75-Mb overlap. If there were several continuous windows with three SNPs or more, we combined all overlapping windows into one candidate region using the intersectBed algorithm in BedTools (http://bedtools.readthedocs.io/en/latest/).
Targeted resequencing of candidate regions
We generated NGS data from two Hanwoo breeds including brown (n = 10) and brindle (n = 10) Hanwoo. We randomly sheared 3 μg of genomic DNA using the Covaris System to generate ~150 bp inserts. The fragmented DNA was endrepaired using T4 DNA polymerase and Klenow polymerase, and Illumina paired-end adaptor oligonucleotides were ligated to the sticky ends. We analyzed the ligation mixture by electrophoresis on an agarose gel, and sliced and purified 200-250 bp fragments. The purified DNA library was hybridized with a SureSelect custom enrichment probes set (Agilent, Wilmington, USA) to capture a 4.86 Mb targeted region following the manufacturer’s instructions. We sequenced the Genome Analyzer IIx paired-end flowcell based on the manufacturer’s protocol using captured exome library.
Detection of variants and mutation effects of nsSNPs
The sequences were aligned to the bovine reference genomes (Btau 4.2) using the Burrows–Wheeler Aligner (version 0.6.1) (18) with default parameters. SAMtools (version 0.1.17) (19) was used to convert (SAM/BAM), sort, and index alignments. We used Picard tools to generate quality matrices for mapping. Duplicated reads were marked with Picard tools and excluded from downstream analysis. We performed local re-alignment and re-calibration using the Genome Analysis Toolkit (version 1.5.9) framework (20). We also used variant calling to perform variant calls and extract corresponding quality scores for each genotype. Raw SNP calls were filtered using empirically derived cut-offs for the following GATK filter expressions: -- clusterSize 3, - clusterWindowSize 10,-- maskNameIndel- filterExpression “MQ >= 4 && ((MQ0/(1.0*DP)) > 0.1)”--filterExpression “QUAL < 30”, -- filterExpression “QD < 5.0”-filterExpression “Hrun > 5”, --filterExpression “FS > 200.0”. We performed SNP annotation based on functional type using ANNOVAR (21) and filtered against known variants by comparison to the NCBI dbSNP database. nsSNPs were detected and the SNP/gene functional property analyzed using PolyPhen (http://genetics.bwh.harvard.edu/pph2). To explore mutation effects of candidate genes, we predicted the bovine protein structure by comparative homology modeling with the only of template structure (PDB code 2VD6) as human protein for which structure is adapted in and around its active sites of ligand binding with the way that subunits are assembled into dimers and tetramers. The homology modeling was performed using MODELER9 (version 10) within Discovery Studio (version 3.5) molecular modeling packages (http://accelrys.com/products/discovery-studio/; Accelrys Inc, San Diego, USA).