Evidence for positive selection of taurine genes within a QTL region on chromosome X associated with testicular size in Australian Brahman cattle

Background Previous genome-wide association studies have identified significant regions of the X chromosome associated with reproductive traits in two Bos indicus-influenced breeds: Brahman cattle and Tropical Composites. Two QTL regions on this chromosome were identified in both breeds as strongly associated with scrotal circumference measurements, a reproductive trait previously shown to be useful for selection of young bulls. Scrotal circumference is genetically correlated with early age at puberty in both male and female offspring. These QTL were located at positions 69–77 and 81–92 Mb respectively, large areas each to which a significant number of potential candidate genes were mapped. Results To further characterise these regions, a bioinformatic approach was undertaken to identify novel non-synonymous SNP within the QTL regions of interest in Brahman cattle. After SNP discovery, we used conventional molecular assay technologies to perform studies of two candidate genes in both breeds. Non-synonymous SNP mapped to Testis-expressed gene 11 (Tex11) were associated (P < 0.001) with scrotal circumference in both breeds, and associations with percentage of normal sperm cells were also observed (P < 0.05). Evidence for recent selection was found as Tex11 SNP form a haplotype segment of Bos taurus origin that is retained within Brahman and Tropical Composite cattle with greatest reproductive potential. Conclusions Association of non-synonymous SNP presented here are a first step to functional genetic studies. Bovine species may serve as a model for studying the role of Tex11 in male fertility, warranting further in-depth molecular characterisation.


Background
Indicine cattle (Bos indicus) are socially-and economicallyimportant breeds globally and dominate beef production systems in tropical and sub-tropical regions, and are an important resource in developing nations for food security in vulnerable communities [1]. Presence of indicine cattle is largely due to these breeds being better able to tolerate high temperature and humid environments common in these regions. Bos indicus also have improved tolerance to ticks and other tropical challenges compared to taurine breeds (Bos taurus) commonly found in temperate regions [2,3]. However, it is well documented that indicine breeds are inferior in terms of reproductive efficiency in comparison to taurine cattle [4][5][6]. For example, Brahman bulls typically reach puberty at a later age compared to taurine breeds, with downstream impacts on a number of economically-important indicators of herd productivity and profitability including generation interval and calving rates [7][8][9]. Tropically-adapted composite herds have also been developed with the aim of retaining the beneficial adaptations of indicine breeds for environmental stressors, while integrating taurine genes for production traits including reproductive performance [10][11][12]. Improving reproductive performance of indicine cattle will provide positive impacts in terms of both food security and environmental impacts through increased efficiency in livestock production in areas where hunger is an issue and demand for animal protein is increasing.
Fertility of individual sires is an important production and economic trait affecting reproductive performance of the whole herd. However, fertility is not an easily defined trait, comprising a variety of important heritable traits for selection of early maturing animals [13]. Whereas estimates of heritability for female fertility traits are low, heritability estimates for testicular traits are moderate to high and therefore can be used for effective selection [13][14][15]. For example, scrotal circumference (SC) is an easy to measure and consistent indicator trait for puberty in bulls with heritability across breeds ranging from medium to high [16]. Also, SC is favourably correlated to other traits affecting both male and female reproductive performance, including sperm quality, percentage of normal sperm, semen concentration, body weight, age at first pregnancy, progesterone levels and age at puberty in heifers [15][16][17][18][19][20][21][22][23][24]. Genetic selection that uses SC measurements to improve male fertility can impact positively on whole herd reproductive performance. Likewise, percentage of normal sperm (PNS) impacts directly on bull fertility, as it is a meaningful predictor of calf output per bull [25]. Therefore, measurements of SC and PNS are used as indicators in breeding programs for selecting fertile bulls [18,20].
A limitation in the use of SC and PNS as selection tools is that these reproductive traits cannot be measured before bulls reach 12-24 months of age. Time constraint can limit the attractiveness of these traits as selection tools. Identifying genetic markers associated with these traits would potentially overcome this limitation as they can be measured in DNA samples extracted from blood at birth.
Recent genome-wide association studies (GWAS) in Brahman and Tropical Composite cattle has resulted in the identification of a number of genomic regions associated with SC, PNS and other reproductive traits [26][27][28][29][30]. Fortes et al. [29] reported a 30 Mb region from approximately 62 to 92 Mb positions on chromosome X associated with SC in Brahmans. This region was also associated with the age at which a bull achieves a scrotal circumference of 26 cm (AGE26), regarded as puberty [27]. This finding was confirmed in a subsequent study on a Tropical Composite population [30], with significant polymorphisms grouped in two discrete regions at 69-77 Mb and 81-92 Mb positions on the X chromosome.
These confirmed QTL regions warrant further investigation and in the current study, two candidate genes, androgen receptor (AR) and testis expressed 11 (Tex11), were investigated for their association with SC and PNS measurements in Brahman and Tropical Composite populations. Androgen Receptor was chosen for its proximity to the QTL peak and its well-studied roles during embryogenesis and puberty for male phenotype development and sexual maturation [31]. Androgens, principally 5α-dihydrotestosterone (DHT) and testosterone, are essential for the maintenance of behaviour and function of male reproduction and exert their effects by interacting with AR. Both native ligands of this receptor, testosterone and DHT bind to AR to activate target gene expression at the transcriptional level. Androgen-AR activity results in the promotion of maintenance and development of male reproduction and male phenotype [31]. The Tex11 gene is also in the same QTL region in chromosome X [29,30], and has previously been reported to be essential for male meiosis and fertility in humans and pigs [32,33].
The aim of this study was to characterise the variations in these genes including 5′UTR, test their association with male reproductive traits and investigate the origin of favourable haplotypes. Precedence for the importance of polymorphisms in the 5′UTR gene region include SNP within this region of the bovine growth hormone receptor gene, which has a marked effect on beef production traits [34], and the association of a SNP within the 5′UTR of the bovine lactoferrin (LF) gene with reproductive parameters and uterine infection in dairy cattle [35]. Identification of novel single nucleotide polymorphisms (SNP) markers was possible through a combination of bioinformatic analyses of genome sequence data of representative Brahmans and target sequence analysis within the study population. Discovered markers were used to test these candidate genes for their association with observed phenotypic variation of SC and PNS across both Brahman and Tropical Composite populations. Haplotypes across the X chromosome were determined to be of Bos taurus or Bos indicus origin. Haplotypes from across the entire X chromosome were contrasted with Tex11 haplotypes to verify the origin of haplotypes and search for evidence of recent selection.

SNP discovery using Brahman genomes
Genome sequences from 16 Brahman bulls, sires of the studied population, were initially compared to the published Bos taurus UMD3.1 Bovine Genome Assembly on chromosome X between positions 60,168,532 and 93,999,806 bp. This region has previously been shown to contain 2 QTL associated with SC, PNS and other reproductive traits in both Brahman and Tropical Composite bulls [26][27][28][29][30]. Using Samtools [36], 139,409 SNP were identified, including 247 non-synonymous SNP (nsSNP) affecting 95 genes of which 192 SNP (78%) have not been previously reported in the dbSNP short genetic variations database (NCBI). All new SNP have been entered into dbSNP.
Bioinformatic analysis initially identified 555 SNP across the~210 kb region of the AR genomic region equaling an average of 1 SNP per 380 bp. No SNP were identified within the coding region of the gene, although many SNP or short indels are present surrounding the intron-exon boundaries. For Tex11 530 SNP were identified across~187 kb equaling an average of 1 SNP per 352 bp. Three previously unreported nsSNP were identified within exons 1, 10 and 23. It has previously been estimated that 20-30% of nsSNP affect protein function [37,38]. The virtues of using these SNP as targets for the current study included the ease of assay development and potential functionality of SNP. Additionally, these SNP were located across a large genomic area (135.7 kb) spanning multiple exons of the candidate gene.

AR SNP and indel validation by direct sequencing of animals with extreme SC
Initial screening of selected SNP by PCR and sequencing within 5′UTR, intron-exon boundary regions and 3′ UTR of the AR gene was performed using small panels of phenotypically extreme SC animals from within the Brahman bull population. The sequencing confirmed that 13 markers (12 SNP and 1 indel) previously identified bioinformatically were represented within the Brahman population. Furthermore, the sequencing exercise identified 3 markers (2 SNP and 1 indel) that were not observed in bioinformatic analyses (Table 1). Of particular interest were a cluster of 4 polymorphisms (2 SNP and 2 indel) within the 5′UTR region (X:88,621,605-88,621,617 bp). These markers formed three haplotypes subsequently designated as 5AR ::CC, 5AR TTAA and 5AR T-TAALONG ( Figure 1).

AR 5′UTR association study
Due to the difficulty in developing a high-throughput assay for this region, only a small number (n = 259) of Brahman bulls were characterised by direct sequencing with haplotype frequencies of 5AR ::CC, 5AR TTAA and 5AR TTAALONG being 59.1%, 36.3% and 4.6% respectively. Despite the relatively small sample size limiting the power of the subsequent association analysis, statistical analysis based upon the G > T SNP at position 88,621,617 across the 259 animals reveals that the SNP was associated with SC12, SC18 and SC24 with P values of 0.011, 0.0128 and 0.0144, respectively. This G > T SNP served as a tag SNP to the haplotypes due to perfect LD between these SNP.
A possible mechanism for the effect of the AR 5′UTR polymorphism may be changes to transcription factor binding sites across this region, as demonstrated through in silico studies (Table 2). Of particular interest is the presence of 4 putative SRY binding sites in 5AR TTAA compared to only 2 comparable sites in each of 5AR ::CC and 5AR TTAALONG . SRY is a testis-expressed protein encoded In naming of primers, "For" indicated a forward primer and "Rev" indicated a reverse primer.
by the sex-determining region on the Y chromosome that mediates male sex determination [39,40]. A previous study has shown that human SRY interacts with and negatively regulates AR transcriptional activity [41]. Our data suggests that those bulls with the 5AR TTAA haplotype in the 5′UTR region of AR have a smaller SC at 12, 18 and 24 months. One hypothesis raised from this result is that the increased affinity for SRY in these animals leads to reduced transcription of the AR gene with downstream impacts upon rate of scrotal development and puberty.
Targeted AR and Tex11 SNP genotyping and association analysis by TaqMan® assay Using TaqMan® assays, 980 Brahman and 619 Tropical Composite bulls were genotyped for 4 SNP including 3 SNP in Tex11 (Tex11_ r38k, Tex11_g297d, Tex11_r696h) and a SNP in intron 4 of the AR gene (AR_In4) (Additional file 1: Table S1). As expected during allelic discrimination analyses, heterozygosity was not observed as these markers were present on the X chromosome of male samples. Statistical analyses demonstrated that three SNP (Tex11_ r38k, Tex11_r696h and AR_In4) were strongly associated with SC12, SC18 and SC24 in Brahmans with P values in the range 10 -7 -10 -14 (Table 3). These associations were subsequently confirmed in the Tropical Composite bulls ( Table 4). The SNP Tex11_g297d was also associated with PNS18 and PNS24 in Brahmans (Table 3), while all markers except for Tex11_g297d were associated with PNS18 and PNS24 in the Tropical Composite bulls (

Discussion
Several studies have been published on the role of polymorphisms in other candidate genes associated to pubertal traits in cattle. For example, polymorphisms in insulin like growth factor 1 (IGF1), gonadotropinreleasing hormone receptor (GNRHR) and testicular luteinizing hormone receptor (LHR) were associated with age at puberty in Angus male cattle, in Polish Holstein-Friesian cattle [42] and in Female Brahman cattle [43]. This current analysis is the first to implicate Tex11 and AR as regulators of puberty onset in bovine bulls.
Previous studies have estimated that Australian Brahman cattle may contain up to 10% Bos taurus genes and that

Conclusions
In this study, two putative candidate genes (AR and Tex11) were investigated for their association with SC and PNS measurements, recognised fertility traits in Brahman and Tropical Composite bulls. These genes were selected for this study because of their position, which mapped to reported QTL [29,30], and because of their known functions. Our results provide strong evidence for the biological role played by AR and in particular Tex11 in male reproduction in cattle. These two genes emerge as strong candidate genes for explaining SC and PNS variation in tropically adapted cattle breeds such as Brahman and Tropical Composite, with varying degrees of taurine and indicine content. Evidence for  recent positive selection of a favourable Bos taurus haplotype was shown and potential causative mutations were described, such as the nsSNP in Tex11 and the 5'UTR SNP in AR that might affect SRY transcription factor binding sites. Knowledge about causative mutations merits confirmation and functional studies. These causative mutations can be used to improve animal breeding and selection as well as being potential models for male fertility issues in mammals.

Methods
The flow of procedures that formed this study is summarized in the illustration provided ( Figure 3) and detailed below.

Animals
Animal Care and Use Committee approval was not required for this study because the data were obtained from existing phenotypic databases and DNA storage banks as described below. Data from 1,118 Brahman bulls born between 2004 and 2008 were available for the current study. These bulls were the progeny of 55 sires from a larger Cooperative Research Centre for Beef Genetic Technologies (Beef CRC) breeding experiment to investigate genetics associated with whole herd profitability [12,[45][46][47][48]. Data from 1,085 Tropical Composite bulls born between 2007 and 2009, developed with combinations of Belmont Red, Charbray, Santa Gertrudis and Senepol breeds that represent a genotype with 50% tropically adapted and 50% unadapted genetics were available to use in the current study [47]. These bulls were the progeny of 56 sires also from the Beef CRC. Experimental design, trait measurement details and quantitative genetic analyses of these Brahman and Tropical Composite populations have been described previously [12,47].

Phenotypic data
Traits utilized in this study are components of male fertility: scrotal circumference (SC) at 12 (SC12), 18 (SC18) and 24 (SC24) months and percentage of normal sperm (PSN) at 18 (PNS18) and 24 (PNS24) months. These traits were measured at the bull breeding soundness evaluations (BBSE). Scrotal circumference was measured in centimetres using a standard metal tape [49]. As part of BBSE, bulls having a SC of 20 cm or more were subjected to electro-ejaculation. The percentage of normal sperm (PNS) was based on sperm morphology assessment which was carried out by a laboratory technician accredited by the Australian Cattle Veterinarians [29,30,49].

Bioinformatic analyses of genome sequences
As part of the larger Beef CRC project, the genomes of sixteen Brahman bulls were sequenced at CSIRO Animal, Food and Health Sciences in St Lucia (unpublished data). These bulls were from the cohort of 55 sires used to generate the population used in GWAS analyses [26][27][28]30,42]. In brief, DNA from all bulls were size selected to the range 400-600 bp, DNA libraries for sequencing by reversible terminator chemistry were prepared for each animal separately using the standard Illumina kit, and 100 bp paired-end reads were collected using an Illumina HiScan sequencer. Raw data reads were analysed using the CASAVA 1.8 software package (www.illumina.com/software/genome_analyzer_software. ilmn), where reads were matched to the UMD3.1 Bovine Genome Assembly. A single BAM format file was created for each sire.
Using Samtools [36], raw genomic data covering the 60-94 Mb region of chromosome X were extracted for each of the 16 animals as individual bam files, and all were aligned to the Bos taurus UMD3.1 Bovine Genome Assembly. Variant calls were outputted in VCF format. Variant Effect Predictor (VEP) was used to predict the functional consequences of detected variants [50]. In silico prediction of DNA transcription factor binding sites was performed using TFSEARCH [51].

Androgen receptor gene characterisation
The AR gene is highly conserved across populations, with no nsSNP detected by bioinformatic analyses. Classic PCR and sequencing methodologies across exon-intron boundaries were used to confirm novel non-coding SNP that may be experimentally valuable for further characterisation of this region in studied populations. For sequencing of the AR gene, DNA samples were selected from 17 bulls representing the extreme phenotypes for SC. Sequencing was performed for 10 animals with the highest SC measurements, high SC, and 7 animals with the lowest SC measurements, low SC. Primer design used the Bos taurus reference gene sequence (ENSBTAG00000022255). Primer pairs were designed to amplify a number of intronic regions immediately up-or downstream of known exons, with a full list of primer pairs in Additional file 3: Table S3.
PCR amplification was performed in an Eppendorf Master Cycler Gradient EP-S thermal cycler. PCR reactions (25 μl) contained 50-100 ng DNA, 1x GoTaq® PCR master mix (Promega, Australia) and 0.5 μM of each primer. The reaction profile used was 94 0 C for 2 mins, 35 cycles of 94°C for 10 s, primer-dependent annealing temperature (Additional file 3: Table S3) for 20s and 72°C for 2 mins, followed by a final extension at 72°C for 10 mins. PCR products were separated on 1% agarose gels with 1xTAE (Tris-acetate-EDTA) and visualized on UV transilluminator. PCR products with the expected size range (Additional file 3: Table S3) were sequenced with the forward and reverse primers using BIG DYE 3.1 terminator mix on an ABI 3130xl Genetic Analyser (Applied Biosystems, CA, USA). DNA sequence data were analyzed using Sequencher™ 4.1 (GeneCodes, MI, USA).
For a number of PCR products, deletions predicated the need for cloning of products prior to sequencing. When necessary, PCR products from individual animals were excised from the gel and purified using the QIAquick gel extraction kit (QIAGEN, Doncaster, VIC, Australia). PCR products were then cloned into pGem-Teasy vector (Promega, Australia). Putative positive clones were sequenced with M13 forward and reverse primers.

Genotyping of selected SNP
Custom TaqMan® assays were developed for a novel non-coding SNP in the AR gene (AR_In4) located in the Figure 3 Flow of research procedures from genome-wide association discovery of associated genomic regions to test of novel single nucleotide polymorphisms located in candidate genes.
fourth intron, as well as 3 novel nsSNP spanning the Tex11 gene (Tex11_ r38k, Tex11_g297d, Tex11_ r696h). Primers and probes were developed using the Custom TaqMan® Array Design Tool, and are listed in Additional  file 1 Table S1.
Genotyping was performed by allelic discrimination using custom TaqMan® SNP genotyping assays, following the manufacturer's instructions. Briefly, 5 μl PCR reactions were carried out containing 2.5 μl of TaqMan® Universal PCR Master Mix (Applied Biosystems, New Jersey, USA), 10 ng of DNA template and 0.25 μl of TaqMan® assay primers and FAM/VIC labelled probes by Applied Biosystems as Assays-by-Design™ (Applied Biosystems, Foster City, CA, USA). All thermal cycling experiments were performed in 384 well plates on a Gene Amp 9700 (Applied Biosystems). Amplification conditions consisted of 50°C for 2 min, 95°C for 10 min followed by 40 cycles of 95°C for 30 s and 60°C for 1 min, and finally 25°C until removed from the thermal cycler. End-point reads were then performed on the Applied Biosystems ViiA™ 7 Real-Time PCR System, and allelic discrimination analysis was performed using ViiA™ 7RUO software (Life Technologies, CA, USA).

Statistical analyses
The association of each SNP with SC12, SC18, SC24, PNS18 and PNS24 was examined for genotyped Brahman bulls using a mixed model analysis of variance with ASREML software [52]. The analysis performed was similar to those performed in previous GWAS and quantitative genetic analyses [12,29,30]. Briefly, the mixed model can be written as follows: Where y i represents the phenotypic measurement for the ith animal, X is the incidence matrix relating fixed effects in β with observations in y, Z is the incidence matrix relating to random additive polygenic effects of animal in μ with observations in y and Sj is the observed animal genotype for the j th SNP (coded as 0, 1 or 2 to represent the number of copies of the B allele), α j is the estimated SNP effect, lastly e i is the random residual effect. The same fixed effects were used for each trait. These fixed effects included contemporary group (animals born in the same year and raised together) and a second term that was the interaction of year and month of birth.
The percentage of the genetic variance accounted by the j -th SNP was estimated according to the formula %V j ¼ 100⋅ 2p j q j a 2 j σ 2 g where pi and qi are the allele frequencies for the j -th SNP estimated across the entire population, a j is the estimated additive effect of the j -th SNP on the trait under analysis, and σ g 2 is the REML estimate of the (poly-) genetic variance for the trait.
The origin of haplotypes on the X chromosome was determined by using the method described by Bolormaa et al. [44]. This method classifies haplotypes as either Bos taurus or Bos indicus in origin depending on their relative frequency in a reference population. In brief, haplotypes of 10 consecutive SNP were built and their frequencies calculated in 1,105 Brahman bulls (reference population for Bos indicus) and 3,666 Angus, Hereford, Murray Grey and Shorthorn bulls (reference population for Bos taurus) of the Beef CRC. The probability that each segment was of Bos taurus origin (b i ) was calculated using the following formula: Where p Bti is the frequency of haplotype segment i in Bos taurus animals and p Bii is the frequency of haplotype segment i in Bos indicus (Brahman) animals. Segments were classified as Bos taurus if b i > 0.6 and Bos indicus if b i < 0.4. Haplotype segments with b i between 0.4 and 0.6 were classified as undetermined origin.

Genomic examination of frequency of taurine and indicine specific alleles
To determine whether there was a general increase in the proportion of Bos taurus alleles across the genome or X chromosome in animals carrying the G allele of the Tex11_r38k SNP, the percentage of Bos taurus in each animal was estimated using either all 50 k autosomal SNP or all X chromosome SNP from the 50 k chip, using Admixture [53]. The data set used to estimate taurine content was 3,666 Bos taurus animals (Angus, Shorthorn Murray Grey and Hereford) and 1,032 Brahman bulls, all extracted from The Beef CRC database, with 50 k genotyping detailed before [29,30,54]. The mean and standard deviation of Bos taurus proportion was then calculated and used to compare animals with the A or G alleles of Tex11_r38k.

Availability of supporting data
New SNP were submitted to NCBI and the list of SS reference numbers is provided in the additional material section (Additional file 4). Sequence data was submitted to the 1000 bull genomes project, which is a collection of sequence data, intended as a resource for the bovine research community (http://www.1000bullgenomes.com/).