Genome-wide association study for somatic cell score in Valdostana Red Pied cattle breed using pooled DNA

Background Mastitis is a major disease of dairy cattle occurring in response to environmental exposure to infective agents with a great economic impact on dairy industry. Somatic cell count (SCC) and its log transformation in somatic cell score (SCS) are traits that have been used as indirect measures of resistance to mastitis for decades in selective breeding. A selective DNA pooling (SDP) approach was applied to identify Quantitative Trait Loci (QTL) for SCS in Valdostana Red Pied cattle using the Illumina Bovine HD BeadChip. Results A total of 171 SNPs reached the genome-wide significance for association with SCS. Fifty-two SNPs were annotated within genes, some of those involved in the immune response to mastitis. On BTAs 1, 2, 3, 4, 9, 13, 15, 17, 21 and 22 the largest number of markers in association to the trait was found. These regions identified novel genomic regions related to mastitis (1-Mb SNP windows) and confirmed those already mapped. The largest number of significant SNPs exceeding the threshold for genome-wide significant signal was found on BTA 15, located at 50.43-51.63 Mb. Conclusions The genomic regions identified in this study contribute to a better understanding of the genetic control of the mastitis immune response in cattle and may allow the inclusion of more detailed QTL information in selection programs. Electronic supplementary material The online version of this article (doi:10.1186/s12863-014-0106-7) contains supplementary material, which is available to authorized users.


Background
Mastitis is one of the most frequent inflammatory disease with a significant economic implication for the dairy herds and the resistance to this pathology may be improved by breeding.
The development of mastitis is the result of the interaction among three components: the individual genotype, the pathogens (ordinarily classified in contagious and environmental bacteria) and the environment (hygiene, housing, climate, milking machines, feeding) [1].
The resistance to an infection disease or the absence of susceptibility may be defined as the immune response ability (immuno-competence capability) of an animal, to avoid the pathogens replication after the establishment of an infection. This implies that animals tend to vary in their genetic potential for immuno-competence [2]. The genetic resistance or the genetic susceptibility to mastitis involves interlinked biological mechanisms that activate and regulate the different levels of the immune response, as a consequence of the differences existing in the response to mastitis involving several pathogens [3]. A better understanding of the immune system and of the metabolic pathways involved in the response to various pathogens of resistant and susceptible animals may be used as complementary approach for the disease control.
The discovery of millions of SNP markers in animal genomes forming dense marker panels, and the concomitant decrease in genotyping costs have allowed the performing of genome-wide association studies (GWAS) [4]. The availability of SNP dense genotypes have increased the power of the identification of QTL related to the traits of interest [5], allowing more accurate breeding values estimation with the use of genomic selection methodology and helping the understanding of the genetic control of the traits of interest [6]. Because of the established knowledge of the positive genetic correlation between clinical mastitis and SCS ranging from 0.6 to 0.8 [1], SCC is one of the traits used as an indirect measure of mastitis resistance/susceptibility in breeding programs in cattle and sheep. Many GWAS have detected QTL for SCC in cattle on BTAs 5,6,8,11,17,18,20 and 23 in cosmopolite improved dairy cattle breeds [1,7].
The high costs of screening large populations for marker allele frequencies can be decreased using the SDP approach, genotyping pooled DNA samples from selected individuals at each of the two phenotypic extremes of the trait distribution [8]. Equal amounts of DNA are pooled from individuals in the extreme tails, and pools are then genotyped to estimate allele frequency differences for each SNP among high and low tail pools. The significant identified candidate SNPs are then used for confirmatory association studies [9].
The aim of this study was to identify QTL associated with SCS as an indicator of mastitis. We performed a GWA study for SCS in the Valdostana Red Pied cattle, with a selective DNA pooling analysis, using the Illumina BovineHD Bead chip.

Results and discussion
Among the 2,417 bulls with DP-EBV values, 275 had semen samples available in the Valdostana Red Pied bio-bank that encompassed in total 373 sires samples spanning across generations.
The Valdostana Red Pied population counting at present about 11,000 milking cows, did not undergo focused selection for milk production only and no gene introgression from other populations have ever occurred. The breed is strongly adapted to harsh alpine environment because breed natural adaptation and because has been selected to maintain pasture capability (summer pasture is the common farming system), longevity, functionality and fertility. Thus the population is somehow a unique genetic resource to map mastitis resistance, a trait related to adaptation, functionality and longevity. The study used all the sire samples available in the Valdostana Red Pied bio-bank thus highlighting the overall observable variability for productive and functional traits in this breed. The smaller number of sires available for the study respect to mapping, may limit the capacity to disclose QTL for mastitis resistance. Nevertheless the experimental design here used and the genetic makeup of the population allowed to identify several new QTL and confirm regions identified in the Italian and Swiss Brown population [10], another breed originating from alpine region, now strongly selected for milk production.
Descriptive statistics for the DP-EBVs and the size of the pools for each tail are reported in Table 1.
The initial dataset included 721,644 SNPs. After editing, the association analysis were performed with 655,665 SNPs for SCS DP-EBV. Figure 1 shows the Q-Q plot of SNPs at marker level (p-values). Deviations from the identity line showed the amount of false positive tests resulted from the analysis of the data. Figure 2 showed the Manhattan plot of genomewide associations for SCS trait.
A total of 171 significant SNPs in 24 chromosomes were identified above the Bonferroni genome-wide threshold of 0.05. The Additional file 1 showed the list of the 171 significant SNPs identified. The SNPs location and the gene annotation were reported for both the UMD3.1 and Btau4.6.1 assembly. Table in Additional file 1 included the indication of QTL, amongst the ones here disclosed, reported in the online AnimalQTLdb (http://www.animal genome.org/cgi-bin/QTLdb/index) for clinical mastitis, SCC and SCS.

Intragenic SNPs
Among the 171 significant markers, 52 SNPs were annotated within 36 genes (Table 2). In Table 2 the significant intragenic SNPs and their corresponding annotated genes in the Btau4.6.1 assembly are reported.
Also the BovineHD1500008366 (rs41754552) and the BovineHD1500008367 (rs110269361) SNPs were located THY1 is one of the genes differentially expressed between control quarters from cows infected with E. coli and S. aureus pathogens [11]. Also Moyes et al. 2010 [12] reported the THY1 upregulation in S. uberis intramammary infections.
Sugimoto and Sugimoto [13] provided evidence that the IGF1R is involved in innate immunity through autophagy (general term for the degradation of cytoplasmic components within lysosomes, [14]) in bovine. In Bos taurus, in fact, a polymorphism in the 5′UTR region of IGF1R (BTA 21) was associated to mastitis incidence, determining the inhibition of autophagy in response to S. Agalactiae invasion.
On BTA 19, at 55 Mb, SOCS3 (suppressor of cytokine signalling 3) was found at 673,863 bp upstream the Bovi-neHD1900015066 (rs132720248) SNP. This gene, important for the mammary tissue homeostasis, encodes an intracellular inhibitor of cytokine signaling, thus playing an important role in the initial steps of the recognition of pathogen-associated molecular pattern (PAMP) of the innate immune cells. This leads to the activation and initiation of the innate and the adaptive immune responses. Heeg and Dalpke [16] and Brenaut et al. [17] found the SOCS3 gene among the 39 differentially expressed genes in milk fat globules of goats in response to an experimental intramammary infection with S. aureus.   The gene encoding for the serine dehydratase (SDS) on BTA 17 was located 416,619 bp upstream of the BovineHD1700018352 (rs135157738) SNP. This gene is included in the glycine, serine and threonine metabolism, as reported by [18]. These authors demonstrated that the serine dehydratase is one of the enzymes that changed significantly in bovine affected to mastitis.
The genes above mentioned near to significant SNPs (ZNFX1, CTGF, TRIM21, CXCL2 and CXCL10) are significantly differentially expressed by the bovine mammary epithelial cells stimulated with E. coli crude lipopolysaccharide [19].
Jensen et al. 2013 [11] studied and compared the transcriptional responses of uninfected mammary gland quarters adjacent to quarters infected with E. coli and S. aureus in Holstein cows. The CXCL2 resulted to be one of the genes differentially expressed between control quarters infected with both the pathogens, while the CXCL10 resulted to be one of the genes differentially expressed in control quarters from animals infected with S. aureus for 24 and 72 hours.  The BovineHD2200003506 (rs110821186) SNP on BTA 22 mapped close to the MYD88 (myeloid differentiation primary-response gene 88) at 11.72 Mb, which plays a functional role in transducing pro-inflammatory molecule lipopolysaccharide (LPS) that are responsible for the majority of acute clinical cases of mastitis [20]. Table 3 reported a list of the chromosome regions defined by at least three SNPs that were strongly associated to SCS. The highest number of significant SNPs (14) exceeding the significant threshold for genome-wide significance signal was found on BTA 15 (located at 50.43-51.63 Mb). On the same BTA 15, also two smaller peaks consisting of three SNPs located at 28.39-28.99 and 5 SNPs located at 31.28-32.02 Mb were identified. These regions are located in QTL that were mapped, respectively, for clinical mastitis using a linkage analysis [21] and for SCS [22]. The region located at 50.43-51.63 Mb on BTA 15 has not been reported before in cattle breeds (http://www.animalgenome.org/cgi-bin/QTLdb/index), thus identifying a supposed candidate chromosome region associated to SCS. The chromosome region on BTA 9 (72.78-72.80 Mb) mapped in a QTL region previously identified for the general disease resistance (including clinical mastitis) and for SCS [23].

Chromosome regions associated to SCS and clinical mastitis
Lund et al. [21] found a QTL region associated to SCS located at 32.62-43.31 Mb on BTA 22. In our study, three significant SNPs were in this region.
Sahana et al. [24] in a study on the confirmation and fine-mapping of clinical mastitis and SCS QTL in Nordic Holstein cattle using BovineSNP50 BeadChip, found the highest number of significant associations on BTA 6 identifying a QTL region for clinical mastitis at 83.37-88.89 Mb (UMD3.1 assembly). This result was also confirmed in a recent study in German Holstein cattle [25]. In our study, two significant SNPs (BovineHD0600023179 (rs133319155) and BovineHD0600023185 (rs136907262)) were found respectively at 84.25 and 84.26 Mb on BTA 6 (UMD3.1 assembly; Btau4.6.1 assembly position was not available), being mapped within the QTL region described by the authors previously cited (see Additional file 1).

Annotation
Among the 36 genes listed in Table 2, the annotation data were available for 23 genes reported in the Additional file 2. This lists the biological processes (BP), the cellular components (CC), the molecular function (MF) and the metabolic pathways (KEGG) obtained with the annotation analyses performed with DAVID online Database.
The literature brings evidence that some of the genes reported in Table 2 map in QTL associated to traits of economic importance in bovine (http://www.animalgenome. org/cgi-bin/QTLdb/BT/index), as showed in Additional file 3. Those mapping in QTL already associated to clinical mastitis and SCS reported in the QTLdb were only 4: the PLXNA4 (plexin A4) on BTA 4, the THY1 (Thy-1 cell surface antigen) on BTA 15 and the SHISA9 (known as CKAMP44, shisa homolog 9) on BTA 25, the FAM19A1 (family with sequence similarity 19 (chemokine (C-C motif)-like), member A1) on BTA 22 associated with SCS. This study thus highlighted possible QTL related to mastitis resistance in the other 19 genes annotated and considered in the GO analysis.

Conclusions
This is the first mapping for SCS in Valdostana Red Pied population, an autochthonous alpine dual purpose cattle breed whose selection is mainly focused on milk quality, meat production and functionality.
This study brings evidence of significant associations between SCS and SNP markers on several chromosomes in known and newly disclosed QTL regions. Some genes involved in mastitis resistance or variation of SCS content were in QTL on BTAs 9,13,15,17,19,21,22. In particular, the strongest associations were highlighted on BTA 15 with a total of 24 significant SNPs distributed in three regions.
The detection of genomic regions will help to understand which potential candidate genes may be responsible for the genetic variation in mastitis resistance/susceptibility, a trait of primary importance in dairy cattle breeding and farming.

Sampling
The Valdostana Red Pied cattle is the most common autochthonous dual purpose breed in the region Val d' Aosta (13, The daily SCC were transformed into SCS [26]. Genetic parameters and estimated breeding values (EBVs) were calculated with a test day repeatability model on first parity cows. The model of analysis considered the fixed effects of days in milk (10 classes of 30 days each), herd-test day effect (32,870 levels), month of calving and age at calving (12 classes). Additive genetic and permanent environmental effects were considered as random. Three generations of ancestors were used for each individual extracting information from the National Herd Book for a total of 35,803 animals. Variance component estimations were calculated based on 258,680 test day records with the software VCE [27] and individual EBVs were obtained with the package BLUPF90 [28]. Deregressed proofs (DP-EBV) were calculated for 2,417 bulls according to [29].

Pool constitution
The bull families structure was verified in terms of number of sons per bull, in order to avoid overrepresentation of a single sire. Only 1 bull had 6 sons, 4 bulls had 5 sons, 3 bulls had 4 sons and the rest of bulls had 3 or less sons. The sires were ranked according to DP-EBVs for SCS: the top 20% and bottom 20% sires were identified for the constitution of independent pools within tail of the DP-EBV distribution. In order to obtain two independent groups of different animals within tail with comparable phenotypic value, the selected samples for each tail were clustered (even and odds numbers) into 2 sub-pools.
A total of 79 samples were selected for the pools constitution as follows: 2 independent pools of 20 individuals each in the high tail and 2 independent pools of 20 and 19 individuals each in the low tail. Furthermore, for each pool, 2 DNA duplicate-pools were independently constructed from identical samples. Thus, a total of 4 pools per tail were produced.

DNA extraction and genotyping
Bulls DNA was extracted from semen samples using the ZR Genomic DNA™ Tissue MiniPrep (Zymo). The quality control was performed on each sample to verify the DNA integrity on Invitrogen E-Gel 1% Agarose Gel. The GloMax®-Multi Detection System instrument using the Quant-iT™ dsDNA Broad-Range (BR) Assay Kit (Life Technologies), determined the initial DNA concentrations. The DNA concentration for a single sample was evaluated three times and each read was verified twice (e.g. 2 instrument runs). Samples having concentration diverging ±1 SD from the mean value were not included in the pools. Samples of DNA were normalized to a concentration of 10 ng/ul, which was reconfirmed with the same methods above described. DNA pools were constructed by taking equivalent amounts of DNA from each sample.
The final pools were concentrated to 50 ng/ul, as required for the Illumina array protocol. Each sub-pool was genotyped 3 times on different chips (array replicates). In all, 24 different chip positions on 3 microarrays were used for the pooled genotyping. Genotyping was performed using the Illumina BovineHD BeadChip (777,962 SNPs) according to the Infinium protocol. SNPs positions were accordingly to the UMB 3.1 bovine assembly.

Statistical analysis of pools
Pools were analysed according to the SDP approach. The B-allele frequencies being a good estimator of the allele frequency of the individuals in a pool for each array replicate [30], were used in the analyses after obtaining them from the self-normalization algorithm of Illumina BeadStudio software®.

The multiple marker test
A pipeline in R software (http://www.r-project.org/) was adapted from [31] and [32] to perform a multiple marker test. The test statistic used for each SNP was: where Dtest is the difference of the B-allele frequencies means among tails; Dnull is the difference of the Ballele frequencies means within tails. The test statistic was distributed as χ 2 with one degree of freedom under the null hypothesis of equal allele frequencies.

Quality control
We performed the analysis after excluding the 1% of SNPs that showed the highest variability as indicated by the size of the mean measures from the replicate array within tail [9]. In addition, the monomorphic SNPs were deleted from the dataset. Anderson-Darling, Shapiro-Wilk and Kolmogorov-Smirnov normality tests were performed on the Dnull distribution [33][34][35].
The distribution of the p-values using the quantilequantile (Q-Q) plot was examined to estimate the number and the magnitude of the observed associations between genotyped SNPs and DP-EBVs, compared to the statistics expected under the null hypothesis of no association.
Using the -log10 of the linkage test p-values for each SNP, a Manhattan plot was created. Manhattan plot is a SNP set out across the chromosomes for left to the right, and the heights correspond to the strength of the associations of the trait.
Bonferroni correction for multiple testing was applied in the analysis. The genome-wide significance threshold was set as a corrected p-value ≤ 0.05, which equated to a nominal p-value of approximately 7.62 × 10 −8 .

Annotation
The annotation analysis of significant SNPs was performed using UCSC, NCBI ENSEMBL and the Bovine SNP Annotation Tool (Snat), integrating the information from a variety of public bioinformatics databases (NCBI Entrez Gene, UniProt, Gene Ontology (GO), KEGG PATHWAY and AnimalQTLdb [36]). The Illumina BovineHD SNPs positions were converted from Bos_taurus_UMD_3.1 to Btau_4.6.1 assembly using the Batch Coordinate Conversion option in UCSC database as required by Snat tools. UCSC and NCBI databases were used to annotate those SNPs not included in Snat and to verify which of the significant SNPs were close (within 1 Mb [31,37]) to functional genes. GO and pathway analyses were performed