Genome-wide association study in Chinese Holstein cows reveals two candidate genes for somatic cell score as an indicator for mastitis susceptibility

Background Bovine mastitis is a typical inflammatory disease that causes serious economic loss. A genome-wide association study (GWAS) can be a powerful method to promote marker-assisted selection for this kind of complex disease. The present study aimed to identify single nucleotide polymorphisms (SNPs) and candidate genes associated with mastitis susceptibility traits in Chinese Holstein cows. Results Forty-eight SNPs were significantly associated with mastitis resistance traits in Chinese Holstein cows, most of them located on BTA 14. A total of 41 significant SNPs were linked to 31 annotated bovine genes. Gene Ontology and pathway enrichment analysis revealed 5 genes involved in 32 pathways, among which TRAPPC9 and ARHGAP39 both participate in cell differentiation and developmental pathways. The six common genome-wide significant SNPs are located within TRAPPC9 and flanking ARHGAP39. Conclusions Six SNPs were significantly associated with SCS EBVs, which suggests that the two linked genes (TRAPPC9 and ARHGAP39) are novel candidate genes for mastitis susceptibility in Holsteins. Electronic supplementary material The online version of this article (doi:10.1186/s12863-015-0263-3) contains supplementary material, which is available to authorized users.


Background
Bovine mastitis is one of the most typical inflammatory diseases, causing serious economic loss on modern dairy farms and quality problems in dairy food worldwide [1]. Because the heritability of mastitis is low, genetic improvement of mastitis resistance by traditional selection is not very effective [2]. Moreover, mastitis is not easy to record at field scale. Somatic cell count (SCC) and log-transformed SCC (somatic cell score, SCS) have higher heritability than mastitis itself and are used as the primary traits to improve mastitis resistance [3]. In addition, to avoid the influence of factors such as farm, season and sire, estimated breeding values (EBVs) of SCS are commonly used as pseudo-phenotypes for mastitis-related traits in dairy cattle. Genome-wide association study (GWAS) is widely considered a promising method to promote marker-assisted selection for mastitis-related traits based on single nucleotide polymorphisms (SNPs) [4].
Previous GWAS for mastitis susceptibility have given varied results in different Holstein populations. Family-based association tests, such as single locus regression analysis and the transmission disequilibrium test, have the advantage of being robust to population heterogeneity [5]. In 2011, Sodeland's group detected QTLs for clinical mastitis on Bos taurus autosomes (BTA) 2, 6, 14 and 20 in Norwegian Red cattle [6]. In 2012, Meredith et al. reported that 9 SNPs located on BTA 6, 10, 15 and 20 were significantly associated with SCS in Holstein sires and cows [7]. In the same year, Wijga et al. [8] reported that SNPs associated with log-transformed lactation-average somatic cell score or with the standard deviation of test-day somatic cell score were mainly located on BTA 4, 6 and 18. In addition, strong associations of SNPs with clinical mastitis and SCS were reported on BTA 6, 13, 14 and 20 in Nordic Holstein cattle by Sahana et al. [9]. More recently, a GWAS in German Holstein cows identified significant SNPs on BTA 6, 13, 19 and X [10]. Studies in US Holstein dairy cows have shown that genetic variants on BTA 2, 14 and 20 affect clinical mastitis. The region identified on BTA 14 contains the lymphocyte-antigen-6 complex (LY6), including the LY6K, LY6D, LYNX1, LYPD2, SLURP1 and PSCA genes, which are related to regulation of the major histocompatibility complex [11]. Studies in Chinese populations comprising Chinese Holstein, Sanhe cattle and Chinese Simmental have shown that the TLR4 gene (Toll-like receptor 4) and the BRCA1 gene (Breast cancer 1) are significantly associated with SCS [12,13]. Even though many studies have identified significant SNPs, only one SNP (BTA-77077-no-rs, Position: 85527109) on BTA 6 was shared between the reports of Sahana et al. [9] and Abdel-Shafy et al. [10]. These results imply that the significant SNPs associated with mastitis traits have not been identified consistently and should be confirmed and validated in different Holstein populations.
In order to detect functional candidate genes for mastitis-related traits, GWAS was conducted with mixed model based single locus regression analysis (MMRA) in Chinese Holstein populations. Six common SNPs were identified by MMRA and two linked genes were disclosed with significant effects on mastitis-related traits in Chinese Holstein populations.

Significant SNPs associated with SCS EBVs
The $-\log_{10}(P)$ values of all tested SNPs for SCS EBVs with MMRA are shown in Fig. 1. The significant SNPs associated with SCS EBVs were mainly located on BTA 14.
The associated SNPs detected by MMRA are presented in Table 1. In total, 48 SNPs were significant at the chromosome-wide level, including 13 SNPs that were significant at the genome-wide level. As shown in Table 1, 41 of the 48 SNPs were located within or near 31 known genes.

Two candidate genes for mastitis-related traits
TRAPPC9 and ARHGAP39 (each containing three genome-wide significant SNPs) identified by MMRA can be considered potential candidate genes for mastitis-related traits. To assess the effect of each genotype in each potential candidate gene on mastitis-related traits, the SCS EBVs of cows with the three genotypes were compared. As shown in the left panel of Fig. 3, cows with genotype AA in both genes had significantly higher SCS EBVs than cows with the other genotypes (P < 0.001). These results support the two genes (TRAPPC9 and ARHGAP39) as potential candidate genes for SCS EBVs. The right panel of Fig. 3 shows the curves of SCC along days in milk (DIM): cows with genotype AA tended to have higher SCC along DIM than cows with the other two genotypes for both genes, especially for TRAPPC9 (Fig. 3).

Gene ontology and pathway enrichment for the significant SNPs on genome level
Through Gene Ontology (GO) analysis with GenCLiP 2.0 (http://ci.smu.edu.cn/GenCLiP2.0/analysis.php?random=new), we found that 5 genes are involved in 32 pathway terms, as presented in Table 3 and Fig. 4. Among the five enriched genes, ARHGAP39 participates in a total of 24 pathway terms, including two terms shared with TRAPPC9 (GO:0030154 and GO:0048869), which relate to cell differentiation and cellular developmental process.

Discussion
The present study identified significant SNPs and novel candidate genes associated with mastitis-related traits in a Chinese Holstein population using mixed model based single locus regression analysis (MMRA). Two genes (TRAPPC9 and ARHGAP39) tagged by significant SNPs are important candidate genes for mastitis-related traits. To our knowledge, this is the first study to dissect the genetic background of mastitis-related traits in Chinese dairy cattle using MMRA. The product of TRAPPC9, NIBP (NIK and IKKβ-binding protein), has been reported to enhance the cytokine-induced NF-κB signaling pathway through interaction with NIK (NF-κB-inducing kinase) and IKKβ (IκB kinase-β) [14,15]. In recent studies, TRAPPC9 was also considered a candidate gene for autosomal recessive non-syndromic mental retardation [16,17]. In the present study, the SCS EBV (2.99) of cows with the AA genotype of the SNP ARS-BFGL-NGS-100480 in TRAPPC9 was significantly higher than that of cows with the other two genotypes (P < 0.001). A similar tendency across the three genotypes was observed independently in a completely different Chinese Holstein population (n = 314, our unpublished data). ARHGAP39 functions as a Rho GTPase-activating protein, and Rho GTPases are known as new targets in cancer therapy [18]. Therefore, the present study screened genes that are functionally closely related to bovine mastitis resistance.

Fig. 2 Linkage disequilibrium (LD) pattern for 10 significant SNPs on BTA 14 (Chr14: 236532-2711615; gene labels ARHGAP39 and TRAPPC9). Solid-line triangles indicate LD. Each square shows the LD level (r2) between two SNPs, and the squares are colored by the D'/LOD standard scheme (LOD is the logarithm of the likelihood odds ratio, an index of the reliability of D'): red, LOD > 2 and D' = 1; pink, LOD > 2 and D' < 1; blue, LOD < 2 and D' = 1; white, LOD < 2 and D' < 1
Fig. 3 The SCS EBVs and curves of SCC in different genotypes of TRAPPC9 and ARHGAP39 genes. ** refers to P < 0.001

In the reported GWAS based on single locus regression analysis, it has not been easy to identify SNPs that are consistently associated with SCS or mastitis-related traits. As shown in Table 1, the 7 genome-wide significant SNPs (P < 1.14E-06) detected on BTA 14 by MMRA in Chinese Holsteins were completely different from the significant SNPs reported in [7,8], whereas the presence of significant SNPs on BTA 14 is consistent with other studies [6, 9-11, 19, 20]. In comparison, one significant SNP, UA-IFASA-9288 (BTA 14, Position: 2201870), in Chinese Holstein was close to (147413 bp from) the SNP ARS-BFGL-NGS-107379 (Position: 2054457), which was identified in Nordic Holstein [9]. Furthermore, Tiezzi et al. [11] identified a region associated with clinical mastitis from 2,574,909 to 3,137,184 bp on BTA 14, which contains the three genome-wide significant SNPs (ARS-BFGL-NGS-100480, ARS-BFGL-NGS-56327 and UA-IFASA-5306) located within TRAPPC9 in this study. These GWAS suggest that mastitis-related traits, as lowly heritable polygenic traits, are mainly controlled by multiple loci that are distributed across the whole genome, each with a relatively small genetic effect. Although SCS is a continuous trait that is commonly used as an important indicator of mastitis, it is usually unstable and easily influenced by the environment [21,22].
Therefore, for disease indicator traits, the current strategy has shifted to association studies with case-control tests [23], because mastitis resistance or susceptibility can be considered a threshold trait [2]. In another ongoing study, we defined the left and right parts of the population, at half or one standard deviation of SCS EBVs, as the mastitis susceptibility group (case) and the healthy group (control), respectively, and analyzed the two groups with ROADTRIPS (Robust Association-Detection Test for Related Individuals with Population Substructure, version 1.2; http://faculty.washington.edu/tathornt/software/ROADTRIPS2/) using bovine 54 k SNP information. Although the reduced population size and increased bias lower the power of the case-control association test, the ROADTRIPS case-control test also detected two significant SNPs linked to the same two genes (TRAPPC9 and ARHGAP39) found by MMRA, which strongly suggests that these genes are novel candidate genes for mastitis traits.
The genes close to or covering significant SNPs were further subjected to bioinformatics analysis. Results from the Gene Ontology (GO) analysis (Table 3) indicated that the TRAPPC9, ARHGAP39 and PTK2 genes play a role in the regulation of cell differentiation (GO:0030154, P = 0.033) or cellular developmental process (GO:0048869, P = 0.039). From the clustering result of the GO analysis (Fig. 4), we found that ARHGAP39 and PTK2 are the most closely clustered genes and participate in 24 pathway terms. TRAPPC9, however, had fewer hits in the GO analysis, so its related pathways require further functional analysis.

Conclusions
Although detection power is limited for SCS EBVs and other mastitis resistance traits, the results consistently indicate that the significant SNPs are mainly located on BTA 14 in Chinese Holstein cows. TRAPPC9 and ARHGAP39 are two novel candidate genes associated with mastitis resistance traits in dairy cattle.

Ethics statement
All protocols for the collection of blood samples from the experimental cows were reviewed and approved by the Institutional Animal Care and Use Committee (IACUC) at China Agricultural University.

Animals and phenotype
A total of 2,093 cows, the daughters of 14 sires, formed the study population. The number of daughters per sire ranged from 83 to 358, with an average of 150. Although the 14 sires were genotyped, they were not used in the association study, in order to avoid using the daughters' information twice. The daughters came from 15 Holstein cattle farms in Beijing, China. No specific permissions were required for these locations/activities.
Because they closely follow a normal distribution, somatic cell scores (SCSs) were calculated from SCCs as $\mathrm{SCS} = \log_2(\mathrm{SCC}/100{,}000) + 3$. To avoid environmental influences, EBVs of SCSs were used as the phenotypes in the GWAS. These EBVs were obtained from a multiple-trait random regression test-day model [24] using the software RUNGE provided by the Canadian Dairy Network (CDN) (http://www.cdn.ca).
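The SCS transformation above can be sketched in a few lines of Python. This is only an illustration of the stated formula; the function name and the assumption that SCC is expressed in cells per mL are ours and are not taken from the study's software.

```python
import math

def somatic_cell_score(scc_cells_per_ml: float) -> float:
    """Convert a somatic cell count (assumed unit: cells/mL) to SCS.

    Implements SCS = log2(SCC / 100,000) + 3 as given in the text.
    """
    if scc_cells_per_ml <= 0:
        raise ValueError("SCC must be positive to take a logarithm")
    return math.log2(scc_cells_per_ml / 100_000) + 3

# Each doubling of SCC raises SCS by exactly one unit.
for scc in (12_500, 100_000, 200_000, 400_000):
    print(scc, somatic_cell_score(scc))
```

On this scale, an SCC of 100,000 corresponds to an SCS of 3, and every doubling of the cell count adds one score unit.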

DNA extraction and genotypes
Genomic DNA was extracted from whole blood using the TIANamp Blood Genomic DNA Purification kit (Tiangen Inc., Beijing, China). The DNA quality control criteria were a DNA concentration above 50 ng/μL, an OD260/OD280 ratio in the range of 1.7-1.9 and an OD260/OD230 ratio in the range of 1.5-2.1.
The cows were genotyped using the Illumina Bovine SNP50 BeadChip [25]. The genotypes were edited according to the following criteria: (1) call rate ≥ 90 %; (2) no extreme deviation from Hardy-Weinberg equilibrium ($P > 10^{-6}$); (3) minor allele frequency ≥ 3 %. After quality control, a total of 43,885 SNPs were available for MMRA. The distribution of SNPs on each chromosome after quality control and the average distances between adjacent SNPs are shown in Additional file 1: Table S1.
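The three SNP editing criteria can be illustrated with a minimal Python sketch. The text does not say which Hardy-Weinberg test was used, so the 1-df chi-square goodness-of-fit test below is an assumption, as are the function names and the 0/1/2/None genotype coding.

```python
import math

def hwe_p_value(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a 1-df chi-square goodness-of-fit test for HWE (assumed test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:                 # monomorphic SNP: nothing to test
        return 1.0
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))

def snp_passes_qc(genotypes, min_call_rate=0.90, min_maf=0.03, min_hwe_p=1e-6):
    """genotypes: list of 0/1/2 (copies of allele B) or None for a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    call_rate = len(called) / len(genotypes)
    n_aa, n_ab, n_bb = (called.count(k) for k in (0, 1, 2))
    freq_b = (n_ab + 2 * n_bb) / (2 * len(called))
    maf = min(freq_b, 1.0 - freq_b)
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_p_value(n_aa, n_ab, n_bb) > min_hwe_p)

# A SNP in perfect Hardy-Weinberg proportions with no missing calls passes.
print(snp_passes_qc([0] * 25 + [1] * 50 + [2] * 25))
```

The default thresholds mirror the criteria listed above (call rate ≥ 90 %, $P > 10^{-6}$, minor allele frequency ≥ 3 %); an exact HWE test could be substituted without changing the structure of the filter.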

Association analysis
The mixed model based single locus regression analysis (MMRA) applied to perform the GWAS in this study is as follows:
$$y = 1\mu + bx + a + e$$
where $y$ is the vector of phenotypes (SCS EBVs), $\mu$ is the overall mean, $b$ is the coefficient of the regression on SNP genotypes, $x$ is the vector of SNP genotypes, and $a \sim (0, A\sigma_a^2)$ and $e \sim (0, W\sigma_e^2)$ are the vectors of polygenic effects and residuals, respectively. $A$ is the additive genetic relationship matrix and $W$ is a diagonal matrix with diagonal elements $1/\mathrm{REL}_i$, which weights the residual variance for heterogeneity [26]. $\mathrm{REL}_i$ is the reliability of the EBV of the $i$th individual. $\sigma_a^2$ and $\sigma_e^2$ are the additive genetic variance and the residual error variance, respectively. For each SNP, the estimates $\hat{b}$ and $\mathrm{Var}(\hat{b})$ are obtained via the mixed model equations (MME). An approximate Wald chi-squared statistic, $\hat{b}^2 / \mathrm{Var}(\hat{b})$ with df = 1, is then used to test whether the SNP is significantly associated with the phenotype. The association analysis was conducted using a program written in FORTRAN by our group [26].
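To make the single-SNP test concrete, the following Python sketch estimates the SNP regression coefficient and its Wald statistic by generalized least squares, which gives the same fixed-effect estimates as the mixed model equations when the variance components are treated as known. It is not the authors' FORTRAN program: the function names, the plain-list matrix representation and the assumption that $\sigma_a^2$ and $\sigma_e^2$ are supplied in advance are all ours.

```python
import math

def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, M[i])) + [float(rhs[i])] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z

def single_snp_wald(y, x, A, rel, var_a, var_e):
    """GLS fit of phenotype on mean + SNP genotype; returns (b, Var(b), chi2, p).

    Phenotypic covariance: V = A * var_a + W * var_e, with W = diag(1 / REL_i).
    var_a and var_e are assumed known here (a simplification).
    """
    n = len(y)
    V = [[A[i][j] * var_a + (var_e / rel[i] if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    vi_1 = solve(V, [1.0] * n)               # V^-1 1
    vi_x = solve(V, x)                       # V^-1 x
    vi_y = solve(V, y)                       # V^-1 y
    a11 = sum(vi_1)                          # 1' V^-1 1
    a12 = sum(vi_x)                          # 1' V^-1 x
    a22 = sum(xi * v for xi, v in zip(x, vi_x))   # x' V^-1 x
    r1 = sum(vi_y)                           # 1' V^-1 y
    r2 = sum(xi * v for xi, v in zip(x, vi_y))    # x' V^-1 y
    det = a11 * a22 - a12 * a12
    b_hat = (a11 * r2 - a12 * r1) / det      # SNP effect from the 2x2 GLS system
    var_b = a11 / det                        # its sampling variance
    chi2 = b_hat ** 2 / var_b                # Wald statistic, 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return b_hat, var_b, chi2, p_value

# Toy data (made up): unrelated animals, equal reliabilities, so V is the identity.
x = [0, 1, 2, 0, 1, 2]
y = [2.0 + xi for xi in x]
A = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
print(single_snp_wald(y, x, A, rel=[1.0] * 6, var_a=0.5, var_e=0.5))
```

In a real analysis the relationship matrix A would come from the pedigree and the variance components would be estimated; the sketch only shows how the weighting matrix W and the Wald statistic enter the test.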

Statistical inference
To control the false positive rate arising from multiple testing while still retaining candidate SNPs and functionally related genes, Bonferroni correction (P < 0.05) was applied for the number of SNPs at both the genome-wide and the chromosome-wide level. The Bonferroni thresholds, i.e. 0.05 divided by the number of SNPs on the whole genome or on each chromosome, are listed in Additional file 2: Table S2. Linkage disequilibrium analysis for the significant SNPs on BTA 14 was performed using Haploview software (version 4.2) [27].
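As a worked illustration of the Bonferroni thresholds, the short Python sketch below divides 0.05 by the number of tested SNPs. The genome-wide count of 43,885 SNPs is taken from the text; the per-chromosome counts are made-up placeholders, since the actual values are only given in Additional file 1: Table S1.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("number of tests must be positive")
    return alpha / n_tests

# Genome-wide threshold for the 43,885 SNPs that passed quality control.
genome_wide = bonferroni_threshold(43_885)
print(f"genome-wide: {genome_wide:.2e}")   # prints "genome-wide: 1.14e-06"

# Chromosome-wide thresholds; these SNP counts are hypothetical placeholders.
hypothetical_snp_counts = {"BTA 6": 2_000, "BTA 14": 1_500}
for chrom, n_snps in hypothetical_snp_counts.items():
    print(f"{chrom}: {bonferroni_threshold(n_snps):.2e}")
```

The genome-wide value agrees with the threshold of P < 1.14E-06 quoted for the genome-wide significant SNPs, and chromosome-wide thresholds are less stringent because fewer SNPs are tested per chromosome.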
Student's t-tests were conducted to compare the SCS EBVs of cows with different genotypes in each candidate gene.
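The genotype comparison can be sketched as a two-sample Student's t-test. The example below assumes the classical pooled-variance form and returns only the t statistic and its degrees of freedom; the function name and the SCS EBV values are hypothetical and serve purely as an illustration.

```python
import math
import statistics

def pooled_t_statistic(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / std_err, df

# Hypothetical SCS EBVs for two genotype groups (not data from the study).
ebv_aa = [3.1, 2.9, 3.0, 3.2]
ebv_ab = [2.6, 2.8, 2.7, 2.5]
t_stat, df = pooled_t_statistic(ebv_aa, ebv_ab)
print(f"t = {t_stat:.3f}, df = {df}")
```

The resulting statistic would be compared against a t distribution with the returned degrees of freedom to obtain the P-value.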

Additional files
Additional file 1: Table S1. Distribution of SNPs on each chromosome after quality control and the average distances between adjacent SNPs. These data were derived from the Bos_taurus_UMD_3.1 assembly (http://www.ncbi.nlm.nih.gov/assembly/GCF_000003055.4/). SNPs which are not assigned to any chromosome are noted as "0". (DOCX 25.5 kb)
Additional file 2: Table S2. Results of Bonferroni thresholds at the genome-wide level and at the chromosome-wide level for each chromosome. SNPs which are not assigned to any chromosome are noted as "0". (DOCX 24.8 kb)