Determination of genetic effects of ATF3 and CDKN1A genes on milk yield and compositions in Chinese Holstein population

Background Our previous RNA-sequencing study revealed that the ATF3 and CDKN1A genes were remarkably differentially expressed between the mammary glands of lactating Holstein cows with extremely high and low milk protein and fat percentage so that both of them were considered as candidates for milk composition. Herein, we further verified whether these genes have genetic effects on milk production traits in a Chinese Holstein cow population. Results By re-sequencing the entire coding and regulatory regions, we identified four SNPs in 5’promoter region, two in exons, seven in 3′ un-translated region (UTR), and six in 3’flanking region of ATF3 gene, and one SNP in exon 5, two in 3’UTR, and two in 3’flanking region of CDKN1A gene. Of these, only the SNP, c.271C > T (rs442346530), in exon 5 of CDKN1A gene was predicted to result in an amino acid replacement from arginine to tryptophan. Subsequent genotype-phenotype association analysis revealed that 19 SNPs in ATF3 and 5 SNPs in CDKN1A were evidently associated with 305-days milk yield, fat yield, protein yield, or protein percentage (P = < 0.0001 ~ 0.0494). Whilst, no significant SNPs in ATF3 gene were associated with fat percentage in both first and second lactations (P > 0.05), and only two SNPs of CDKN1A gene, c.271C > T (P = 0.0377) and c.*654C > T (P = 0.0144), were markedly associated with fat percentage in the first lactation. Further, linkage disequilibrium (LD) analyses were conducted among the identified SNPs in ATF3 and/or CDKN1A genes to further confirm the association results. We also observed that the four SNPs, g.72834301C > A, g.72834229C > A, g.72833969A > G, and g.72833562G > T altered the specific transcription factor (TF) binding sites in ATF3 promoter, and one SNP, c.271C > T, changed the CDKN1A protein secondary structure, suggesting they might be the promising potential functional mutations. Conclusion Our findings first profiled the genetic effects of ATF3 and CDKN1A genes for milk production traits in dairy cattle and will be available for marker-assisted breeding in dairy cattle. Electronic supplementary material The online version of this article (doi:10.1186/s12863-017-0516-4) contains supplementary material, which is available to authorized users.

identified 31 differentially expressed genes between the mammary glands of lactating Chinese Holstein cows with the high and low milk protein percentage and fat percentage, of which, 17 genes are located within the known QTL regions for milk protein and fat traits and 21 contain or are near to the significant SNPs identified by previous GWAS, including ATF3 and CDKN1A [9]. The ATF3 gene (chr16: 72,820,025-72,832,974) were 1.9 Mb or 3.3 Mb close to the two SNPs, ARS-BFGL-NGS-70836 (chr16: 74,741,746) associated with milk protein percentage (P < 3.58E-07) and ARS-BFGL-NGS-85980 (chr16: 76,091,078) associated with fat percentage (P < 6.46E-07) and protein percentage (P < 8.33E-09) detected by a previous GWAS in dairy cattle [10]. ATF3 (activating transcription factor 3) is a transcription factor belonging to the mammalian activation transcription factor/cAMP responsive element-binding (CREB) protein family [11]. Our GO and Ingenuity Pathway Analysis (IPA) also found that ATF3 was involved in the lipid metabolism pathway, accumulation of glycoproteins, apoptosis, and regeneration of epithelial cells so that it is thought to be related to milk fat formation and development of mammary epithelial cells [9]. In addition, the CDKN1A gene (chr23: 10,560,498-10,568,780) is located within the known QTL regions (BTA23: 12.4 cM) that were confirmed to have large genetic effects on milk protein percentage [12]. CDKN1A (cyclin dependent kinase inhibitor 1A) encodes a 21-kD protein (p21) that is a cyclin-dependent kinase inhibitor, which is a ratelimiting regulator in the transition from G1 to S phase [13], and is also a key mediator of growth arrest that induced by the tumor suppressor protein p53 [14]. So far, there is no study revealing the associations of ATF3 and CDKN1A genes with milk production in dairy cattle.
Based on our previous RNA-Seq study [9], the purpose of this study was to identify whether ATF3 and CDKN1A genes have a genetic effect on milk production traits in dairy cows. Herein, we detected the SNPs in ATF3 and CDKN1A genes, and analyzed the association between these polymorphisms with five milk production traits in a Chinese Holstein population. In addition, bioinformatics analysis was performed to profile the promoter activity and protein structure variation of ATF3 and CDKN1A genes.

Animals and phenotypic data
A total of 1093 Chinese Holstein cows from 40 sire families were used in this study, and each sire had between 6 and 70 daughters with approximately 27 daughters on average per sire. These cows were from 22 dairy farms belonging to the Sanyuanlvhe Dairy Farming Center, where standard performance testing for dairy herd improvement (DHI) has been regularly conducted since 1999. Estimate breeding values (EBVs) for 305-days milk yield, fat yield, fat percentage, protein yield, and protein percentage during the first and second lactations were provided by the Beijing Dairy Cattle Center (http://www.bdcc.com.cn/). The descriptive statistics of phenotypic values for dairy production traits in two lactations were shown in Additional file 1. All protocols for sample collections of experimental individuals and phenotypic observations were approved by the Institutional Animal Care and Use Committee (IACUC) at China Agricultural University, and the permit number is DK996.

DNA extraction
The whole blood samples of 1093 Chinese Holstein cows were extracted by TIANamp Blood DNA Kit (Tiangen, Beijing, China) according to the manufacturer's instructions. The genomic DNA from semen samples of the sires were extracted using the standard salt-out procedures. The quantity and quality of the extracted DNA were respectively measured using a NanoDrop 2000 Spectrophotometer (Thermo Scientific, DE, USA) and by the gel electrophoresis.

SNP identification and genotyping
We designed a total of 35 primers (Additional file 2) using Primer 3.0 (http://primer3.wi.mit.edu/) and Oligo 6.0 (Molecular Biology Insights, Inc., CO, USA) to amplify the entire coding region and 2000 bp of flanking sequences based on the genomic sequence of the bovine ATF3 (GenBank accession no.: AC_000173.1) and CDKN1A (GenBank accession no.: AC_000180.1) genes, and the primers were synthesized by the Beijing Genomics Institute (BGI, Beijing, China). We randomly divided the DNA samples of the 40 sires into two equal groups, and diluted all the DNA samples into the concentration of 50 ng/μL. Subsequently, we constructed two DNA pools (50 ng/mL) as the templates for the PCR amplification. The amplifications were performed using the PCR system or the touch-down PCR (Additional file 2). The purified PCR products were directly sequenced by ABI3730XL DNA analyzer (Applied Biosystems, Foster City, CA, USA), and the sequences were compared by DNAMAN 6.0 (Lynnon Biosoft, USA) and NCBI-BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi) to search potential SNPs. The identified SNPs were further individually genotyped for all the 1093 Chinese Holstein cows using the matrix-assisted laser desorption/ionization time of flight mass spectrometry (MALDI-TOF MS, Sequenom MassAR-RAY, Bioyong Technologies Inc. HK).

Association analyses
Association analyses between SNP genotypes and/or haplotypes and the five milk traits were conducted by SAS 9.13 (SAS Institute Inc., Cary, NC, USA), based on the following animal model: where, Y was the phenotypic values of each trait for the cows; μ was the overall mean; hys was the fixed effect of farm, year, and season; b was the regression coefficient of covariant M; M was the fixed effect of calving month; G was the SNP genotype or haplotype effect; a was the individual random additive genetic effect, distributed as N 0; Aδ 2 a À Á N 0; Aδ 2 a À Á , with the additive genetic variance δ 2 a ; and e was the random residual, distributed as N 0; Iδ 2 e À Á , with identity matrix I and residual error variance δ 2 e . Bonferroni correction was performed for multiple testing, and the significant level of the multiple tests was equal to the raw P value divided by number of tests. Currently, we considered the statistically significant association from being null effect if a raw P value is less than 0.05/N, where N represents the number of SNP loci.
We calculated the additive effect (a), dominant effect (d), and substitution effect (α) using SAS 9.13, and the computational formula was as follows: Where, AA, BB, and AB were the least squares mean of the milk production traits in the corresponding genotype.

Haplotype analysis
The linkage disequilibrium (LD) extent between all SNPs and haplotype blocks were estimated using Haploview 4.2 (Broad Institute of MIT and Harvard, Cambridge, MA, USA).

Bioinformatics analysis
To analysis the biological functions of the SNPs, we used the JASPAR database (http://jaspar.binf.ku.dk/cgi-bin/ jaspar_db.pl?rm=browse&db=core&tax_group=vertebrates) to profile the genetic variants of the transcription factor binding sites (TFBSs) of the SNPs associated with milk production traits in the 5′ promoter region (relative score > 0.85). In addition, we utilized NPSA SOPMA SERVER (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html) to predict the variations of protein secondary structure caused by missense mutation in the coding regions, and the parameters were window width (17), similarity threshold (8), and number of states (4).

SNPs identification
After sequencing the entire coding and up/downstream regions, we totally identified 19 and five SNPs for ATF3 and CDKN1A genes, respectively, and all the identified SNPs have been reported in the NCBI database (Table 1). There were four SNPs in the 5'promoter region, two in exons, seven in the 3'UTR, and six in the 3'flanking region of ATF3 gene. For CDKN1A gene, one SNP, c.271C > T, was located in the exon 5, c.*9C > G and c.*654C > T in the 3'UTR, and g.10569766 T > C and g.10569779 T > C in the 3'flanking region. Of these, only the SNP in exon 5 of CDKN1A gene (c.271C > T) was predicted to result in an amino acid replacement from arginine (CGG) to tryptophan (UGG), and the two SNPs of ATF3 gene in the coding region (c.291G > A and c.489G > A) were synonymous mutations.
In addition, in this study, we performed the pooled sequencing method to identify the allelic variants across the entire coding and regulatory regions of ATF3 and CDKN1A genes, and detected a total of 24 SNPs. However, this method has disadvantage that it may miss rare allelic variants due to sequencing cannot catch the very low fluorescent signal of the alleles with very low frequencies. Further, the allelic frequencies cannot be obtained merely by pooled sequencing, hence, we further genotyped 1093 cows and performed association analysis.

Associates between SNPs and five milk traits
Associations between the total 24 SNPs of ATF3 and CDKN1A genes and five milk production traits were presented in Tables 2 and 3. Using single-SNP association analysis, we found that 14, 16, 16 and 3 SNPs of ATF3 gene were significantly associated with milk yield (P = < 0.0001~0.031), fat yield (P = 0.0002~0.0197), protein yield (P = < 0.0001~0.0331), and protein percentage (P = 0.0108~0.0494) in the first lactation, respectively. For example, the cows with allele C in g.72834301C > A showed higher milk yield, milk protein yield and fat yield than that with allele A ( Table 2). Similarly, 13, 14, 11 and 8 SNPs were associated with milk yield (P = < 0.0001~0.0078), fat yield (P = < 0.0001~0.0395), protein yield (P = < 0.0001~0.0461), and protein percentage (P = 0.0003~0.0371) in the second lactation, respectively ( Table 2). Of these, 13 SNPs of ATF3 gene (g.72834301C > A, g.72834229C > A, g.72833969A > G, g.72833562G > T, c.291G > A, c.489G > A, c.*685G > C, c.*1064G > A, g.72819850A > G, g.72818819A > G, g.72818818C > T, g.72818292 T > C, and g.72818161 T > C) were identified significantly associated with at least one milk trait in both the first and second lactations. Four SNP, c.*190A > G, c.*326A > G, c.*640G > A, and g.72819977 T > C, were only found evidently associated with the milk traits in the first lactation, and the other two SNPs, c.*321G > C and c.*735 T > C, were markedly associated with at least one milk trait in the second lactation (Table 2), however, the allelic effects of these six SNPs showed almostly same directions between the first and second lactations although some associations did not reach statistical significance of 0.05 (Table 2). Interestingly, no significant SNPs were associated with fat percentage in both first and second lactations.
Regarding CDKN1A gene, it showed that c.271C > T (P = 0.0162) and c.*9C > G (P = 0.0193) were significantly associated with 305-days milk yield, and c.271C > T (P = 0.0377) and c.*654C > T (P = 0.0144) were markedly associated with fat percentage in the first lactation. Four identified SNPs of CDKN1A gene, c.271C > T, c.*9C > G, g.10569766 T > C, and g.10569779 T > C, were significantly associated with 305-days milk yield, fat yield, protein yield, and protein percentage (P = < 0.0001~0.0197), and c.*654C > T was only evidently associated with protein yield (P = 0.0092) in the second lactation (Table 3). Additionally, three SNPs of CDKN1A gene, c.271C > T, c.*9C > G, and c.*654C > T, were both significantly associated with milk traits in both first (P = 0.0144~0.0377) and second (P = < 0.0001~0.0197) lactations. The g.10569766 T > C and g.10569779 T > C were merely associated with milk traits in the second lactation (P = < 0.0001~0.0197), but the allelic effects of them were almost in the same direction between two lactations. Taken together, the identified SNPs of CDKN1A gene were mainly associated with the milk yield, fat yield, protein yield, and protein percentage in the second lactation of Chinese Holstein cattle in this study ( Table 3).
The allele additive, dominant, and substitution effects of 19 SNPs identified in ATF3 gene were mainly significantly associated with milk yield, fat yield, and protein yield (P < 0.05), and the results in the first lactation were slightly different from that in the second lactation; nevertheless, it is interesting that only two SNPs, g.72833562G > T and g.72818819A > G, were evidently associated with fat percentage in the first lactation (P < 0.05) (Additional file 3). Nevertheless, the allele additive, dominant, and substitution effects of 5 SNPs of CDKN1A gene were mainly associated with milk yield, protein yield, and protein percentage (P < 0.05) (Additional file 4).

Prediction of TFBSs variations in promoter region of ATF3 gene
With regard to the four SNPs on the 5'promoter region of ATF3 gene, we predicted the variation of TFBSs after mutation using JASPAR software. As the results shown in Table 6, the A allele in g.72834301C > A created the putative binding sites for the transcription factor E2F3 (E2F transcription factor 3; relative score = 0.88) and Zfp423 (zinc finger protein 423; relative score = 0.89). The binding sites for the transcription factor Bcl6 (B-cell CLL/ lymphoma 6) was invented because of the A allele in g.72834229C > A (relative score = 0.87), and g.72833969A > G resulted in the appearing of the binding sites for transcription factor STAT3 (signal transducer and activator of transcription 3) after the substitution by G allele (relative score = 0.95). It was also indicated that the binding site for transcription factor Zfp423 was arisen due to the T allele in g.72833562G > T (relative score = 0.88).

Variant structure of CDKN1A protein caused by missense mutation
There was a missense mutation (p.Arg91Trp) on CDKN1A protein caused by c.271C > T. The results from the SOPMA SERVER revealed that α-helix was changed from 32.92% to 32.3%, β-turn from 4.35% to 5.59%, and random coil from 49.69% to 49.07% (Table  7), indicating that the CDKN1A protein secondary structure was altered from arginine to tryptophan.

Discussion
This study was a follow-up investigation of our previous RNA-Seq work, in which the ATF3 and CDKN1A genes were potentially associated with milk protein and fat percentage [9]. Here, we first determined that SNPs within the ATF3 and CDKN1A genes were significantly associated with milk production traits in dairy cattle. Of these,  four SNPs, g.72834301C > A, g.72834229C > A, g.72833969A > G, and g.72833562G > T, potentially changing the ATF3 promoter activity, and one SNP, c.271C > T, potentially altering the CDKN1A protein secondary structure, might be potentially causal mutations. With regard to the associations of ATF3 and CDKN1A with five milk traits, we found that eight SNPs showed different associations between the first and second lactations (Tables 2 and 3). The possible reason may be that different number of cows were used for association analysis (1093 cows in the first lactation versus 769 cows in the second lactation) because 324 cows merely completed their milking of first lactation, which could impact the statistical significance. In addition, the physiologic status of cows between the first and second lactations are different as well, generally, cows show higher milk production in the second lactation. Further, the directions of allelic effects of the eight SNPs on the milk traits were almost consistent between the two lactations. Studies revealed that haplotype blocks in human genome is independent of their surrounding areas with regard to LD [15][16][17], and the haplotype analyses were widely applied to the genetic variation studies [18,19]. Our haplotypes analyses showed that the SNPs were highly linked, and the haplotypes were also significantly associated with milk yield, fat yield, fat percentage, protein yield, and protein percentage, which were consistent with the associations of SNPs with milk traits.  ATF3 is involved in the TLR4 (toll-like receptor 4) signaling pathway, and its expression was markedly increased causing by chemotaxis and diapedesis in vitro, indicating that ATF3 acts as the early adaptive-response gene having an important role in maintaining the cellular homeostasis [20]. It was reported that ATF3 activates the cyclin D1 expression, thereby stimulating the mouse hepatocellular proliferation [21]. As we know, liver plays a key role in lipid metabolism, including fatty acid uptake, synthesis and oxidation, glycerolipid synthesis, and triacylglycerol export [22]. Subsequently, Invernizzi et al. found ATF3 may modulate milk fat synthesis during lactation by participating in the endoplasmic reticulum stress pathway [23,24]. Our results also showed a significant relationship between SNP polymorphism of ATF3 gene and fat yield, thus it can be seen that ATF3 may be associated with lipid metabolism. The CDKN1A (p21) protein regulates the cell cycle at G1 and S phase by inhibiting the activity of cyclin-CDK2, −CDK1, and -CDK4/6 complexes [25]. Li and Capuco conducted a systematic search for estrogen-responsive genes in bovine mammary gland, and identified 23 regulatory networks, of these, CDKN1A gene occupied a focal position in the network that functions as cell cycle, cellular movement and cancers, indicating that CDKN1A may play an important role in mammary gland development [26]. Together, ATF3 and CDKN1A were considered as important promising candidate genes for milk production traits.
As we know, SNPs in transcription factor binding sites could lead to allele-specific binding of transcription factors and enhancing or repressing the gene expression [27]. Studies have revealed that transcription factor E2F3 [28][29][30][31], Zfp423 [32,33], and STAT3 [34,35] could activate or repress the gene expression, and Bcl6 might play as a repressor for gene expression [36,37]. Our results showed that the milk yield, fat yield, fat percentage, protein yield, and protein percentage were evidently decreased or had the downtrends in genotype AA, AA, and GG severally causing by g.72834301C > A, g.72834229C > A, and g.72833969A > G, indicating that the transcription factors E2F3, Zfp423, Bcl6, and STAT3, might prejudice the milk production by repressing the expression of ATF3 gene. On the contrary, the transcription factor Zfp423 might be also beneficial to the milk production through activating the expression of ATF3   gene, because that the genotype TT causing by g.72833562G > T increased the milk yield, fat yield, fat percentage, protein yield, and protein percentage in the two lactations. Overall, our findings suggested that the four SNPs, g.72834301C > A, g.72834229C > A, g.72833969A > G, and g.72833562G > T in ATF3 gene, might be the potential mutations in milk production traits by changing promoter activity in Holstein cattle.
In the present, we found the SNP c.271C > T of CDKN1A gene was a missense mutation that caused an amino acid substitution from arginine to tryptophan, and the association analyses showed that this SNP was remarkably associated with milk yield and fat percentage in the first lactation, and milk yield, fat yield, protein yield, and protein percentage in the second lactation. Importantly, the changes of the protein structure causing by this amino acid substitution showed that the α-helix changed from 32.92% to 32.3%. Generally, the α-helix was preferably located at the core of the protein and had important functions in proteins for flexibility and conformational changes [38], and it was presumed that the CDKN1A protein might be more stable in conformation when the base was C. Hence, the SNP c.271C > T in CDKN1A gene might be another potential functional mutation for milk production through changing the protein structure in dairy cattle. Further in-depth investigation should be performed to validate the biological functions of these SNPs.