Genomic regions associated with bovine milk fatty acids in both summer and winter milk samples

Background In this study we perform a genome-wide association study (GWAS) for bovine milk fatty acids from summer milk samples. This study replicates a previous study where we performed a GWAS for bovine milk fatty acids based on winter milk samples from the same population. Fatty acids from summer and winter milk are genetically similar traits and we therefore compare the regions detected in summer milk to the regions previously detected in winter milk GWAS to discover regions that explain genetic variation in both summer and winter milk. Results The GWAS of summer milk samples resulted in 51 regions associated with one or more milk fatty acids. Results are in agreement with most associations that were previously detected in a GWAS of fatty acids from winter milk samples, including eight ‘new’ regions that were not considered in the individual studies. The high correlation between the –log10(P-values) and effects of SNPs that were found significant in both GWAS imply that the effects of the SNPs were similar on winter and summer milk fatty acids. Conclusions The GWAS of fatty acids based on summer milk samples was in agreement with most of the associations detected in the GWAS of fatty acids based on winter milk samples. Associations that were in agreement between both GWAS are more likely to be involved in fatty acid synthesis compared to regions detected in only one GWAS and are therefore worthwhile to pursue in fine-mapping studies.


Background
Dairy producers are looking for ways to optimize bovine milk fat composition for human health, and to improve physical and functional properties of milk. Increasing the knowledge about the synthesis of milk fatty acids, by unravelling the genetic background of milk fatty acids, can aid in modifying bovine milk fat composition. Polymorphisms in diacylglycerol-O-acyltransferase 1 (DGAT1) and stearoyl-CoA desaturase 1 (SCD1) are known to have an effect on milk fatty acids, e.g. the DGAT1 K232A polymorphism explains 40% of the genetic variation of C16:0 [1] and the SCD1 A293V polymorphism explains 23% of the genetic variation of C16:1 [2]. However, there is still a considerable amount of genetic variation in milk fat composition that has not been assigned to polymorphisms.
Chromosomal regions associated with milk fatty acids can be detected by screening the whole genome in a genome-wide association study (GWAS). In GWAS studies many thousands of single nucleotide polymorphisms (SNPs) are being tested for associations. In general it is expected that only a small proportion of the SNPs will have a true association and only those that have an effect that is large enough will be significant. Setting a significance threshold is finding the balance between limiting the number of false positives and maintaining sufficient power. Replication of results in independent samples is a strategy to separate false positives from true associations [3,4].
In a previous GWAS we identified interesting regions of the bovine genome associated with milk fatty acids from winter milk samples [2]. To our knowledge, at present that is the only GWAS on bovine milk fatty acids. Not many populations that are large enough for GWAS have been phenotyped for milk fatty acids, because accurate measurement of milk fatty acids using gas chromatography is expensive and time consuming. In addition, genotyping a large number of individuals for a large numbers of SNPs is costly. The population that was used for our previous GWAS, based on milk samples taken in winter, has been phenotyped for milk fatty acids in a second milk sample which was taken in summer. Repeating the GWAS for milk fatty acids based on winter samples using the summer samples can confirm the previously detected associations and result in new associations.
In this study we performed a GWAS for fatty acids based on summer milk samples. This study repeats our previous GWAS for fatty acids based on winter milk samples from the same population and largely the same animals. Fatty acids from summer and winter milk are genetically similar traits and we therefore compared the regions detected in summer milk to the regions previously detected in winter milk to confirm associations.

Phenotypes
The phenotypes of the winter milk samples were described earlier in Bouwman et al. [2], the phenotypes of the summer milk samples will be described in detail, as well as the differences between winter and summer samples.
The fat composition of summer milk samples from 1,795 first-lactation Dutch Holstein Friesian cows was available for this study. The cows were housed on 383 commercial farms throughout the Netherlands. At least three cows were sampled per farm. The cows were between 97 and 335 days in lactation when the summer samples were taken. The pedigree of the cows was provided by CRV (Cooperative cattle improvement organization, Arnhem, the Netherlands) and consisted of 26,300 animals.
Milk fat composition was measured by gas chromatography as described in Stoop et al. [5]. Many fatty acids were measured, but only the major fatty acids are reported here: even-chain saturated fatty acids C4:0 to C18:0, even-chain (cis9) monounsaturated fatty acids C10:1 to C18:1, and the polyunsaturated fatty acid C18:2cis9,trans11 (CLA). These fatty acids made up 88% of the total milk fat. The fatty acids are expressed in terms of weight-proportion of total fat weight (w/w%).
The winter and summer milk samples were taken from the same cows during the same lactation. The moment of sampling resulted in some differences between the two samples. Winter milk samples were taken from February to March 2005, when Dutch cows are mainly kept indoors and fed silage. Summer milk samples were taken from May to June 2005, when Dutch cows are often grazing for at least some part of the day. Some cows sampled in winter were not in lactation anymore during summer, therefore additional cows were sampled in summer to assure at least 3 cows per herd. The cows were on average 167 days (63-282) in lactation when the winter samples were taken, and 247 days (97-335) in lactation when the summer samples were taken.

Statistical analysis of phenotypes
Variance components, heritabilities, and correlations between the fatty acids from summer and winter milk samples were estimated using a bivariate animal model in ASReml [6]: where y was the phenotype; μ was the overall mean; dim was the covariate describing the effect of days in milk; afc was the covariate describing the effect of age at first calving; season was the fixed effect of the class of calving season (June-Aug 2004, Sept-Nov 2004, or Dec 2004-Jan 2005); scode was the fixed effect accounting for differences in genetic level between proven sire daughters and test-sire daughters, proven sires are selected and therefore their daughters might have a different genetic level than daughters of test sires; herd was the random effect of herd, distributed as N(0, I σ herd 2 ), with identity matrix I and herd variance σ herd 2 ; animal was the random additive genetic effect, distributed as N (0, A σ a 2 ), with the additive genetic relationship matrix A and the additive genetic variance σ a 2 ; and e was the random residual, distributed as N(0, I σ e 2 ), with identity matrix I and residual variance σ e 2 .

Genotypes
Blood samples were collected from the cows for DNA isolation. Of the 1,795 cows phenotyped for the summer milk sample 1,656 were successfully genotyped for 50,855 single nucleotide polymorphisms (SNPs) using a custom Infinium Array (Illumina, San Diego, CA, USA) designed by CRV. The assumed map positions of the SNPs were based on the bovine genome assembly BTAU 4.0 [7]. The average distance between SNPs was 52,452 bp. Of the 50,855 SNPs, 591 SNPs were located on the X chromosome, and 776 SNPs could not be mapped to any of the Bos taurus (BTA) chromosomes and were assigned to BTA 0. The SNPs on BTA 0 and the X chromosome were included in the study. Single nucleotide polymorphisms with a genotyping rate < 80% (n=392), monomorphic SNPs (n=236), and SNPs with 1-9 observations for one of the genotype classes (SNPs with such low number of observations in one of the genotype classes were excluded from further analyses to reduce the number of spurious associations) (n=5,646) were discarded from the original SNP set, resulting in the final marker set of 44,581 SNPs used for the summer GWAS.

Ethical approval
Genomic DNA of the cows was isolated from whole blood samples of the cows. Blood samples were collected in accordance with the guidelines for the care and use of animals as approved by the ethical committee on animal experiments of Wageningen University (protocol: 200523.b).

Genome-wide association based on summer milk samples
For the GWAS based on the summer milk samples, both phenotype and genotype data were available for 1,656 individuals. A single SNP GWAS was performed using an univariate animal in ASReml that was the same as model 1 but extended with a fixed effect for the SNP genotype. The genome-wide FDR was based on the P-values from the animal model using the R package 'qvalue' [8]. A genome-wide FDR was calculated for each trait individually. Associations with a genome-wide FDR<0.05 were considered significant. SNPs significant for one trait located close to each other were termed a "region". All significant SNPs in a region might be associated with the same causal mutation. A region was defined as follows: it started at the first significant SNP on a chromosome that was followed by an additional significant SNP within 10 Mbp; the region was extended as long as another significant SNP occurred within 10 Mbp from the previous one and ended at the last significant SNP that was not followed by another significant SNP within the next 10 Mbp. Thus, each region contained at least two SNPs significant for the trait. More than one region could be present on the same chromosome when there were groups of significant SNPs located within 10 Mbp from each other but further than 10 Mbp from the other region(s) on that chromosome. The genetic variance explained by a SNP was calculated from the estimated genotype effects from the statistical model and the observed genotype frequencies.
The result was expressed as a percentage of the total additive genetic variance obtained from model 1. These percentages can be overestimated due to the so called Beavis effect, especially when the effect of a SNP is small [9]. For each trait, the proportion of genetic variance explained by the most significant SNP in a region was reported. Note that the most significant SNP in a region can differ between traits.

Comparison of summer and winter GWAS results
The GWAS results from the winter milk samples from Bouwman et al. [2] and from the summer milk samples were compared to each other. A total of 1,564 cows were studied in both the summer and the winter GWAS, 92 cows were only studied in the summer GWAS, and 142 only in the winter GWAS.
In Bouwman et al. [2] a two-step single SNP approach was used for the GWAS of fatty acids from winter milk samples. Due to computation time, in the study by Bouwman et al. [2] only regions which showed significant associations in analyses using a general linear model were re-analysed using an animal model to account for all relations between the animals. Here we used a one-step approach using the animal model only.
To make results from the previous study and the present study comparable, the GWAS based on winter milk samples was redone using an animal model for all SNPs. The Pearson correlation between the -log 10 (P-values) of the general linear model for winter milk samples and the animal model for winter milk samples was 0.95, which indicates that the general linear model used correctly identified the regions of interest and that the results for winter milk samples of Bouwman et al. [2] are comparable with the results presented in the current study.
For the individual GWAS studies for winter and summer milk samples a FDR<0.05 was used. A FDR<0.05 is stringent, especially when looking for agreement of results. Therefore, a SNP was qualified as associated with both summer and winter milk fatty acids when the FDR threshold was smaller than 0.20 in both GWAS studies. We choose a FDR threshold of 20%, because the FDR of agreement between the two GWAS studies would then be 4% (20%*20%) if the studies were independent [10]. SNPs in agreement were reported when at least more than one SNP was in agreement between the summer and winter GWAS in a region (were region was defined as above) or when the SNP was in agreement between the summer and winter GWAS for more than one trait. Table 1 shows that the phenotypic variation of milk fatty acids was larger in summer than in winter. Summer milk samples contained more long chain fatty acids (C18:0, C18:1, and CLA) and less C16:0 than winter milk samples (Table 1). Phenotypic correlations between fatty acids of summer and winter milk samples ranged between 0.36 and 0.67 (Table 2). Genetic correlations between fatty acids of summer and winter milk samples ranged between 0.77 and 1.00 (Table 2). Genetic correlations between summer and winter samples of C4:0, C6:0, C12:0, C18:0, C10:1, C12:1, C14:1, C16:1, and C18:1 (ranging between 0.90-1) were not significantly different from one ( Table 2). The genetic correlations for C8:0, C10:0, C14:0, C16:0, and CLA were significantly different from one but showed strong positive correlations (0.77-0.94, Table 2). Herd correlations between fatty acids of summer and winter milk samples ranged between 0.16 and 0.54 (Table 2).

Results
Genome-wide association based on summer milk samples Figure 1 shows the genome-wide plots of -log 10 (P-values) of the GWAS of the fatty acids based on the summer milk samples. In total, 51 regions were associated with one or more fatty acids. Table 3 gives all detected regions and the percentage of the total additive genetic variation explained by the most significant SNP in that region for each of the fatty acids (for more detailed information about those most significant SNPs see Additional file 1). The most significant SNPs per region explained 2.2% up to 50.1% of the total additive genetic variation (Table 3). When all these most significant SNPs per region per fatty acid were analysed simultaneously they explained between 5.5% for C4:0 and 92.5% for C16:1 (Table 3). Three regions with major effects were associated with multiple fatty acids: BTA 14, BTA 19, and BTA 26. First we will describe the results for these three regions with major effects and then for the other regions associated with more than one fatty acid. Regions associated with only one fatty acid are given in Table 3 but will not be described here.
The association detected on BTA 26, between 1.4 and 39.0 Mbp, with C10:0, C10:1, C12:1, C14:1, and C16:1 was most significant near the SCD1 gene. The SCD1 gene is not mapped on the BTAU 4.0 [7], but is mapped at 21 Mbp on BTA 26 according to the UMD 3.0 map [11]. Two SNPs located in the SCD1 gene (including the SCD1 A293V polymorphism) were the most significant SNPs for the traits associated with the region on BTA 26. The SCD1 SNPs explained 4.5% of the genetic variation of C10:0 and 15-46.4% of the total additive genetic variation of the medium chain unsaturated fatty acids (Table 3).
Some SNPs located on BTA 0 were also significant. Blasting these SNPs against the UMD 3.0 map showed that they were mainly located in regions that already showed significant effects, such as region 14a, 19b and 26. Figure 2 shows two -log 10 (P-values) for each SNP, one for the summer (significant SNPs are orange squares in Figure 2) and one for the winter (significant SNPs are blue triangles in Figure 2) GWAS. The -log 10 (P-values) of SNPs that had a FDR<0.20 in both the winter and summer GWAS are indicated with green addition signs and show the regions that were found for both samples. Table 4 gives an overview of the regions associated with the summer milk fatty acids that were in agreement with the previous study of winter milk fatty acids. Only the regions that showed agreement between the summer and winter GWAS for more than one SNP or more than one trait are reported in Table 4, resulting in 34 regions.

Comparison of summer and winter GWAS results
Three regions with major effects, BTA 14, 19, and 26, were found for both summer and winter milk fatty acids. These regions were highly significant in the GWAS based on winter milk samples and were therefore expected to be found for the summer milk samples too. More interesting are the additional regions that were found in both GWAS studies and especially the eight regions (1, 2a, 3, 5a, 10, 14b, 17c, and 24 (Table 4)) that were not reported for the individual studies based on winter or on summer milk samples because their FDR was between 0.05 and 0.20. In some regions agreement between the summer and winter GWAS was based on a  . Green addition signs represent SNPs that were detected significant in both milk samples (FDR<0.20 in both studies). Note that each SNP is represented twice in this figure, once at the -log 10 (P-value) for the summer GWAS and once at the -log 10 (P-value) for the winter GWAS. The y-axis are cut off at -log 10 (P-value) of 15. single SNP but in other regions agreement was based on multiple SNPs. Also, some regions were associated with multiple fatty acids. We will report here the regions found to be associated with more than two fatty acids in both GWAS studies: regions 5c, 6, 13, 17b, and 27 (Table 4).
On BTA 5 the associations in region 5c with C6:0, C8:0, C10:0, C10:1, and C18:1 were in agreement with the winter GWAS [2]. There are no obvious candidate genes located in this region. In both GWAS studies this region was also associated with C14:0, but different The number of SNPs in agreement between summer and winter GWAS in the region are given and only regions txhat had more than one SNP or more than one trait in agreement between summer and winter GWAS are reported here.
SNPs were significant in the two studies (see Figure 2). The agreement between summer and winter GWAS for C14:1 seems to be in a separate region, region 5d at 108.8 Mbp.
On BTA 6 the associations with C6:0, C8:0, and C16:1 (Table 4) were in agreement with the winter GWAS [2]. This region contains the candidate gene peroxisome proliferator-activated receptor gamma, coactivator 1 alpha (PPARGC1A). Our SNP set contained 10 SNPs located in PPARGC1A, however, none of these SNPs were significant in winter nor in summer, but SNPs around (73,865bp before and 797,923bp after) the gene showed association in both the summer and winter GWAS. In the winter GWAS the region was also associated with C12:1 and C14:1, but this was not the case in the summer GWAS.
On BTA 13 the associations with C6:0, C8:0, C10:0, and C10:1 (Table 4) were in agreement with the winter GWAS [2]. The region on BTA 13 was associated with short chain fatty acids. A possible candidate gene in this region is acyl-CoA synthetase short-chain family member 2, which activates acetate for de novo fatty acid synthesis [12]. In the winter GWAS the region was also associated with C14:1 and C16:1, but this was not the case in the summer GWAS.
On BTA 27 the associations with C12:0, C14:1, and C16:1 (Table 4) were in agreement with the winter GWAS [2]. This region contains the candidate gene 1acylglycerol-3-phosphate O-acyltransferase 6, which is involved in attaching fatty acids on the second position of the triglyceride backbone. In the winter GWAS this region was also associated with C14:0, C16:0, C10:1, and C12:1, but this was not the case in the summer GWAS.
Besides agreement of association for both summer and winter milk fatty acids it is also interesting to see how well the significance levels and effects of SNPs from the summer and winter GWAS correlate. High correlation implies that the effect of the QTL is similar in winter as in summer and, thus, that there is no genotype by season interaction for the regions that were found in both GWAS studies. Winter and summer -log 10 (P-values) of SNPs that had a FDR<0.20 in both GWAS studies are plotted in Figure 3A and showed a correlation of 0.89. Winter and summer additive SNP effects of SNPs that had a FDR<0.20 in both GWAS studies, expressed in phenotypic standard deviation, are plotted in Figure 3B and show a correlation of 0.97.

Discussion
The aim of this study was to perform a GWAS of bovine fatty acids based on summer milk samples and to compare them to previous results of a GWAS of fatty acids based on winter milk samples [2]. For this GWAS we used different milk samples from largely the same set of cows with the same genotypes at a different stage of the same lactation. The main difference between the two seasons was the herd management, including feeding. In winter, all herds were kept indoors and fed silage, while in summer about half of the herds were grazing outside for at least part of the day and a few other herds were fed fresh grass. Diets of cows including fresh grass are known to alter milk fat composition e.g. [13,14]. This was reflected in our results: summer milk contained more long chain fatty acids and less C16:0 compared to winter milk. The phenotypic correlations for the different fatty acids between summer and winter samples ranged between 0.36-0.67 (Table 2), indicating that the summer samples provide additional information compared to the winter samples. Genetic correlations between summer and winter samples of C4:0, C6:0, C12:0, C18:0, C10:1, C12:1, C14:1, C16:1, and C18:1 (ranging between 0.90-1, Table 2) showed that these fatty acids are genetically the same trait in summer and winter [15]. The genetic correlations for C8:0, C10:0, C14:0, C16:0, and CLA were significantly different from one [15] but showed strong positive correlations (0.77-0.94, Table 2), suggesting that also for these fatty acids summer and winter samples have most genetic variation in common. It is important here to consider that strong positive genetic correlations are required to ensure that the traits have the same genetic background. However, for the summer milk sample to provide additional information to our previous GWAS phenotypic correlations should be weak.
This GWAS of fatty acids based on summer milk samples shows agreement with most associations detected in our previous GWAS of fatty acids based on winter milk samples [2]. Three regions with major effects detected in the winter GWAS [2] were also found in the summer GWAS. On BTA 14 a dinucleotide polymorphism in DGAT1 is causing the major effect. The DGAT1 K232A polymorphism is known to be associated with fat content and composition, so our results are in line with other studies [1,16,17]. The rather large region associated with the short and medium chain saturated milk fatty acids on BTA 19 confirm previous linkage studies [18][19][20]. Several candidate genes related to fat synthesis are located in this region, e.g. ATP citrate lyase, sterol regulatory element-binding transcription factor 1, signal transducer and activator of transcription 5A, growth hormone, and fatty acid synthase. There might be more than one QTL in this region given the size of this region and the many candidate genes, but the actual polymorphism(s) causing the effect(s) has not yet been identified. On BTA 26 a polymorphism in SCD1 is causing the major effect. The gene SCD1 is known to be associated with medium-chain unsaturated fatty acids, so our results are in line with other studies [16,[21][22][23][24].
Our results from both GWAS studies also suggest that there are additional QTL on BTA 14 besides DGAT1 that were associated with fatty acids. These additional QTL on BTA 14 were located at 3.0-3.8 Mbp, 32.7 Mbp, 40.8-50.9 Mbp, and 73.0-76.5 Mbp, and confirm detected QTL for milk production traits in linkage analyses reviewed in [25]. Candidate genes for these regions might be corticotropin releasing hormone at 30.5 Mbp, fatty acid binding protein 5 at 41.9 Mbp, and fatty acid binding protein 4 at 42.0 Mbp.
The three highly significant regions with major effects mentioned above were expected to be found in both GWAS studies. More interesting are the additional regions that were found in both GWAS studies such as the regions associated with more than two fatty acids: region 5c, 6, 13, 17b, and 27. Also worthwhile to mention are the 'new' regions that had a suggestive FDR between 5-20% and were not considered in the individual studies based on winter or on summer milk samples only, but became of interest because they were found in both studies. There are eight 'new' regions like this: region 1, 2a, 3, 5a, 10, 14b, 17c, and 24 (Table 4).
There are two different confirmation strategies regarding GWAS: replication and validation. Replication studies are meant to confirm that the actual association is a true association and should therefore be based on samples from the same population with minimal systematic differences [26,27]. Validation studies are meant to see if the association can be generalized over different populations and should therefore be based on samples from a different population, where this population can be different concerning genetic background, phenotype definition, sampling strategy, and time point of investigation [26,27]. A correctly performed replication is more likely to be successful in finding the same association again than validation, however, when an association is validated the associated SNP is probably closer to the actual polymorphism. In literature, replication and validation are often used interchangeably which complicates the interpretation of the results, especially when a study is called replication study but population, phenotype or study design are too different from the original study to be a replication study. Our GWAS has elements of a replication as well as of a validation study; it met criteria for a replication such as sufficient sample size, phenotypes were measured using the same method, same set of markers and a very similar population was used. It also met criteria for validation because the phenotypes were measured at different time points. The difference in season of measuring the phenotype provides additional information. Ideally an independent population of cows should have been sampled, but this was practically not feasible.
Replication of the study with largely the same set of animals and the same genotypes led to minor differences in LD and allele frequencies between studies, therefore it is more likely to confirm previously detected results [10]. However, spurious associations due to population structure or genotyping errors are more likely to be detected twice using the same set of animals and genotypes.
Agreement between the two GWAS studies was based on a FDR threshold of 20% in each study. If the winter and summer GWAS would be independent, the FDR of a region found in both studies would be 4% (20%*20%) [10]. A FDR of 4% gives enough reason to investigate such a region further. Lowering the threshold from 5% to 20% FDR resulted in the eight 'new' regions mentioned above, besides the regions that were already discovered in one of the individual studies and were in agreement with the other.
Even though we used largely the same set of animals, same genotypes, same phenotype measurement and a lower threshold for agreement between summer and winter GWAS not all regions were found in both summer and winter GWAS. This can be due to genotype by season interaction, due to lack of power (false negative QTL) or because these QTL were false positive QTL. It is not possible to determine which of these three reasons apply. However, the genetic correlations indicated that fatty acids are genetically similar traits in summer and winter, which suggests that genotype by season interaction may have only a small effect on the results. So lack of agreement between summer and winter GWAS is either due to lack of power or false positives.

Conclusions
This GWAS of fatty acids based on summer milk samples is in agreement with most associations that were previously detected in a GWAS of fatty acids based on winter milk samples. Lowering the FDR threshold from 5% in individual studies to 20% for agreement between both the summer and winter GWAS led to eight 'new' regions that were not considered in the individual studies, but had a suggestive FDR between 5-20% in both studies. It is more likely that genomic regions are involved in fatty acid synthesis when associations are found in both summer and winter GWAS compared to regions detected in only one GWAS. Detected associations that were in agreement between summer and winter GWAS are therefore worthwhile to pursue in fine-mapping studies.
Additional file Additional 1: Table S1. Most significant SNP per trait for each region significantly associated with fatty acids of the summer milk sample (corresponding to table 3), SNP position, significance level and the percentage of total additive genetic variance explained by the SNP.
Authors' contributions ACB carried out the analysis, prepared and drafted the manuscript. MHPWV helped to draft and edit the manuscript. JAMA initiated and established the overall project design. HB participated in the design of the study, the coordination and helped to draft the manuscript. All authors read and approved the final manuscript.