Estimation of genetic parameters and detection of chromosomal regions affecting the major milk proteins and their post translational modifications in Danish Holstein and Danish Jersey cattle

Background In the Western world bovine milk products are an important protein source in human diet. The major proteins in bovine milk are the four caseins (CN), αS1-, αS2-, β-, and k-CN and the two whey proteins, β-LG and α-LA. It has been shown that both the amount of specific CN and their isoforms including post-translational modifications (PTM) influence technological properties of milk. Therefore, the aim of this study was to 1) estimate genetic parameters for individual proteins in Danish Holstein (DH) (n = 371) and Danish Jersey (DJ) (n = 321) milk, and 2) detect genomic regions associated with specific milk protein and their different PTM forms using a genome-wide association study (GWAS) approach. Results For DH, high heritability estimates were found for protein percentage (0.47), casein percentage (0.43), k-CN (0.77), β-LG (0.58), and α-LA (0.40). For DJ, high heritability estimates were found for protein percentage (0.70), casein percentage (0.52), and α-LA (0.44). The heritability for G-k-CN, U-k-CN and GD was higher in the DH compared to the DJ, whereas the heritability for the PD of αS1-CN was lower in DH compared to DJ, whereas the PD for αS2-CN was higher in DH compared to DJ. The GWAS results for the main milk proteins were in line what has been earlier published. However, we showed that there were SNPs specifically regulating G-k-CN in DH. Some of these SNPs were assigned to casein protein kinase genes (CSNK1G3, PRKCQ). Conclusion The genetic analysis of the major milk proteins and their PTM forms revealed that these were heritable in both DH and DJ. In DH, genomic regions specific for glycosylation of k-CN were detected. Furthermore, genomic regions for the major milk proteins confirmed the regions on BTA6 (casein cluster), BTA11 (PEAP), and BTA14 (DGAT1) as important regions influencing protein composition in milk. The results from this study provide confidence that it is possible to breed for specific milk protein including the different PTM forms. Electronic supplementary material The online version of this article (doi:10.1186/s12863-016-0421-2) contains supplementary material, which is available to authorized users.


Background
In the Western world bovine milk products are an important protein source in human diet. The major proteins in bovine milk are the four casein (CN), α S1 -, α S2 -, β-, and k-CN which occur in the approximate ratio of 4:1:4:1 (w/w) in milk, and the two whey proteins, β-lactoglobulin (β-LG) and α-lactalbumin (α-LA), which occur in the mutual ratio of 3:1 in milk (w/w) [1][2][3]. Total protein yield is an important part of the dairy milk payment system and has therefore been included in the dairy cattle breeding goal [4]. Different genetic variants of the CN genes have an influence on the amount of CN in the milk, as well as on the cheese making properties of milk [5][6][7]. It has been shown that in poorly coagulating milk samples of the Danish Holstein (DH) and Danish Jersey (DJ) breeds, the predominant combination of genotypes was BB for α S1 -CN, A 2 A 2 for β-CN, and AA for k-CN [8]. More recently, it has been shown that both the amount of specific CN and their post-translational modifications (PTM) have profound influence on the milk coagulation properties [6,9]. Thus, breeding for detailed milk protein composition has attracted increased attention.
Apart from the disulphide bonds in the α S2 -CN dimer and the k-CN multimer, PTMs of the bovine caseins include phosphorylation of α S1 -CN, α S2 -CN, β-CN and k-CN, as well as glycosylation also of k-CN [2,6,10]. Phosphorylation of α S1 -CN results in a major form with eight phosphorylated serine residues (8P-α S1 -CN) and a minor form with nine phosphorylated serine residues (9P-α S1 -CN). For α S2 -CN, the major phosphorylated form contains 11 phosphate groups (11P-α S2 -CN) and the minor form 12 phosphate groups, while β-CN is usually present with five phosphate residues and k-CN with one residue [11]. The glycosylation degree varies with approximately 30-60 % of k-CN bring glycosylated, while 95 % is phosphorylated with 1-3 phosphate groups [9,12]. Although there are different forms of this bovine protein due to multilevel phosphorylation and glycosylation, the mono-phosphorylated non-glycosylated k-CN is the predominant (>50 %) [12].
CN proteins in the milk can form a multi-molecular structure called the CN micelle, which plays an important role in the coagulation of milk. The PTMs of the CN influence both stability and size of the CN micelles [13,14] and thereby influences technological properties of milk [6,13,15].
The majority of the studies reporting genetic variation for milk protein content are based on protein percentage or protein yield (e.g. [16][17][18]). These studies show that there is substantial genetic variation for total protein in bovine milk. Though only relatively few studies have reported genetic parameters for the detailed milk composition [19][20][21]. The individual CN and whey proteins show genetic variations in the coding sequences resulting in structural genetic variants [5]. These genetic variants have different expression levels, presumably due to further mutations in the regulating elements leading to differential expression levels [5]. Furthermore, an association study for detailed protein composition showed that the main regions associated with protein percentage and protein composition were located on chromosomes 5, 6, 11, and 14 [22]. Recently it was shown that the PTM of milk protein, like glycosylation of k-CN, shows genetic variation [15]. Furthermore Bijl et al. [23] showed that α S1 -CN isoforms representing 8 or 9 phosphorylations, respectively, showed genetic variation and apparently was regulated by different sets of genes.
Within the Danish-Swedish Milk Genomics Initiative the milk protein profile of the Danish Holsteins and Danish Jerseys has been studied in detail [6,7,9,24]. Apart from the major genetic variants of the CN genes a study on the genetics underlying the expression of the major milk proteins and their isoform has not been carried out.
The objective of this study was to estimate the heritability of the major milk proteins and their isoforms representing post-translational modifications and to perform a genome-wide association study (GWAS) for the detailed milk protein profile in Danish Holstein (DH) and Danish Jersey (DJ) dairy cattle.

Animals
All samples were taken within the Danish-Swedish Milk Genomics Initiative. "The overall experimental strategy underlying this study was to minimize potential sources of environmental variation and maximize the level of genetic variation in the sample population. As a result, the pedigree of the selected animals was designed to include as unrelated animals as possible" [25]. In total, the 456 DH cows were sired by 239 bulls and 450 dams, whereas the 436 DJ cows were sired by 152 bulls and 429 dams. Single morning milk samples were collected once from 456 DH cows (20 dairy herds, October -December 2009) to 436 DJ (22 dairy herds, February -April 2010) from conventional herds during the indoor period. Between 19 and 24 cows were sampled from each herd. The cows sampled were all in mid-lactation (d129 to d229 in DH and d130 to d252 in DJ) and within parity 1, 2 or 3. The cows were housed in loose housing systems, fed according to standard practice, and milked twice a day. The milk samples were placed on ice for transport to the laboratory immediately after milking. Once at the laboratory, the milk samples were treated as described by Poulsen et al. [25].

Genotypes and genomic relationship matrix
In total 371 DH and 321 DJ cows were genotyped using the bovine HD SNP array (www.illumina.com/documents/ products/datasheets/datasheet_bovineHD.pdf). Genomic DNA was extracted from ear tissue. The platform used was an Illumina® Infinium II Multisample assay device. SNP chips were scanned using iScan and analyzed using Beadstudio version 3.1 software (Illumina, https:// www.illumina.com/). The quality parameters used for the selection of SNPs in the GWAS were minimum call rates of 80 % for individuals and 95 % for loci. Marker loci with minor allele frequencies (MAFs) below 1 % were excluded. The quality of the markers was assessed using the GenCall data analysis software of Illumina. Individuals with average GenCall scores below 0.65 were excluded following Teo et al. [26]. The Bos taurus genome assembly (Btau_4.0) [27] was used to assign the SNP positions on the genome. In total 494,984 SNP markers were used in both DH and DJ. These genotypes were used to calculate a genomic relationship matrix (GRM) as described by VanRaden et al. [28]. In short: M is a matrix of n x m specifying which marker alleles each individual inherit, where n = the number of individuals and m = the number of markers. M contained elements -1, 0, 1 representing homozygote, heterozygote and the other homozygote, respectively. The diagonals of M'M counts the number of homozygous loci for each individual and off diagonals measure the number of alleles shared by relatives. P contain the allele frequencies (p i ), such that column i of P equals 2(p i -0.5). The allele frequency is then: To set the expected mean value to 0, Z was created by subtracting P from M. The genomic relationship matrix G was then calculated as ZZ′/[2∑p i (1-p i )] [28].

Estimation of heritability
Variance components were estimated using the REML approach in DMU [29]. Within each breed, the following model was used in the analysis: Where Y ijkl is the phenotype of individual l in herd i and lactation j, μ is the fixed mean effect, herd i is a fixed effect (i = 1, 2, …, 20 DH; i =1, 2, …, 22 DJ), parity j is a fixed effect (j = 1, 2, 3 DH, j = 1, 2, 3 DJ), b 1 is the regression coefficient for DIM k , DIM k is a covariate of days in milk (d129 to d229 in DH, d130 to d252 in DJ), and animal is the random additive genetic effect based on G of animal l [30].
Univariate analyses were performed to estimate the heritability, which was defined as: where σ 2 a was the additive genetic variation, and σ 2 e was the residual variation.

Association mapping
The association analysis was performed using model 1 extended with an extra covariate for the SNP:b 2 is the allele substitution effect, SNP m is a covariate indicating if a SNP is homozygote (0,2) or heterozygote (1). The effect of the SNP was tested by a Wald test with a null hypothesis of H 0 : b = 0. The analyses were performed using REML in the R interface of DMU [28] (available at http://dmu.agrsci.dk). Significance thresholds were determined using a false discovery rate (FDR) correction using the R package "qvalue" version 1.34.0 (http:// github.com/jdstorey/qvalue) [31]. A FDR of P < 0.10 was considered significant.

Linkage disequilibrium along the genome
Local pairwise LD (r 2 ) between SNP markers on BTA14 was calculated using haploview [32] (Additional file 1: Figure S1 and Additional file 2: Figure S2). Genome-wise pairwise LD was calculated between the SNP markers within each Mb along the genome using the r 2 as a measure based on the software plink v1.07 [33].

Meta-analysis of the GWAS results
A meta-analysis combining the DH and DJ populations was performed based on the sample size based method as implemented in METAL [34]. In this method an intermediate Z-score is calculated as: where P i is the P-value for study i, and Δ i is the direction of the marker effect for study i. The statistics is calculated as: where N i is the sample size for study i. The overall Z-score is then calculated as The overall P-value is then calculated as: P = 2ϕ(| − Z|). Significance thresholds were determined using a false discovery rate (FDR) correction using the R package "qvalue" version 1.34.0 (http://github.com/jdstorey/qvalue) [31]. A FDR of P < 0.10 was considered significant.

SNPs assigned to genes
The SNPs on the bovineHD chip were mapped to the Btau4.0 assembly. The data from this download contained 26,352 genes with an Entrez Gene ID. For each gene the location on the bovine genome was determined as 5 Kb before the start position of the first exon to 5 Kb after the end position of the last exon. Hence, the defined gene region includes all introns and parts of the upstream and downstream regions of the gene. When a SNP was located in this region it was assigned to the corresponding gene.

Results
The descriptive statistics for the protein traits in both DH and DJ are reported in Table 1. These are in line with the results presented by Poulsen et al. [32] on the full data, showing that DH has a lower protein (3.43 %) and CN contents (2.66 %) compared to DJ (4.29 % protein, 3.00 % CN). Further, DH has a higher relative concentration of β-CN (36 %) compared to the DJ (β-CN% 28 %), whereas the protein content of α S1 -CN, α S2 -CN, k-CN, β-LG, and α-LA were more similar between the two breeds. Differences between breeds in relation to degree of PTMs were observed, both in terms of phosphorylation and glycosylation. There is a difference in the PD of α S1 -CN and α S2 -CN, with the less phosphorylated forms being lower in DH compared with DJ, resulting in lower relative amounts of the 8 and 11 P forms of α S1and α S2 -CN in DH, respectively. On the other hand, fraction of G-k-CN% was higher in DH compared with DJ, with GD for k-CN in DH was 24 % versus 20 % in DJ.

Heritability
The heritability estimates for the protein traits for both DH and DJ are presented in Table 1. For DH, high heritability estimates were found for protein percentage With regard to isoforms of specific proteins, DH showed a much higher heritability for 11P-α S2 -CN, G-k-CN and U-k-CN compared to DJ, whereas the heritability for 8P-α S1 -CN is much lower in the DH compared to the DJ. This is also reflected in the PD for α S1 -CN and GD of k-CN (Table 1).

GWAS
The GWAS results for DH are presented in Additional file 3: Table S1 and for DJ in Additional file 4: Table S2 including the allele-substitution effect, location and annotation.

k casein
Most significant SNPs were detected for k-CN% (4,742). Of these 4,742 SNP markers 2,609 SNP markers were located on BTA6. The most significant quantitative trait locus (QTL) was detected on BTA6 in a region from 87,385,233 bp to 87,421,141 bp spanning the CSN3 gene. The most significant SNP were BOVINEHD0600023914 (rs110516603) and BOVINEHD0600023927 (rs136864341) with each a -log 10 (P-value) = 49.21 explaining 2.7 % of the total variation. The k-CN% is a combination of the glycosylated and un-glycosylated fraction. For the G-k-CN% a total of 2,758 significant SNP markers were detected. Most significant SNP markers were detected on BTA1 (81), BTA6 (1,609), BTA7 (107), BTA10 (125), BTA11 (143), BTA13 (78), and BTA29 (116), whereas for the U-k-CN% a total of 2,544 significant SNP markers were detected. Most significant SNP markers were detected on BTA1 (161), BTA2 (136), BTA6 (1,642), BTA11 (94). Just like k-CN%, both G-k-CN% and U-k-CN% have the majority of the significant SNP on BTA6 close to the k-CN gene. Figure 1 shows the significant markers along the genome for both G-k-CN% and U-k-CN%. Out of the markers significant at FDR < 0.10, 1,039 markers show overlap between G-k-CN% and U-k-CN%, whereas 1,719 SNP markers were specific for G-k-CN%, and 1,505 SNP markers were specific for U-k-CN%. The most significant specific SNP markers for G-k-CN% were BTB-01653149 (rs42768815), BOVINEHD10000 20692 (rs132942592), BOVINEHD1000020694 (rs13456 7350), BOVINEHD1000020695 (rs136061111) and ARS-BFGL-NGS-40559 (rs110826777) on BTA10 each having a -log 10 (P-value) = 7.74 explaining 2.1 % of the total variation. These markers are in complete LD (r 2 = 1).The most significant specific SNP markers for U-k-CN% was BOVINEHD1000005681 (rs110977200) on BTA10 with a -log 10 (P-value) = 5.99 explaining 16.9 % of the total variation. The location of the peak for G-k-CN% and U-k-CN% on BTA10 were approximately 55 Mb apart. Even though the significant SNP markers for both G-k-CN% and U-k-CN% were located across all autosomes, trait specific peaks were detected on BTA7, BTA11, BTA13, BTA18, BTA22 and BTA29 for G-k-CN% and on BTA1, BTA2, BTA4 and BTA5 for U-k-CN% (Fig. 1). Comparing the GWAS results of k-CN% to the U-k-CN% and G-k-CN% showed that the peaks for G-k-CN% on BTA7, BTA10, BTA18 and BTA22 were specific for G-k-CN%.

Linkage Disequilibrium in the Danish Holstein and Danish Jersey data-sets
The LD analysis of the DH and DJ data-sets showed that there is a difference in the bin-wise LD between these two breeds. The DJ showed higher bin-wise LD across the genome (max mean r 2 = 0.75, min mean r 2 = 0.09) compared to the DH (max mean r 2 = 0.74, min mean r 2 = 0.07) (Fig. 2). The decay with physical distance to reach half of the maximum value (r 2 /2 = 0.375) for the DJ was near 23,000 bp, while the decay in DH reaching half of the maximum value (r 2 /2 = 0.37) was near 19,000 bp.

Meta-analysis of the combined Danish Holstein and Danish Jersey data-sets
The results of the meta-analysis in comparison with the within breed analysis are given in Additional file 5: Figure S3, Additional file 6: Figure S4, Additional file 7: Figure S5, Additional file 8: Figure S6 and Additional file 9: Figure S7. No significant markers were detected for α S1 -CN, 8P-α S1 -CN, 9P-α S1 -CN, β-CN, and α-LA as well as for PD of α S1 -CN% and α S2 -CN% and GD for k-CN.

Protein percentage
The meta-analysis for protein percentage confirmed a major QTL on BTA14 which was detected by both the DH and DJ. In addition a QTL was detected on BTA20 with a strong signal in the meta-analysis. Furthermore a number of smaller meta-analysis peaks show up, but these are mainly due to breed specific signals like BTA2 (in the middle of two QTL peak one specifically for DH and the other for DJ), BTA4 (DJ), BTA12 (DJ) and BTA20 (DJ) (Additional file 5: Figure S3).

Casein percentage
BTA14 was confirmed as a major QTL which was detected by both DH and DJ. On BTA25 the meta-analysis QTL peak was much stronger compared to the within breed analysis results. The QTL meta-analysis peaks, which were mainly due to one breed, were detected on BTA4 (DJ), BTA6 (DJ), BTA10 (DH) and BTA20 (DH) (Additional file 6: Figure S4).

α S2 casein
A major QTL on BTA6 was confirmed by the metaanalysis which was also detected in DH and DJ separately. Furthermore, a QTL on BTA14 was confirmed which was detected by the within breed analysis for both breeds separately. On BTA12, a QTL was detected where the within breed analysis did not show a significant peak. In addition, the meta-analysis showed significant QTL which are mainly due to one breed at BTA2 (DJ), BTA8 (DH), BTA9 (DJ), BTA23 (DJ) (Additional file 7: Figure S5).

k casein
The major QTL on BTA6 detected in both breeds was confirmed in the meta-analysis. Furthermore, a few Fig. 2 The rate of linkage disequilibrium (LD) for the Danish Holstein (black dots) and Danish Jersey data-set (red dots). On the Y-axis the mean bin-wise LD is presented on the X-axis the physical distance between pair-wise markers is presented in Mega base pairs (Mbp) smaller QTL were detected with the meta-analysis on BTA1, BTA5, BTA8 and two QTL peaks on BTA10, BTA11, BTA13, BTA18, two peaks on BTA21, BTA26 and BTA29. The majority of the peaks were due to the DH breed except for BTA5 which was mainly due to DJ. In addition, two QTL peaks were only significant in the DH breed but did not show significance in the meta-analysis (BTA16, and BTA23) (Additional file 8: Figure S6).

β-LG
One major QTL was detected on BTA11, which was also revealed as the major QTL in the within breed analysis for both DH and DJ (Additional file 9: Figure S7). Figure 3 shows the distribution of the significant markers from the meta-analysis over the genome and the overlap between the protein traits (protein percentage, casein percentage, α s2 -CN, k-CN, and β-LG). Relatively low overlap has been found between the individual protein traits. Most overlap was detected for protein percentage and casein percentage on BTA14 (82 SNP markers) followed by BTA20 (15) and BTA25 (14), whereas for α S2 -CN and k-CN most overlap was detected on BTA6 (147). There was only one SNP marker in common between k-CN and β-LG on BTA11 (BOVINEHD4100009262 (rs109608280)).

Heritability
There was a significant difference between the DH and DJ in the protein profile for all traits except for G-k-CN (Table 1) as previously has been shown by Poulsen et al. [24]. The heritabilities of protein and CN percentages were lower in the DH compared to the DJ. In the literature, the heritability for protein percentage covers a wide range from 0.28 [35] up to 0.66 [20]. Such variation in heritability estimates can partly be due to differences in the breeds used, sample size, analytical method, experimental design and statistical models applied. In general the heritability determined in the present study for protein percentage in DJ milk was high compared to values reported in the literature [20,35,36], whereas the heritability estimates for the DH were in between the estimated heritabilities presented in the literature [20,21]. In the DH animals the all 371 animals included in this analysis had genotype BB for the α S1 -CN gene [8], which could explain the low heritability compared to the studies of e.g. Schopen et al. [20] and Bonfatti et al. [21]. Including more animals and multi-trait analysis could improve the estimation of the heritability of α S1 -CN% in the DH [37], but it would still be lower compared to values presented in the literature [20,21].
So far not many studies have focused the genetic analysis of the PTM forms of the milk proteins. Bijl et al. [23] showed differences in heritability for phosphorylation isoforms of α S1 -CN%. The 8P-α S1 -CN form had a lower heritability (0.48) compared to the 9P-α S1 -CN form (0.76) for the Dutch Holstein population. In DH the heritability for 8P-α S1 -CN is close to 0 most likely for the reason mentioned above. Interestingly, the heritability for 9P-α S1 -CN is 0.25, which could indicate that other genes than the α S1 -CN gene can be involved in the formation of 9P-α S1 -CN. In a recent study the heritability of the glycosylation of k-CN heritability was estimated in Simmental cattle (0.46) [15]. This is a different breed than the DH and DJ, which could give an explanation why the heritability was different (DJ < Simmental < DH). In all three breeds the heritability was substantial indicating that there is room for genetic selection for the glycosylation properties of the milk.

GWAS of Post Translational Modifications Glycosylation of k-CN
In this study the majority of the QTL for PTM were detected in the DH comparing the results of the G-k-CN versus the U-k-CN (Fig. 1). There was no difference in the mean values of the content of G-k-CN% between DH and DJ, but there was a profound difference in the heritabilities between DH and DJ (heritability DH > DJ (Table 1)). These differences in the heritabilities could potentially explain the difference in the number of QTL detected for G-k-CN and U-k-CN between DH and DJ. The comparison of the GWAS results between k-CN% and the U-k-CN% and G-k-CN% fractions revealed a number of G-k-CN% specific QTL peaks indicating that there is a potential to genetically differentiate between the G-k-CN% and U-k-CN% fraction in the k-CN in the milk. This would be of interest as glycosylation of k-CN is considered to stabilize the micelle structure. A higher fraction of G-k-CN would increase both its charge and the size of the hydrophilic k-CN C-terminal and thereby influence both the cleavage of the k-CN molecule into para k-CN and glycomacropeptide in the first phase of the coagulation process by chymosin as well as the coagulation properties of the milk represented by the second phase [13,38,39]. Comparing the results of our study to the study on rheological traits for rennet induced gelation showed that BTA7, BTA10, BTA18 and BTA22 were among the chromosomes identified to play a role in rennet induced coagulation [40]. There was an overlap between the QTL on BTA7 for log(gel strength) and the QTL detected on BTA7 for G-k-CN in our study indicating that QTL for G-k-CN could in part explain the genetic variation of coagulation properties of the milk.

Candidate genes
At this stage the information regarding the genetic regulation of the glycosylation of the k-CN is poorly understood. In this study we detected different SNPs which were found to be specific for G-k-CN (Additional file 10: Table S3). The O-glycosylation of k-CN results in glycan attachment at threonine residues resulting in attaching 1 to 6 glycans at specific sites: Thr142, Thr152, Thr154, Thr157 (only variants A and E), Thr163, Thr166, Thr186 [10,41]. Among the genes assigned to the significant SNP markers there were no obvious candidate genes based on their biological information. It is interesting to mention though that there were two genes which were related to PTMs of caseins. These genes were: Casein kinase 1, Gamma 3 (CSNK1G3) on BTA7. This gene is a ubiquitous serine/threonine-specific protein kinase that phosphorylates caseins and other acidic proteins [42], and protein kinase c. theta (PRKCQ) on BTA13. Protein kinase C (PKC) is a family of serine-and threonine-specific protein kinases that can be activated by calcium and the second messenger diacylglycerol (http://www.genecards.org/ cgi-bin/carddisp.pl?gene=prkcq). However it remains unclear if these genes have a specific role in the glycosylation of k-CN.
Phosphorylation of α S1 -CN and α S2 -CN Phosphorylation forms of α S1 -CN seem to be regulated by a different set of genes [23]. In their GWAS study it was shown that both the 8P and 9P form of α S1 -CN were regulated by a region on BTA6, while the 8P form was further affected by a region harboring the PEAP gene on BTA11, while the 9P form was additionally regulated by a region harboring DGAT1 on BTA14 [23]. In our study, we identified a region on BTA12 in DH for 8P-α S1 -CN%. Further we identified regions on BTA6 and BTA11, these were, however, not significant. The reason for this could be that the number of animals used in the GWAS analysis in our study is relatively small compared to the mapping population size of Bijl et al. [23]. A smaller population size results Fig. 3 Overview of the distribution of significant markers (FDR < 0.10) of the meta-analysis over the genome for protein% (prot%), casein% (CN%), α S2 -CN%, k-CN%, β-LG%. Gray dots represent uneven chromosomes, black dots represent even chromosomes. Red triangle: overlapping SNP markers between prot% and CN%, green triangle: overlapping SNP markers between α S2 -CN%, k-CN%; pink triangle: overlapping SNP markers between k-CN%, β-LG% in a lower power to detect an association, and therefore could explain the missing overlap.

GWAS of major proteins Single GWAS versus meta-analysis GWAS
In this study, within population GWAS was followed by a meta-analysis for the major milk proteins and their PTM forms. As single GWAS are underpowered due to small sample sizes, meta-analysis which combines information from independent studies can increase power and reduces false-positive findings [43]. However the increase in power in the meta-analysis can only be seen when the same loci influencing the trait of interest are segregating in both populations but are not significant in the single population GWAS study due to lack of power. In such cases the meta-analysis could enhance the association signal and detection probability. We studied two distinct dairy cattle breeds (DH and DJ). They differ both phenotypically in the milk composition [8,44], as well as in their genetic make-up as these breeds have been genetically separated for many generations [45] and have undergone strong artificial selection. This is reflected in the difference in the LD structure of the DH and DJ samples used in this study (Fig. 2). Furthermore, it has been shown that genomic prediction across Holstein and Jersey populations are difficult [46]. This is reflected in the metaanalysis of this study, the majority of the QTL which was considered significant was also significant in either the DH or DJ population except for the QTL on BTA12 for α S2 -CN% (Additional file 7: Figure S5).

Major QTL regions
If the genome-wide Bonferroni correction for multiple testing was applied only three major regions for the protein composition would be detected both in the DH and DJ: BTA6 (k-CN) covering the CN gene complex, BTA14 (protein percentage and casein percentage) covering the DGAT gene and BTA11 (β-LG) covering the PEAP gene. This is in line with the findings of Schopen et al. [22]. Interestingly when analyzed as a yield trait the QTL on BTA14 for protein was not detected [22]. This is in line with the findings of Bovenhuis et al. [47] who detected significant association with mineral composition in the milk and DGAT when analyzed as a percentage trait, while analyzed as a yield trait the association with DGAT disappeared. This suggested that the QTL on BTA14 has an indirect effect on protein and casein percentage [47]. and Arla Foods amba (Viby J, Denmark), as well as part of the "Phenotypic and genetic markers for specific milk quality parameters" of the Milk Levy Fund, Denmark (2011-2013). BB, NAP and LBL were supported by the project "Breeding high value milk: BigMilk" of the Milk Levy Fund, Denmark (2015-2018). GG is enrolled in the Erasmus-Mundus joint doctorate European Graduate School in Animal Breeding and Genetics.
Availability of supporting data Information on the SNP dataset supporting the conclusions of this article is available at: http://res.illumina.com/documents/products/ datasheets/datasheet_bovinehd.pdf, http://support.illumina.com/ downloads.html, and http://www.animalgenome.org/repository/cattle/. Furthermore, the dataset(s) with SNP association results supporting the conclusions of this article are included within the article and its additional files.
Authors' contributions BB processed the genotypes, performed the genetic analysis, and wrote the manuscript. NAP collected the milk samples and contributed to the milk analysis and discussion of the results. GG contributed to the analysis of the data. LBL contributed to the planning, sampling, milk analyses and discussion of the results. All authors contributed to the manuscript and approved the final version of the manuscript.

Competing interest
The author(s) declare that they have no competing interests.

Consent for publication Not applicable.
Ethics approval and consent to participate