A high density linkage map of the bovine genome

Background Recent technological advances have made it possible to efficiently genotype large numbers of single nucleotide polymorphisms (SNPs) in livestock species, allowing the production of high-density linkage maps. Such maps can be used for quality control of other SNPs and for fine mapping of quantitative trait loci (QTL) via linkage disequilibrium (LD). Results A high-density bovine linkage map was constructed using three types of markers. The genotypic information was obtained from 294 microsatellites, three milk protein haplotypes and 6769 SNPs. The map was constructed by combining genetic (linkage) and physical information in an iterative mapping process. Markers were mapped to 3,155 unique positions; the 6,924 autosomal markers were mapped to 3,078 unique positions and the 123 non-pseudoautosomal and 19 pseudoautosomal sex chromosome markers were mapped to 62 and 15 unique positions, respectively. The linkage map had a total length of 3,249 cM. For the autosomes the average genetic distance between adjacent markers was 0.449 cM, the genetic distance between unique map positions was 1.01 cM and the average genetic distance (cM) per Mb was 1.25. Conclusion There is a high concordance between the order of the SNPs in our linkage map and their physical positions on the most recent bovine genome sequence assembly (Btau 4.0). The linkage maps provide support for fine mapping projects and LD studies in bovine populations. Additionally, the linkage map may help to resolve positions of unassigned portions of the bovine genome.


Background
Advances in technology have dramatically increased the ability to cost-effectively genotype a large number of SNPs in humans and farm animals [1,2]. The majority of the SNPs have been placed in physical, but not linkage maps. Increasing the resolution of bovine linkage maps will improve estimates of linkage disequilibrium (LD) [3,4] and increase the success rate of fine mapping quantitative trait loci (QTL) in cattle. The possibility that any particular SNP does not have a functional role is outweighed by its indirect use as a genetic marker associated to a causal variant [5]. In addition, mapped SNPs provide information about LD patterns over the genome and allow the identification of haplotype blocks [4,6,7].
Historically, a diverse variety of methodologies and procedures have been used to order bovine chromosomal segments [8-16]. A physical map [17] and several linkage maps have been reported for the bovine genome [16,18-22]. To date, the linkage map of Snelling et al. [16] has the highest number of genetic markers positioned. Their linkage map comprises 4,585 markers (including 913 SNPs) in 2,475 unique positions, covering 3,058 centimorgans (cM) in total. Since Kappes et al. [23] reported advances in the sequencing of the bovine genome, a 7.1-fold coverage of the genome has been attained and this has generated over 2 million bovine SNPs that are currently in NCBI dbSNP Build 129 [24]. Affymetrix produced a commercial genotyping panel of approximately 10,000 bovine SNPs [25], 92% of which were derived from this sequencing resource [26]; the remaining eight percent were derived from Australia's Commonwealth Scientific and Industrial Research Organisation (CSIRO) [27].
The objective of this work is to present a high-density bovine linkage map (HDBLM) that combines a low-density microsatellite based linkage map (LDM) with SNPs from the Affymetrix GeneChip™ Bovine Mapping 10K SNP kit (hereafter called 10K SNP panel) [25]. Results from the HDBLM could enhance the understanding of the alignment and orientation of contigs and scaffolds in the bovine genome assembly, thus allowing the examination of relationships between physical distances, linkage disequilibrium (LD) and genetic map distance. This would provide a framework to identify causal relationships between genomic variation and animal performance traits.

Genotype quality
Genotypes were received from Affymetrix (Santa Clara, CA, USA) for 9,713 SNPs with an average call rate of 99.25% for the 10K SNP panel. A total of 1,891 SNPs were removed for the following reasons: departure from Hardy-Weinberg Equilibrium (HWE) (120), more than 50 inheritance inconsistencies (260), having an allele with frequency lower than 5% (1,494), and fewer than 10 informative meioses (17) (Additional file 1). Genotypes from six animals were used as blind duplicates, with an average concordance between samples of 99.93%. A total of 1,189 SNPs (hereafter called orphan SNPs) were not initially assigned to any chromosome; 1,053 of these SNPs were subsequently assigned to a single chromosome. There were 955 SNPs from the 10K SNP panel initially assigned to an incorrect chromosome (hereafter called displaced SNPs), 779 of which we were able to reassign to a different chromosome. The stringent threshold criteria used for the assignment of these SNPs prevented the allocation of the remaining 136 orphan SNPs and 176 displaced SNPs. More of these orphan and displaced SNPs could have been placed on a chromosome by lowering the stringency of the threshold criteria used during the assignment. The final marker data set consisted of 7,510 SNPs from the 10K SNP panel in addition to 294 microsatellites, three milk protein haplotypes and two gene-based SNPs. Table 1 shows the mean number of informative meioses for all of the autosomal markers. The method of Breen et al. [28] was used to calculate the resolution for an autosomal marker map. Using the average of 366.9 informative meioses, the 95% confidence level for a distance was calculated to be 0.80 cM.

Genetic maps
A total of 7,066 markers were mapped (294 microsatellites, three haplotypes and 6,769 SNPs) (Table 2). The autosomal markers were distributed across 3,078 unique positions (Figure 1). The linkage map for the 29 bovine autosomal chromosomes was 3,097.4 cM with an average Kosambi distance [29] of 0.449 cM. The smallest genetic distance present in each chromosome was 0 cM. On average, the genetic distance per unit of physical distance (cM/Mb) was 1.25 (Table 3). Chromosome 20 had the lowest cM per Mb ratio. Chromosome size accounted for 42% of the variation in inter-chromosomal genetic distances per Mb (P-value 6.5 × 10^-5); the correlation of chromosome size to recombination distance was -0.66. We were unable to assign 2,946 of the 9,713 SNPs to the linkage map. Of the 7,822 SNPs that passed quality control, 7,510 SNPs were allocated to a confirmed chromosome. Six hundred and fifty-two SNPs that had an assigned chromosome were not mapped because a unique map position could not be found and their inclusion based on physical position increased the length of the linkage map above the defined threshold. There were 91 SNPs with unknown physical position, which prevented their insertion analysis.
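As a worked illustration of the quantities reported above, the following minimal sketch implements the standard Kosambi map function and shows one way the average spacing of 0.449 cM can be reproduced. It assumes that the average is taken over the intervals between adjacent autosomal markers, that is, the 6,924 autosomal markers minus one per chromosome; this interval count is an assumption and is not stated explicitly in the text. The function names are hypothetical.

```python
import math


def kosambi_cm(recombination_fraction: float) -> float:
    """Convert a recombination fraction to a Kosambi map distance in cM.

    Standard Kosambi map function: d = 1/4 * ln((1 + 2r) / (1 - 2r)) Morgans,
    multiplied by 100 to express the distance in centimorgans.
    """
    r = recombination_fraction
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def mean_adjacent_distance_cm(total_length_cm: float,
                              n_markers: int,
                              n_chromosomes: int) -> float:
    """Mean distance between adjacent markers.

    Assumption: each chromosome contributes (markers - 1) intervals, so the
    genome-wide interval count is n_markers - n_chromosomes.
    """
    intervals = n_markers - n_chromosomes
    return total_length_cm / intervals


# Figures taken from the text: 3,097.4 cM, 6,924 autosomal markers, 29 autosomes.
average_spacing = mean_adjacent_distance_cm(3097.4, 6924, 29)
print(f"average spacing: {average_spacing:.3f} cM")
```

For small recombination fractions the Kosambi distance is close to 100 times the recombination fraction, which is why the function matters mainly for the wider intervals of the map.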

Comparison with Bovine genome Btau 4.0
There was not complete concordance in marker order between the linkage and physical maps (Figure 2). The average Pearson correlation between the order of linkage positions and the physical positions was 0.985 over the genome. Although the correlations were high for the majority of the chromosomes, there were a number of local discrepancies (Additional file 2). Both point discrepancies (e.g. see Figure 2 for chromosome 3) and inversions (Figure 2, chromosome 27, distal region) were observed.

Discussion
The linkage map presented in this paper is the densest map to date for cattle; the relatively high number of informative meioses per available SNP represented in the 10K SNP panel is greater than that reported by Snelling et al. [16], thus enabling a high degree of marker placement by the mapping software. The number of SNPs that were available from the 10K SNP panel could have been increased further. For example, this could have been accomplished by lowering the allele frequency criterion used to remove any SNP from 5% to 2%. This would have allowed informative meioses to dictate the placement or rejection of SNPs by the mapping software into the linkage maps. The detection of displaced SNPs was carefully monitored. Some displaced SNPs had formed clusters with small genetic distances between them, but the cluster was placed further than the established threshold of 20 cM from either the most-distal or most-proximal marker of any other linkage group. The success rate for identifying these SNPs relied on the information content of each one of the markers. We set more stringent criteria for marker placement than was previously published [16]; that is, we only accepted clusters with LOD scores above 15 and where at least two microsatellites belonged to the linkage group. Further, we did not allow linkage to any other groups. The subsequent placement of orphan and displaced SNPs in other than the originally assigned linkage maps assured us that the methodology utilised in assigning such markers to a chromosome was appropriate.  [22] linkage map, but the distance between common proximal and distal markers is larger. Of the two markers that mapped proximal to, and the 12 markers mapped distal to common markers with the linkage map of Ihara et al. [22], only one was placed during mapping round 5: Insertion phase, which utilises physical map data (see methodologies section).
[Figure: Marker locations on bovine genome sequence autosomes]
The positions of all other markers in these two regions were based on linkage information. The genetic positions of the two proximal markers and the 11 distal markers that were placed by linkage information are in concordance with their physical position, except for a cluster of three SNPs. The order of common microsatellite markers that were assigned to our genetic map as well as several other bovine linkage maps [16,18-22] is in complete agreement. Likewise, SNPs common to the genetic map presented by Snelling et al. [16] and our linkage maps are in concordance.
The addition of 6,767 SNPs from the 10K SNP panel to the low-density microsatellite-based maps (LDM) resulted in both expansion and additional coverage of the linkage maps. The expansion was explained by an increase in genetic distance from proximal to distal markers of LDMs. The additional coverage was measured by an increase in genetic distances from the placement of the last SNPs of the 10K SNP panel to the proximal and the distal marker of a LDM. The magnitude of the expansion was 80.4 cM and the coverage increased by 338.2 cM. The high reliability of the map presented here was made possible because of a high accuracy of genotyping, thorough pre-screening of the genotypic data for inconsistencies (mis-inheritance, departure from HWE, low allele frequency and fewer than 10 informative meioses), relatively high numbers of informative meioses and the ability to place orphan and displaced SNPs. Hence this map will be useful to monitor the bovine genome assembly. Using the approach applied by Breen et al. [28], a map resolution of 0.80 cM between autosomal markers could be obtained from an average of 366.9 informative meioses. For our 3,097.4 cM autosomal linkage map, the number of markers that could potentially be placed to unique positions is 3,872. Our autosomal linkage map has 3,078 unique marker positions and should be considered as not fully saturated. The average Kosambi distance is lower than that presented by Snelling et al. [16]. However, the coefficient of variation (CV) is greater, indicating that our linkage maps have a higher proportion of marker clusters (Table 2, Figure 1). The insertion of an otherwise unmapped SNP by using its physical position is the most probable cause for the increased value in observed CV. An unmapped SNP that belongs to a scaffold that already includes one or more mapped SNPs is not expected to increase genetic distances because it creates a cluster rather than a singleton.
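The saturation argument above is simple arithmetic, and the short sketch below reproduces it: dividing the autosomal map length by the map resolution gives the number of positions that could in principle be resolved, which is then compared with the number of unique positions observed. The resolution of 0.80 cM is taken as given from the text rather than recomputed; the function and variable names are hypothetical.

```python
def potential_unique_positions(map_length_cm: float, resolution_cm: float) -> int:
    """Number of map positions that can be separated at the given resolution."""
    return round(map_length_cm / resolution_cm)


AUTOSOMAL_LENGTH_CM = 3097.4   # autosomal map length reported in the text
RESOLUTION_CM = 0.80           # resolution reported in the text (Breen et al. approach)
OBSERVED_POSITIONS = 3078      # unique autosomal positions reported in the text

potential = potential_unique_positions(AUTOSOMAL_LENGTH_CM, RESOLUTION_CM)
saturation = OBSERVED_POSITIONS / potential

print(f"potential positions: {potential}")
print(f"fraction of potential positions occupied: {saturation:.2f}")
```

On these figures roughly four fifths of the resolvable positions are occupied, which is consistent with the statement that the map is not fully saturated.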
The observation that the average recombination rate decreases as chromosome size increases was consistent with other studies   [31] and approximately twice that of the value of 0.63 found in mice (Shifman et al. [30]). Based on the bovine assembly Btau 4.0 [32], the total physical length from first proximal to last distal markers of our linkage maps was 2.605 Gbp (Table 3). Snelling et al. [17] reported a genome size of 3.1 and 2.9 Gbp estimated from the BAC and sequencing bovine genome project, respectively. Using a physical map of 3 Gbp, the average recombination distance would be approximately 1.1 centimorgans per million base pairs. These inconsistencies also introduce uncertainty in calculating chromosome-wise recombination rates. Inconsistencies between the order of markers in the linkage maps and their physical order (Additional file 2) prevented us from further investigating the recombination distance per physical distance within the chromosome.
The 7K-linkage map presented here has substantially improved on the previously incomplete assignment of SNPs from the 10K SNP panel, and has reordered SNPs that had been wrongly assigned. Thus, our linkage map has shown utility for identifying errors in the current sequence assembly of the bovine genome. In addition, the markers and linkage map will be valuable for fine mapping of QTL [33,34].
The chromosomal assignment of SNPs in the 10K SNP panel was incomplete, and some SNPs were wrongly assigned. The assignment and re-assignment of orphan and displaced SNPs to a chromosome, and the further placement of these SNPs at unique positions in the linkage map during mapping rounds 2-4, were based entirely on linkage information. The inclusion of SNPs with up to 50 mis-inheritances in the construction of linkage maps did not have an effect on recombination distances. The markers and linkage map presented in this paper will be useful in the fine mapping of QTL using LD methods [33,34]. However, a number of marker clusters and gaps remain (Figure 1). Further marker development that is being undertaken in the bovine genomics community will ensure that there is greater uniformity and marker density over the genome, which will be beneficial for applications of genomic selection [35]. In addition, the placement by linkage of SNPs from the 10K SNP panel (mapping rounds 2-4) will be useful in identifying inaccuracies in the bovine genome sequence assembly and in correcting the chromosomal assignment of some SNPs from the 10K SNP panel. Approximately 20% of SNPs from the 10K SNP panel were not acceptable for map construction. The major factor for non-acceptance was an allele with a frequency lower than 5%. This probably reflects the origin of the SNPs, which came primarily from the sequence of a Hereford animal and were validated in populations different from the New Zealand Holstein-Friesian and Jersey cattle breeds. That is, this limitation could be a reflection of the breed origin of the SNPs. The use of breed-specific SNPs and the knowledge of the physical position of SNPs are two aspects that should not be overlooked. Structural discrepancies observed between the order of the markers in the linkage map and their physical position (Additional file 2) could be attributed to spurious information in the bovine assembly Btau 4.0 [32].
In the opinion of the authors, at the present time, the number of informative meioses has more weight in the acceptance of the linkage position of the markers than their physical position.

Conclusion
Using a unique animal resource, 7066 bovine genetic markers were positioned in our linkage map. Approximately 90% (6767 out of 7510) of the SNPs that passed quality control testing from the 10K SNP panel were placed on the linkage map (Additional file 3). The marker positions in the linkage maps are in good agreement with the physical positions obtained using Btau 4.0 of the bovine genome. The information from this linkage map has been used to describe patterns of LD in the bovine genome [36]. Additionally, it will support further genetic analysis of important economic traits in cattle and will help to resolve challenges encountered in the assembly of the bovine genome. The linkage map is not fully saturated, and thus the addition of more markers would be valuable.

Population
An outbred F2 experiment of Holstein-Friesian and Jersey cattle breeds was undertaken in New Zealand to identify QTL and genes affecting dairy production [37]. The experiment consisted of 817 F2 females, 796 F1 dams, 6 F1 sires and 60 F0 males (Additional file 4). All sires of F1 dams and F1 sires are represented in the set of 60 F0 sires. There were no matings between individuals that shared a sire.

Genotyping
In total, 1679 animals (male F0, as well as all F1 and F2 animals) from the experiment were genotyped by external laboratories according to standard practices for fluorescent dye-labelled primers, utilising Applied Biosystems 3100 genetic analysers ( [38], two gene-based SNPs (the non-conservative K232A substitution in the DGAT1 gene [39,40] and the F279Y SNP, which is a substitution in the transmembrane domain of the GHR gene [41]) and the 10K SNP panel. The six F1 sires were screened for approximately 500 microsatellites. Where four out of six sires were heterozygous, the markers were used. The 10K SNP panel was genotyped 12 months later than the other markers.

SNP Quality Control
Before undertaking construction of high-density bovine linkage maps, SNPs from the 10K SNP panel were screened for segregation distortion (by testing for departure from HWE [42]) and for mis-inheritance. A SNP meeting any of the following criteria was removed from further analysis: departure from HWE (P-value less than 0.001), more than 50 records of mis-inheritance (inheritance had previously been confirmed from the microsatellites), an allele with a frequency lower than 5% in the F0 and F1 populations, or fewer than 10 informative meioses. The remaining SNPs that passed quality control testing for map construction each had at least one case of mis-inheritance.
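The four exclusion criteria listed above can be expressed as a simple per-SNP filter. The sketch below is illustrative only: it assumes that the summary statistics for each SNP (HWE P-value, mis-inheritance count, minor allele frequency and number of informative meioses) have already been computed elsewhere, and the class, field and function names are hypothetical rather than taken from any software used in the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SnpStats:
    """Pre-computed summary statistics for one SNP (hypothetical structure)."""
    name: str
    hwe_p_value: float
    misinheritances: int
    minor_allele_freq: float   # in the F0 and F1 populations
    informative_meioses: int


def qc_failures(snp: SnpStats) -> List[str]:
    """Return the reasons a SNP fails QC; an empty list means it passes."""
    reasons = []
    if snp.hwe_p_value < 0.001:
        reasons.append("departure from HWE")
    if snp.misinheritances > 50:
        reasons.append("more than 50 mis-inheritances")
    if snp.minor_allele_freq < 0.05:
        reasons.append("allele frequency below 5%")
    if snp.informative_meioses < 10:
        reasons.append("fewer than 10 informative meioses")
    return reasons


# Two made-up SNPs: one that passes and one that fails every criterion.
good = SnpStats("snp_pass", hwe_p_value=0.2, misinheritances=3,
                minor_allele_freq=0.31, informative_meioses=250)
bad = SnpStats("snp_fail", hwe_p_value=0.0005, misinheritances=51,
               minor_allele_freq=0.04, informative_meioses=9)

for snp in (good, bad):
    print(snp.name, qc_failures(snp) or "passes QC")
```

Returning the list of reasons, rather than a single pass or fail flag, makes it straightforward to tabulate removals by cause, as is done in the Genotype quality section.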

Pedigree Structure
The linkage mapping utilized 1679 individuals from the F2 design described by Spelman et al. [37]. All informative meioses for the autosomal maps are male and thus the maps are male-specific. The same is true of the pseudoautosomal part of the sex chromosome.
The non-pseudoautosomal part of the sex chromosome was constructed differently; it utilized maternally-derived genotypes (F1 dam) and was therefore a female-specific map. The F2 daughters' genotypes comprised maternally-derived alleles as well as paternally-inherited (F1 sire) haplotypes. The maternally-inherited alleles were derived by subtracting the paternally-inherited haplotypes from the progeny genotypes as follows. Because recombination is not possible for the haploid sex chromosome in males, these paternally-inherited haplotypes represented entire (non-pseudoautosomal) chromosomes. This in turn enabled the maternally-inherited haplotype to be determined in the F2. As for their F2 daughters, the F1 dams' chromosome-long haplotypes were known. This is because their sires (the F0 maternal grandsires) were genotyped. Therefore the F1 dams' phases were known, increasing the ability to observe recombination events amongst their F2 offspring. Our linkage map is based on a two-generation pedigree and it could be further enhanced using a three-generation pedigree. The number of animals involved in the pedigree structure, the number of markers involved in map construction and limitations in hardware capability limited the use of a three-generation pedigree.
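The subtraction step described above can be sketched as follows. For a daughter, removing the allele carried on the sire's single (non-recombining) X-chromosome haplotype from her unphased genotype leaves the allele transmitted by the dam. The data layout, with genotypes as pairs of allele codes and the sire haplotype as one allele per marker, and the function names are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Genotype = Tuple[str, str]


def maternal_allele(daughter_genotype: Genotype,
                    paternal_allele: str) -> Optional[str]:
    """Remove one copy of the sire's allele; what remains came from the dam.

    Returns None when the sire's allele is absent from the daughter's
    genotype, which indicates a mis-inheritance or genotyping error.
    """
    alleles = list(daughter_genotype)
    if paternal_allele not in alleles:
        return None
    alleles.remove(paternal_allele)   # removes a single copy only
    return alleles[0]


def maternal_haplotype(daughter_genotypes: Sequence[Genotype],
                       sire_haplotype: Sequence[str]) -> List[Optional[str]]:
    """Derive the dam-transmitted haplotype across ordered X-linked markers."""
    return [maternal_allele(genotype, allele)
            for genotype, allele in zip(daughter_genotypes, sire_haplotype)]


# Made-up example with three X-linked markers.
daughter = [("A", "G"), ("C", "C"), ("T", "G")]
sire = ["A", "C", "G"]
print(maternal_haplotype(daughter, sire))
```

Recombination events in the dam can then be counted by comparing the derived haplotype with the dam's two phased haplotypes along the chromosome.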

Construction of the low-density microsatellite-based linkage map (LDM)
There were five rounds of mapping. The first one used limited marker data (294 microsatellites, three milk protein haplotypes and two gene-based SNPs) and hence resulted in a low-density microsatellite-based linkage map. Subsequent rounds incorporated SNPs from the 10K SNP panel and enabled the construction of high-density linkage maps.

Mapping round 1
The LDM was constructed based on 294 microsatellites, three milk protein haplotypes and two gene-based SNPs (Figure 3(1a)). Construction of the map was done using the software package CRI-MAP V. 2.4 -Build option [43,44]. Modifications were done locally to the software to allow it to run on a 64-bit Opteron with 32 GB physical memory with a swap partition of 10 GB. No user memory limit was enforced. The CRI-MAP Chrompic Option [43,44] was used to remove unlikely double recombinants over a distance of 5 cM. The linkage map created in this initial round was used as framework map in mapping round 2 (Figure 3(2b)).

Construction of High-Density Bovine Linkage Maps
The 10K SNP panel did not have complete assignment of SNPs to a specific chromosome. Of the 7822 SNPs available from the 10K SNP panel, 1189 (orphan SNPs) were not initially assigned to a chromosome. Using the mapping information from mapping round 1 and the TWOPOINT option of CRI-MAP V.2.4 [43,44], 1053 of these orphan SNPs were assigned to a chromosome. The criteria were: a logarithm of odds (LOD) score greater than 15 with at least two microsatellites belonging to the same linkage group, and no other significant linkage to an alternative chromosome. In addition to CRI-MAP V.2.4 [43,44], the expert system software package MultiMap [45] was used to create the high-density bovine linkage map.
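The assignment criteria above can be read as a decision rule applied to the two-point results for each orphan SNP. The following sketch shows one possible reading, not the CRI-MAP implementation. It assumes the two-point LOD scores have already been parsed into records, and it assumes a LOD of 3 as the cut-off for "significant" linkage to an alternative chromosome, a value the text does not specify. All names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Optional


class TwoPointResult(NamedTuple):
    """Linkage between one SNP and one framework marker (hypothetical record)."""
    chromosome: str
    is_microsatellite: bool
    lod: float


def assign_chromosome(results: Iterable[TwoPointResult],
                      lod_threshold: float = 15.0,
                      min_microsatellites: int = 2,
                      significant_lod: float = 3.0) -> Optional[str]:
    """Return the assigned chromosome, or None if the criteria are not met.

    significant_lod is an assumed cut-off for linkage to another chromosome.
    """
    strong_microsatellites = defaultdict(int)
    chromosomes_with_linkage = set()
    for result in results:
        if result.lod > significant_lod:
            chromosomes_with_linkage.add(result.chromosome)
        if result.is_microsatellite and result.lod > lod_threshold:
            strong_microsatellites[result.chromosome] += 1

    candidates = [chrom for chrom, count in strong_microsatellites.items()
                  if count >= min_microsatellites]
    if len(candidates) != 1:
        return None
    chosen = candidates[0]
    # Reject the SNP if it also shows significant linkage elsewhere.
    if chromosomes_with_linkage - {chosen}:
        return None
    return chosen


clean = [TwoPointResult("BTA5", True, 18.2),
         TwoPointResult("BTA5", True, 16.0),
         TwoPointResult("BTA5", False, 22.0)]
ambiguous = clean + [TwoPointResult("BTA7", True, 4.1)]
print(assign_chromosome(clean), assign_chromosome(ambiguous))
```

Raising or lowering `lod_threshold` in such a rule is the trade-off noted in the Genotype quality section: a lower threshold would place more orphan and displaced SNPs at the cost of less certain assignments.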
The MultiMap [45] parameter flip was evaluated by using different values. The optimum values for the flip parameter for these types of dense linkage maps are above three. When parameter flip values over three were used for bovine chromosome 29 with 144 markers, the analysis was found to be time-consuming (from four-fold to 196-fold for flips 4 to flips 6, respectively) or halted when the parameter flip was set to seven. Our ability to support the final placement of markers in linkage maps with the use of a value higher than three for the parameter flips was prevented by the constraints of our computer hardware.

Mapping round 2
For each bovine chromosome, three low-density linkage maps were constructed: 1) low-density microsatellite linkage map (LD1) (Figure 3(2b)), 2) low-density SNP linkage map (LD2) (Figure 3(2c)), and 3) low-density microsatellite-SNP linkage map (LD3) (Figure 3(2d)). This mapping round was undertaken to map 7686 SNPs that had been physically assigned to a chromosome. MultiMap [45] constructs comprehensive maps by using framework maps that can either be built by the program or supplied by the user. For LD1, the LDM from mapping round 1 was used as the framework. No framework map was used for LD2. For LD3, the map constructed by CRI-MAP V. 2.4 -Build option [43,44] (2a) was used as the framework. To enter a linkage map, the position for the SNPs had to exceed a LOD score of three with the Flips Option set to three. After all qualifying SNPs were mapped, the LOD score for SNP acceptance was lowered to two, thus allowing additional markers to be positioned. LD1 maps will always have all markers from the LDM, plus additional SNPs from the 10K SNP panel.
The low-density linkage maps (LD1-LD3) comprise a mix of common markers (microsatellites as well as SNPs) and differ from each other only in SNPs from the 10K SNP panel. The three separate low-density maps (LD1, LD2 and LD3) were integrated into one linkage map termed ARTIFICIAL LINKAGE MAP -I (ALMI). The integration procedure was performed observing the following rules: markers that appeared in more than one of the three linkage maps were anchored; markers that occurred only in one of the low-density linkage maps were integrated into the ALMI, retaining their original order with respect to other markers within their own low-density linkage map. The resulting ALMI had a greater number of markers than the individual low-density linkage maps (LD1-LD3). There were no inconsistencies in SNP order among the three different low-density maps. In some cases, the integration of a marker was difficult due to the ambiguous positions where it could be placed. However, this had no impact on the linkage map because MultiMap [45] was able to resolve the order in the subsequent runs during mapping round 3.
[Figure: Mapping flow chart]
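The integration rules above amount to merging several marker orders while preserving the relative order within each map. The sketch below illustrates the idea with a topological sort from the Python standard library. This is a substituted technique chosen for illustration, not the procedure used to build the ALMI, and it yields only one admissible order when a marker's position is ambiguous. The function name and marker names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import List, Sequence


def integrate_maps(*maps: Sequence[str]) -> List[str]:
    """Merge marker orders so that each input map's order is preserved.

    Markers shared between maps act as anchors. Raises CycleError if the
    input maps disagree on the order of any pair of markers.
    """
    sorter = TopologicalSorter()
    for marker_order in maps:
        for marker in marker_order:
            sorter.add(marker)               # register markers with no neighbours
        for before, after in zip(marker_order, marker_order[1:]):
            sorter.add(after, before)        # 'before' must precede 'after'
    return list(sorter.static_order())


# Made-up example: M* are microsatellites, S* are SNPs.
ld1 = ["M1", "S1", "M2", "M3"]           # microsatellite framework plus SNPs
ld2 = ["S1", "S2", "S3"]                 # SNP-only map
ld3 = ["M1", "S2", "M2", "S3", "M3"]     # microsatellite-SNP map
print(integrate_maps(ld1, ld2, ld3))
```

In this example the three orders together determine a single merged order. Where they do not, the sort returns an arbitrary admissible order, which corresponds to the ambiguous placements that were left to be resolved in mapping round 3.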

Mapping round 3
SNPs not mapped during mapping round 2 were brought into the linkage map using an iterative procedure with the ALMI used as the framework map. To enter the map, the position for the SNPs had to exceed a LOD score of two with the Flips Option set to two. The proposed placements suggested by MultiMap [45] for the remaining unmapped SNPs were tested and the SNP was placed if the Kosambi distance to the nearest marker was equal to or less than 0.5 centimorgans (cM). In some instances, a subsection of 20 SNPs in the region of a possible location was created as a framework map; MultiMap [45] was then able to place such SNPs. This methodology was continued until: a) no further SNPs were placed into a unique position, b) the proposed alternative placements suggested by MultiMap [45] numbered greater than three, or c) a SNP was placed at both ends of a chromosome. During this mapping phase, several SNPs initially assigned to a specific chromosome were placed more than 20 centimorgans (cM) from either the most-distal or most-proximal marker. These SNPs (955 displaced SNPs) were removed from the linkage group as the linkage information indicated that they had been physically assigned to the wrong chromosome. A total of 779 of these SNPs were successfully assigned to a new chromosome using the method previously described for assigning an orphan SNP to a chromosome.
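The 20 cM rule used above to flag displaced SNPs can be written as a one-line check. The sketch below assumes one reading of the rule, namely that a SNP is flagged when its proposed position lies more than 20 cM beyond either end of the markers already in the linkage group; the function name and the example positions are hypothetical.

```python
from typing import Sequence


def is_displaced(snp_position_cm: float,
                 group_positions_cm: Sequence[float],
                 threshold_cm: float = 20.0) -> bool:
    """Flag a SNP placed more than threshold_cm beyond either end of the group.

    Assumed reading: the distance is measured from the most-proximal and
    most-distal markers already placed in the linkage group.
    """
    proximal = min(group_positions_cm)
    distal = max(group_positions_cm)
    return (snp_position_cm < proximal - threshold_cm
            or snp_position_cm > distal + threshold_cm)


# Made-up linkage group spanning 0 to 95.4 cM.
group = [0.0, 12.3, 47.8, 95.4]
print(is_displaced(50.0, group), is_displaced(120.0, group))
```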

Mapping round 4
This round consisted of mapping the 779 re-assigned SNPs, followed by one further round of mapping for all SNPs from the 10K SNP panel that had not been placed during mapping round 3. The mapping criteria were the same as in mapping round 3.

Mapping round 5: Insertion phase
The remaining unmapped SNPs from the 10K SNP panel after mapping round 4 were inserted into the linkage map at a position neighbouring the SNP with the closest physical position. Initially, the physical positions for SNPs were obtained from the bovine assembly Btau_3.1 [46]. The final physical positions used in the insertion phase were from the bovine assembly Btau_4.0 [32]. The insertion of SNPs was done in proximal to distal orientation. No attempt was made to study the consequences of a SNP insertion in the opposite direction. A SNP was retained in the linkage map if its insertion increased the length of the linkage map by less than 0.5 cM, or if the Kosambi distance to the nearest markers was equal to or less than 0.5 cM.
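The two steps of the insertion phase described above, finding the mapped SNP with the closest physical position and then applying the retention rule, can be sketched as follows. The sketch assumes that the physical positions of mapped SNPs on a chromosome are held in a sorted list of base-pair coordinates, and that the map length and nearest-marker distance after a trial insertion are supplied by the mapping software. The function names are hypothetical.

```python
import bisect
from typing import Sequence


def closest_mapped_neighbour(position_bp: int,
                             mapped_positions_bp: Sequence[int]) -> int:
    """Index of the mapped SNP physically closest to position_bp.

    mapped_positions_bp must be sorted in ascending order and non-empty.
    Ties are resolved in favour of the more proximal neighbour.
    """
    i = bisect.bisect_left(mapped_positions_bp, position_bp)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(mapped_positions_bp)]
    return min(candidates, key=lambda j: abs(mapped_positions_bp[j] - position_bp))


def retain_inserted_snp(length_before_cm: float,
                        length_after_cm: float,
                        nearest_distance_cm: float,
                        limit_cm: float = 0.5) -> bool:
    """Retention rule: map grows by less than 0.5 cM, or the Kosambi
    distance to the nearest marker is at most 0.5 cM."""
    growth = length_after_cm - length_before_cm
    return growth < limit_cm or nearest_distance_cm <= limit_cm


# Made-up physical positions (bp) of mapped SNPs on one chromosome.
mapped = [1_200_000, 3_500_000, 9_800_000]
print(closest_mapped_neighbour(3_900_000, mapped))
print(retain_inserted_snp(105.0, 105.2, 0.9))
print(retain_inserted_snp(105.0, 106.4, 1.4))
```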

Recombination distance per physical distance
Recombination distances and marker physical positions (obtained from bovine genome assembly (Btau 4.0) [32]) were used to estimate recombination distances per physical distances. Pearson correlations were calculated between marker order and their physical positions.
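The two summary statistics described above can be computed in a few lines. The sketch below assumes, for each chromosome, a list of markers in linkage order with their genetic positions (cM) and physical positions (bp). It computes the Pearson correlation between linkage order and physical position, and the recombination distance per physical distance over the span from the first to the last marker. The function names and example values are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def cm_per_mb(genetic_cm: Sequence[float], physical_bp: Sequence[float]) -> float:
    """Genetic span (cM) divided by physical span (Mb) of the same markers."""
    span_cm = max(genetic_cm) - min(genetic_cm)
    span_mb = (max(physical_bp) - min(physical_bp)) / 1e6
    return span_cm / span_mb


# Made-up chromosome: markers 2 and 3 are swapped relative to the assembly.
linkage_order = [1, 2, 3, 4, 5]
physical_bp = [1.0e6, 2.1e6, 1.9e6, 4.0e6, 5.0e6]
genetic_cm = [0.0, 1.2, 1.9, 3.8, 5.0]

print(f"order correlation: {pearson(linkage_order, physical_bp):.3f}")
print(f"cM per Mb: {cm_per_mb(genetic_cm, physical_bp):.2f}")
```

A single local swap lowers the correlation only slightly, which illustrates why a high chromosome-wide correlation can coexist with the local discrepancies reported in Additional file 2.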