An assessment of population structure in eight breeds of cattle using a whole genome SNP panel
- Stephanie D McKay1, 2,
- Robert D Schnabel2,
- Brenda M Murdoch1,
- Lakshmi K Matukumalli3, 4,
- Jan Aerts5,
- Wouter Coppieters6,
- Denny Crews1, 7,
- Emmanuel Dias Neto8, 9,
- Clare A Gill10,
- Chuan Gao10,
- Hideyuki Mannen11,
- Zhiquan Wang1,
- Curt P Van Tassell3,
- John L Williams12,
- Jeremy F Taylor2 and
- Stephen S Moore1
© McKay et al; licensee BioMed Central Ltd. 2008
Received: 10 September 2007
Accepted: 20 May 2008
Published: 20 May 2008
Analyses of population structure and breed diversity have provided insight into the origin and evolution of cattle. Previously, these studies have used a low density of microsatellite markers; however, with the large number of single nucleotide polymorphism markers that are now available, it is possible to perform genome-wide population genetic analyses in cattle. In this study, we used a high-density panel of SNP markers to examine population structure and diversity among eight cattle breeds sampled from Bos indicus and Bos taurus.
Two thousand six hundred and forty one single nucleotide polymorphisms (SNPs) spanning all of the bovine autosomal genome were genotyped in Angus, Brahman, Charolais, Dutch Black and White Dairy, Holstein, Japanese Black, Limousin and Nelore cattle. Population structure was examined using the linkage model in the program STRUCTURE and Fst estimates were used to construct a neighbor-joining tree to represent the phylogenetic relationship among these breeds.
The whole-genome SNP panel identified several levels of population substructure in the set of examined cattle breeds. The greatest level of genetic differentiation was detected between the Bos taurus and Bos indicus breeds. When the Bos indicus breeds were excluded from the analysis, genetic differences among beef versus dairy and European versus Asian breeds were detected among the Bos taurus breeds. Exploration of the number of SNP loci required to differentiate between breeds showed that with 100 SNP loci, individuals could be correctly clustered into breeds only 50% of the time; thus, a large number of SNP markers is required to replace the 30 microsatellite markers that are commonly used in genetic diversity studies.
Population structure and diversity within and between breeds of cattle have been studied to learn more about the origin, history and evolution of cattle [1–3]. Diversity studies and subsequent investigations concerning domestication events of Bos taurus and Bos indicus cattle have included sequencing from the displacement loop of mitochondrial DNA (mtDNA). Bradley et al. used mtDNA sequence variation in 90 extant bovines from Africa, Europe and India to identify patterns of genetic variation consistent with the demographics of the domestication process. When nuclear markers have been used to study diversity in cattle, they have principally been microsatellite markers. MacHugh et al. used 20 microsatellites to help clarify the genetic relationships between cattle populations from Africa, Europe and Asia and provided support for a separate origin of domestication for Bos taurus and Bos indicus cattle. Analysis of allelic variation has been used to characterize the genetic relationships between breeds [4–7]. Kumar et al. used 20 microsatellite markers to estimate the extent of genetic differentiation among breeds of cattle from India, Europe and the Near East. Assuming two ancestral populations, the mean admixture coefficients ranged from 0.0 to 0.1 in Indian Bos indicus breeds, 0.9 to 1.0 in European Bos taurus breeds and from 0.1 to 0.9 in hybrid breeds from the Near East. This variation in admixture coefficients reflects the ancestral divergence between the Bos taurus and Bos indicus subspecies. Similarly, Wiener et al. characterized the diversity within and between eight British breeds of cattle using 30 microsatellite markers and found that the majority of the allelic variation (87%) was found within breeds. In addition, the studied breeds of cattle did not cluster according to their current geographic location, suggesting that the genetic origin of breeds was from different geographical regions.
In a study of the origin of Chirikof Island cattle, MacNeil et al. also found that 86% of the genetic variation in 34 microsatellite loci was found within Bos taurus breeds while the remaining 14% of genetic variation was found between breeds. However, the indigenous Chirikof Island cattle were strongly differentiated from the European Bos taurus cattle, suggesting that a comparison between Asian Bos taurus breeds might next be appropriate. On the other hand, no significant divergence appears to exist between geographically separated populations of Holstein cattle, probably due to historic occurrences of gene flow between populations and selection for similar traits. Up to now, most studies have focused on a small set of microsatellite loci, typically the 30 suggested by the FAO. The true extent of autosomal diversity among cattle breeds has yet to be extensively explored. Here, we examine population substructure and interbreed diversity among eight breeds of cattle using 2,641 autosomal genome-wide SNPs.
Results and Discussion
Table 1. Analysis of Molecular Variance.
Column headings: Variance Components (%); # groups (K); Among populations within groups
All 8 breeds: 0.036 ± 0.002; 0.000 ± 0.000; 0.000 ± 0.000
only Bos taurus: 0.168 ± 0.003; 0.000 ± 0.000; 0.000 ± 0.000
Bos taurus without Japanese Black: 0.101 ± 0.003; 0.000 ± 0.000; 0.000 ± 0.000
To explore the population structure among the taurine breeds, a second STRUCTURE analysis was performed after removing the two indicine breeds, using data from the six taurine breeds (Figure 1b). This analysis identified Japanese Black cattle as being distinct from the cluster comprising the remaining five taurine breeds (Figure 2b). However, the partitioning was not strongly supported by the analysis of molecular variance (FCT = 0.09; P < 0.17; Table 1). The mean admixture coefficients for the European taurine breeds ranged from 0.43 to 0.60 while values for the Japanese Black ranged from 0.1 to 0.29. The upper and lower quartile ranges of the admixture coefficients for the individual Japanese Black animals were not as symmetric as those found for the European taurine breeds (Figure 2b) and were skewed towards the European taurine breeds, suggesting a recent influence of European Bos taurus breeds within Japanese Black. Previously published reports describe the use of European breeds to upgrade Japanese Black cattle, which is supported by these data. Several domestication events have been suggested for cattle involving different strains of aurochs, including an independent taurine domestication event in Asia [12, 13]. These results suggest that the Japanese Black breed is genetically distinct from the European taurine breeds and, because the divergence greatly exceeds the variation between the beef and dairy breeds (Figure 2b), we believe that an independent Asian domestication event is more likely to explain the divergence than is selection or drift following domestication. The within-breed variation in the admixture coefficient Q in Figure 2 also supports this contention. Provided the Japanese Black population does not represent a recent cross among divergent populations, the increased variation within this population is consistent with the hypothesis of a local Asian domestication event.
Additional Asian derived cattle breeds will need to be tested to assess the weight of evidence for this hypothesis. However, our data are completely consistent with the origin of Japanese Black cattle being from an independent Asian domestication.
The third STRUCTURE analysis considered the remaining Bos taurus breeds after excluding the Japanese Black and resulted in a clustering of the meat and dairy breeds (Figures 1c, 2c). The mean admixture coefficients demonstrate considerably less variation within the Continental European breeds, which is consistent with the small effective population size that must have accompanied the introduction into North America of small samples of animals from these Continental breeds. The strong selection for milk production in the Holstein breed, in conjunction with the extensive use of artificial insemination, has reduced the genetic diversity within this breed, and this is apparent in these data. Surprisingly, therefore, the Dutch Black and White cattle had the greatest variation among all of the breeds studied, suggesting that selection for milk production has been less intense in this breed than in Holsteins. Interestingly, 4.65% of the variation was found between the beef and dairy groups (FCT = 0.04; Table 1), with a P value of 0.10 that was suggestive but not significant. This variation suggests that artificial selection within cattle for alternate agricultural purposes has led to a genome-wide divergence between the beef and dairy breeds. Additional analyses in which the genomic regions at which divergence between the types is greatest are overlaid with detected meat and milk QTL would be of considerable interest.
The recent completion of a draft bovine genome sequence assembly has provided sufficient numbers of SNP loci to replace microsatellite loci and augment mtDNA sequences for population genetic analyses in cattle. We have shown that SNP loci can be used to identify population substructure among cattle breeds. However, we have also demonstrated that a large number of SNP loci must be used to obtain a degree of precision in estimates of diversity equivalent to that of microsatellite loci, due to the lower information content of individual SNP loci. At issue is the importance of ascertainment of these loci to the phylogenies that are constructed. Because the majority of available SNPs were detected as the most common SNP within Bos taurus breeds, certain biases must exist within the analyses. However, the extent of these biases can only be quantified when these analyses are repeated using unbiased samples of loci, which, to date, do not exist.
DNA was collected from the following Bos taurus breeds: 70 Angus (USA), 20 Canadian Angus, 40 Charolais (Canada), 97 Dutch Black and White dairy cattle (Belgium), 48 Holstein (USA), 65 Japanese Black (Japan) and 43 Limousin (USA). Additionally, DNA was collected from two Bos indicus breeds: 40 Brahman (USA) and 97 Nelore (Brazil). Family structure and the number of individuals per family varied between breeds but the general family structure consisted of a grandparent, parent and three or more progeny. To determine the phase of alleles on the chromosomes using linkage information, we selected small families where members within the families were closely related but the families themselves were as unrelated as possible. This three generation family structure allowed for the efficient estimation of marker phase relationships in the progeny and also produced the most likely phase relationships in each of the parents and grandparents.
Marker Selection and Genotyping
A detailed description of the SNP loci used in this study and of the genotyping methods was presented in McKay et al. Briefly, sequence information for SNPs (see Additional file 1) was obtained from public databases and SNPs were genotyped as a GoldenGate® assay using an Illumina BeadStation 500G (http://www.illumina.com). Loci included in this study met the following criteria: minor allele frequency (MAF) ≥ 0.05 in Angus based on previous screens (data not shown) and concordant locus order between radiation hybrid (RH) maps and genomic sequence location. The software GENOPROB V2.0 [19, 20] was used to assess genotype score quality and to produce whole chromosome phased maternal and paternal haplotypes based on the pedigree and map locations of the loci.
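The MAF criterion above can be illustrated with a short Python sketch. This is only a minimal illustration and not the screening procedure used in the study: the genotype encoding (two-character strings such as "AB", with None for a missing call) and the function names `minor_allele_frequency` and `filter_loci` are assumptions made for this example.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency for one biallelic locus.

    `genotypes` is a list of two-character strings such as "AA", "AB" or "BB"
    (a hypothetical encoding); missing calls are given as None and skipped.
    """
    counts = Counter()
    for genotype in genotypes:
        if genotype is None:
            continue
        counts.update(genotype)  # adds one count for each of the two alleles
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no data, or a monomorphic locus
    return min(counts.values()) / total


def filter_loci(genotypes_by_locus, threshold=0.05):
    """Keep the names of loci whose MAF is at least `threshold`."""
    return [
        name
        for name, genotypes in genotypes_by_locus.items()
        if minor_allele_frequency(genotypes) >= threshold
    ]


# Toy data: snp1 is common, snp2 is rare (1 of 40 alleles), snp3 is monomorphic.
demo = {
    "snp1": ["AA", "AB", "BB", "AA"],
    "snp2": ["AA"] * 19 + ["AB"],
    "snp3": ["AA", "AA", None],
}
kept = filter_loci(demo)
```

In this toy data set only snp1 passes the 0.05 threshold, because snp2 has a minor allele frequency of 1/40 and snp3 carries a single allele.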
Population Structure Analysis
STRUCTURE and the linkage model of Falush et al.  were used to evaluate the extent of substructure among contemporary breeds of European and Asian Bos taurus and Bos indicus cattle. Exploratory STRUCTURE runs were used to determine the optimum number of iterations for the initial burn-in and estimation phases of the analysis to ensure that the Gibbs sampler had explored a sufficiently large sample space to provide reliable posterior probabilities. From these preliminary analyses, we determined that an initial burn-in of 10,000 iterations followed by 10,000 iterations for parameter estimation was sufficient to ensure convergence of parameter estimates (data not shown). We performed a series of analyses (runs) that were based on inclusion of differing combinations of cattle breeds in an attempt to determine the minimum number of ancestral populations that were admixed to best explain the genomic architecture of the current set of breeds. The first run used all of the animals from all 8 breeds. The second run used the 6 taurine breeds (Angus, Charolais, Limousin, Dutch Black and White Dairy, Holstein and Japanese Black) while the third run used the taurine breeds without the Japanese Black. To estimate the number of populations (the K parameter of STRUCTURE), each of these three data sets was analyzed allowing the value of K to vary from 1 to 5 and each run was repeated five times to produce a total of 75 STRUCTURE runs. Using the method of Evanno et al.  we calculated ΔK which is an ad hoc quantity related to the second order rate of change of the log probability (likelihood) of the data Pr(X|K) (equation 12 in ) with respect to the number of population clusters K.
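As a rough illustration of the ΔK statistic described above, the following Python sketch takes the replicate log probabilities ln Pr(X|K) for each K and divides the absolute second-order difference by the standard deviation across replicates at K. The function name `evanno_delta_k` and the input layout are hypothetical, and the second-order difference is taken here on the replicate means, which is an assumption about how the averaging is carried out rather than a statement of the authors' procedure.

```python
from statistics import mean, stdev


def evanno_delta_k(log_probs):
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    `log_probs` maps each K to a list of ln Pr(X|K) values, one per replicate
    run (at least two replicates are needed for a standard deviation).
    """
    mean_l = {k: mean(values) for k, values in log_probs.items()}
    delta_k = {}
    for k in sorted(log_probs):
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue  # Delta K is undefined at the ends of the K range
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        spread = stdev(log_probs[k])
        if spread > 0:  # skip K where replicates are identical
            delta_k[k] = second_diff / spread
    return delta_k


# Made-up log probabilities for K = 1..3 with two replicates each.
toy = {1: [-100.0, -102.0], 2: [-50.0, -52.0], 3: [-48.0, -50.0]}
result = evanno_delta_k(toy)
```

With K varied from 1 to 5 as in the text, this calculation yields values only for K = 2, 3 and 4, since the end points lack one neighbour.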
Assuming the full dataset of 2,641 loci would yield the most accurate estimate of the true number of ancestral populations, we sought to determine the effect of the number of loci analyzed on inferences of K. Three new data sets each with 10 replicates were created by randomly sampling 25, 50, 100 or 150 loci from the 2,641 markers. Each replicate was analyzed using STRUCTURE as previously described except the admixture model was used rather than the linkage model as linkage among the sampled loci was assumed to be lost due to randomly sampling loci throughout the genome.
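The random subsampling of loci described above could be set up along the following lines. This is a sketch under stated assumptions: the function name `sample_locus_subsets`, the use of integer locus identifiers and the fixed seed are all illustrative choices, and the source does not say how the sampling was implemented.

```python
import random


def sample_locus_subsets(locus_ids, sizes=(25, 50, 100, 150), replicates=10, seed=0):
    """Draw `replicates` random subsets of loci for each subset size.

    Sampling is without replacement within a subset; a fixed seed makes the
    draws reproducible.
    """
    rng = random.Random(seed)
    subsets = {}
    for size in sizes:
        subsets[size] = [
            sorted(rng.sample(locus_ids, size)) for _ in range(replicates)
        ]
    return subsets


# Hypothetical identifiers standing in for the 2,641 genotyped SNP loci.
all_loci = list(range(2641))
subsets = sample_locus_subsets(all_loci)
```

Each subset would then be written out as a separate input file and analyzed independently.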
Pairwise Fst values based on 2,641 SNP loci.
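For readers unfamiliar with pairwise Fst, the sketch below shows one simple way to compute it for two breeds from per-locus allele frequencies of biallelic SNPs, as (HT − HS)/HT with heterozygosities summed over loci. The estimator actually used for the values reported here is not stated in this text, so this heterozygosity-based form, the equal weighting of the two breeds and the function name `pairwise_fst` are all assumptions made for illustration.

```python
def pairwise_fst(freqs_a, freqs_b):
    """Estimate Fst between two populations from allele frequencies.

    `freqs_a` and `freqs_b` hold the frequency of the same reference allele
    at each biallelic locus in populations A and B. Both populations are
    weighted equally, and the ratio is taken over sums across loci.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need the same number of loci")
    sum_hs = 0.0  # mean within-population expected heterozygosity
    sum_ht = 0.0  # expected heterozygosity of the pooled population
    for p_a, p_b in zip(freqs_a, freqs_b):
        sum_hs += (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        p_bar = (p_a + p_b) / 2
        sum_ht += 2 * p_bar * (1 - p_bar)
    if sum_ht == 0:
        return 0.0  # every locus is fixed for the same allele in both
    return (sum_ht - sum_hs) / sum_ht


# Two loci: identical frequencies at the first, divergent at the second.
example = pairwise_fst([0.5, 0.2], [0.5, 0.8])
```

A matrix of such pairwise values, one per pair of breeds, is the kind of input a neighbor-joining tree is built from.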
The authors thank Michel Georges for intellectual contributions. This work was supported through Grant Number 2003A245R awarded to S.S. Moore by the Alberta Agriculture Research Institute. JFT and RDS were supported by National Research Initiative grants 2005-35205-15448, 2005-35604-15615, 2006-35205-16701 and 2006-35616-16697 from the United States Department of Agriculture Cooperative State Research, Education and Extension Service. J. Aerts was supported by the Biotechnology and Biological Sciences Research Council (BBSRC) grant BBS/B13454 "Bovine genome annotation and analysis".
- Bradley DG, MacHugh DE, Cunningham P, Loftus RT: Mitochondrial diversity and the origins of African and European cattle. Proc Natl Acad Sci U S A. 1996, 93 (10): 5131-5135. 10.1073/pnas.93.10.5131.
- MacHugh DE, Shriver MD, Loftus RT, Cunningham P, Bradley DG: Microsatellite DNA variation and the evolution, domestication and phylogeography of taurine and zebu cattle (Bos taurus and Bos indicus). Genetics. 1997, 146 (3): 1071-1086.
- Troy CS, MacHugh DE, Bailey JF, Magee DA, Loftus RT, Cunningham P, Chamberlain AT, Sykes BC, Bradley DG: Genetic evidence for Near-Eastern origins of European cattle. Nature. 2001, 410 (6832): 1088-1091. 10.1038/35074088.
- Kumar P, Freeman AR, Loftus RT, Gaillard C, Fuller DQ, Bradley DG: Admixture analysis of South Asian cattle. Heredity. 2003, 91 (1): 43-50. 10.1038/sj.hdy.6800277.
- Wiener P, Burton D, Williams JL: Breed relationships and definition in British cattle: a genetic analysis. Heredity. 2004, 93 (6): 597-602. 10.1038/sj.hdy.6800566.
- MacNeil MD, Cronin MA, Blackburn HD, Richards CM, Lockwood DR, Alexander LJ: Genetic relationships between feral cattle from Chirikof Island, Alaska and other breeds. Anim Genet. 2007, 38 (3): 193-197. 10.1111/j.1365-2052.2007.01559.x.
- Negrini R, Nijman IJ, Milanesi E, Moazami-Goudarzi K, Williams JL, Erhardt G, Dunner S, Rodellar C, Valentini A, Bradley DG, Olsaker I, Kantanen J, Ajmone-Marsan P, Lenstra JA: Differentiation of European cattle by AFLP fingerprinting. Anim Genet. 2007, 38 (1): 60-66. 10.1111/j.1365-2052.2007.01554.x.
- Zenger KR, Khatkar MS, Cavanagh JA, Hawken RJ, Raadsma HW: Genome-wide genetic diversity of Holstein Friesian cattle reveals new insights into Australian and global population variability, including impact of selection. Anim Genet. 2007, 38 (1): 7-14. 10.1111/j.1365-2052.2006.01543.x.
- CaDBase. [http://www.projects.roslin.ac.uk/cdiv/markers.html]
- Evanno G, Regnaut S, Goudet J: Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol. 2005, 14 (8): 2611-2620. 10.1111/j.1365-294X.2005.02553.x.
- Vignal A, Milan D, SanCristobal M, Eggen A: A review on SNP and other types of molecular markers and their use in animal genetics. Genet Sel Evol. 2002, 34 (3): 275-305. 10.1051/gse:2002009.
- Mannen H, Tsuji S, Loftus RT, Bradley DG: Mitochondrial DNA variation and evolution of Japanese black cattle (Bos taurus). Genetics. 1998, 150 (3): 1169-1175.
- Mannen H, Kohno M, Nagata Y, Tsuji S, Bradley DG, Yeo JS, Nyamsamba D, Zagdsuren Y, Yokohama M, Nomura K, Amano T: Independent mitochondrial origin and historical genetic differentiation in North Eastern Asian cattle. Mol Phylogenet Evol. 2004, 32 (2): 539-544. 10.1016/j.ympev.2004.01.010.
- Kruglyak L: The use of a genetic map of biallelic markers in linkage studies. Nat Genet. 1997, 17 (1): 21-24. 10.1038/ng0997-21.
- Seldin MF, Shigeta R, Villoslada P, Selmi C, Tuomilehto J, Silva G, Belmont JW, Klareskog L, Gregersen PK: European population substructure: clustering of northern and southern populations. PLoS Genet. 2006, 2 (9): e143. 10.1371/journal.pgen.0020143.
- McKay SD, Schnabel RD, Murdoch BM, Matukumalli LK, Aerts J, Coppieters W, Crews D, Dias Neto E, Gill CA, Gao C, Mannen H, Stothard P, Wang Z, Van Tassell CP, Williams JL, Taylor JF, Moore SS: Whole genome linkage disequilibrium maps in cattle. BMC Genet. 2007, 8: 74. 10.1186/1471-2156-8-74.
- Oliphant A, Barker DL, Stuelpnagel JR, Chee MS: BeadArray technology: enabling an accurate, cost-effective approach to high-throughput genotyping. Biotechniques. 2002, Suppl: 56-8, 60-1.
- McKay SD, Schnabel RD, Murdoch BM, Aerts J, Gill CA, Gao C, Li C, Matukumalli LK, Stothard P, Wang Z, Van Tassell CP, Williams JL, Taylor JF, Moore SS: Construction of bovine whole-genome radiation hybrid and linkage maps using high-throughput genotyping. Anim Genet. 2007, 38 (2): 120-125. 10.1111/j.1365-2052.2006.01564.x.
- Thallman RM, Bennett GL, Keele JW, Kappes SM: Efficient computation of genotype probabilities for loci with many alleles: I. Allelic peeling. J Anim Sci. 2001, 79 (1): 26-33.
- Thallman RM, Bennett GL, Keele JW, Kappes SM: Efficient computation of genotype probabilities for loci with many alleles: II. Iterative method for large, complex pedigrees. J Anim Sci. 2001, 79 (1): 34-44.
- Falush D, Stephens M, Pritchard JK: Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2003, 164 (4): 1567-1587.
- Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data. Genetics. 2000, 155 (2): 945-959.
- Schneider S, Roessli D, Excoffier L: Arlequin: A software for population genetics data analysis. 2000, Genetics and Biometry Lab, Dept. of Anthropology, University of Geneva, Ver 2.000.
- Kumar S, Tamura K, Nei M: MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform. 2004, 5 (2): 150-163. 10.1093/bib/5.2.150.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.