- Research article
- Open Access
High-resolution haplotype block structure in the cattle genome
© Villa-Angulo et al; licensee BioMed Central Ltd. 2009
- Received: 05 June 2008
- Accepted: 24 April 2009
- Published: 24 April 2009
The Bovine HapMap Consortium has generated assay panels to genotype ~30,000 single nucleotide polymorphisms (SNPs) from 501 animals sampled from 19 worldwide taurine and indicine breeds, plus two outgroup species (Anoa and Water Buffalo). Within the larger set of SNPs we targeted 101 high density regions spanning up to 7.6 Mb with an average density of approximately one SNP per 4 kb, and characterized the linkage disequilibrium (LD) and haplotype block structure within individual breeds and groups of breeds in relation to their geographic origin and use.
From the 101 targeted high-density regions on bovine chromosomes 6, 14, and 25, between 57 and 95% of the SNPs were informative in the individual breeds. The regions of high LD extend up to ~100 kb and the size of haplotype blocks ranges between 30 bases and 75 kb (10.3 kb average). On the scale from 1–100 kb the extent of LD and haplotype block structure in cattle has high similarity to humans. The estimation of effective population sizes over the previous 10,000 generations conforms to two main events in cattle history: the initiation of cattle domestication (~12,000 years ago), and the intensification of population isolation and current population bottleneck that breeds have experienced worldwide within the last ~700 years. Haplotype block density correlation, block boundary discordances, and haplotype sharing analyses were consistent in revealing unexpected similarities between some beef and dairy breeds, making them non-differentiable. Clustering techniques permitted grouping of breeds into different clades given their similarities and dissimilarities in genetic structure.
This work presents the first high-resolution analysis of haplotype block structure in worldwide cattle samples. Several novel results were obtained. First, cattle and human share a high similarity in LD and haplotype block structure on the scale of 1–100 kb. Second, unexpected similarities in haplotype block structure between dairy and beef breeds make them non-differentiable. Finally, our findings suggest that ~30,000 uniformly distributed SNPs would be necessary to construct a complete genome LD map in Bos taurus breeds, and ~580,000 SNPs would be necessary to characterize the haplotype block structure across the complete cattle genome.
- Quantitative Trait Locus
- Linkage Disequilibrium
- Effective Population Size
- Haplotype Block
- Shared Haplotype
The rapid improvement in high-throughput single nucleotide polymorphism (SNP) discovery and genotyping technologies is making possible the availability of many thousands of SNP markers for genome-wide association studies [1–5]. High-resolution linkage disequilibrium (LD) maps and characterizations of haplotype block structure are being generated for different organisms, confirming that elucidating in the fine-scale the structure of LD at the population level is crucial for understanding the nature of the highly non-linear association between genes and phenotypic traits, such as complex diseases and quantitative trait loci (QTL) [6–8].
Initial studies in humans [9, 10] demonstrated that, by investigating regions for evidence of recombination and LD patterns, it was possible to parse the human genome into haplotype blocks, and that those blocks shared just a few common haplotypes. This result provided impetus for the construction of LD and haplotype maps of the human genome. Furthermore, haplotype block structure appears to be conserved across mammals [11].
Recently, high-resolution LD and haplotype block maps were generated for humans using a set of 3.1 million SNPs genotyped in 270 individuals from four geographically diverse populations [12]. Overall, 98.6% of the assembled genome is within 5 kb of the nearest polymorphic SNP. The analysis of these high-resolution data is helping to infer, with great precision, information about population history, recombination and mutation rates, and evidence of positive selection, and is providing invaluable information for gene-disease association studies.
An initial bovine study [14] reported characterization of haplotype blocks in Holstein-Friesian cattle using a 15 K SNP chip with an average intermarker spacing of 251.8 kb. Another study [15] reported haplotype block structure for 14 European and African cattle breeds using 1536 SNPs. This study had an average resolution of 311 kb intermarker distance and was focused mainly on chromosome 3. Recently, the Bovine HapMap Consortium [16] generated an assay of 30 K SNPs and genotyped 501 animals sampled from 19 worldwide taurine (Bos taurus) and indicine (Bos indicus) breeds, plus two outgroup species (Anoa and Water Buffalo). In this article we present the characterization of LD and haplotype block structure across 101 high-density targeted regions from the bovine HapMap data, spanning 7.6 Mb of the genome with an average intermarker distance of ~4 kb. The extent of LD is presented along with the estimation of ancestral population size for different generations. In a first level of analysis, haplotype block characterization allowed us to elucidate the breed-specific block structure and its variability compared with all other breeds. In a second level of analysis, haplotype block density correlation, haplotype block boundary comparison, and haplotype sharing between breeds and subgroups helped us to elucidate high-resolution similarities between breeds, and also permitted us to differentiate breeds by geographic separation versus those related by shared ancestry. Finally, breeds were clustered given computed genetic distances based on haplotype block analysis.
Using the filtered data set (see Methods section for Quality Control filters) from the Bovine HapMap Consortium [16], consisting of 31,857 markers from 487 animals sampled from 19 cattle breeds (see Additional file 1), we selected the three chromosomes having the highest number of SNP markers, BTA 6, 14, and 25, and performed an analysis of high-density regions on these chromosomes. High-density regions were originally genotyped on chromosomes 6 and 14 based on evidence of QTL, and on chromosome 25 based on a lack of known QTL (see Methods section). The high-density regions were defined as non-overlapping genomic windows of 100 kb containing 10 or more markers and a maximum gap between markers of 20 kb. We identified 101 such high-density regions covering a total genomic distance of 10.1 Mb (see Additional file 2). The effective region covered (the sequence lying between markers) is 7.6 Mb and contains in total 1,981 markers, with an average of one marker every ~4 kb. The following sections discuss the haplotype block structure of these 101 high-density regions.
SNP allele frequencies across population samples in high-density regions
In general, African and indicine breeds exhibited lower MAF values. This might be attributed to an ascertainment bias in the SNP discovery, because all targeted SNPs in this study were originally derived by comparison between a Hereford assembly and sequence reads from a series of bacterial artificial chromosomes (BACs) constructed from Holstein DNA. However, analysis of variation among the major cattle breeds free from SNP ascertainment bias demonstrated a higher genetic diversity in indicine compared to taurine breeds. In the targeted regions, MAF values ranged from a maximum of 0.253 (Holstein) to 0.116 (Nelore), which is a difference of about 28% in the full scale of 0.0 to 0.5. The average decay in MAF between breeds was 1.51% (see Additional file 4). Furthermore, we compared the proportion of polymorphic SNPs in the selected regions with the proportion of polymorphic SNPs in the entire HapMap data set and found a 20% higher proportion in the complete HapMap data than in the selected regions.
Extent of LD and estimation of effective population size
The 1,981 SNPs in the high-density regions were used to evaluate the extent of pairwise LD as a function of physical distance. The complete set of SNPs (31,857) was used to estimate the effective population size in the previous 10,000 generations for each breed. A pair of haplotypes was inferred for each sample using the software fastPHASE version 1.2.3 [17], which provided imputed haplotypes for missing genotypes where necessary.
After adjusting r2 for sample size error (see Methods section), we estimated the effective population size over the 10,000 previous generations (assuming a generation time of six to seven years). This estimation was based on the observation that in a population with constant effective population size N, the approximate expectation of r2 is $E(r^2) \approx 1/(4Nc + 1)$, where N is the effective population size 1/(2c) generations in the past, E(r2) is the average of r2 values for all SNPs within a specified range, and c is the median of the range in Morgans (we assumed 1 cM ~1 Mb) [15, 19–22].
Haplotype block structure
Haplotype blocks based on r2 were estimated using the definition discussed in the Methods section. Additional file 7 details the block characteristics for all breeds. In summary, the average maximum number of markers per block was 27.16. Across all breeds, 34.7% of the high-density regions were covered by haplotype blocks. We found that mean block size varied from 5.7 to 15.67 kb across breeds (with a mean block size of 10.3 kb over all breeds) and an average of 3.8 markers per block. These results are similar to those in a recent study of human haplotype blocks, which reported haplotype block sizes averaging 7.3, 13.2, and 16.3 kb in three human populations when analysing ten 500-kilobase regions with a density of one SNP per ~5 kb. The human data showed a marked decline in LD over the range of 1–100 kb, again similar to our observed decline in cattle LD from 0.6 to 0.1 over the range 1–100 kb.
From this and the results in the previous section, suppose that the observed average r2 of ~0.1 at 100 kb, and the average haplotype block size of ~10 kb with one informative SNP every ~5 kb, hold uniformly across the bovine genome. An LD map for association studies should then tag at least one SNP per 100 kb. With a bovine genome size of 2.87 Gb, this means successfully assaying at least 28,700 SNPs for an LD map for association studies. In the same way, it would be necessary to assay at least 574,000 SNPs (one per ~5 kb) to characterize the haplotype block structure across the entire bovine genome.
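The two marker counts follow directly from dividing the genome size by the assumed marker spacing. A minimal sketch of that arithmetic is given here; the function and constant names are illustrative and not part of the original analysis.

```python
# Back-of-the-envelope marker counts, using the figures quoted in the text.
GENOME_SIZE_BP = 2_870_000_000  # assumed bovine genome size (2.87 Gb)
LD_SPACING_BP = 100_000         # one tag SNP per 100 kb for an LD map
BLOCK_SPACING_BP = 5_000        # one informative SNP per ~5 kb for block structure


def snps_needed(genome_size_bp: int, spacing_bp: int) -> int:
    """Number of evenly spaced SNPs needed to cover the genome at a given spacing."""
    return genome_size_bp // spacing_bp


print(snps_needed(GENOME_SIZE_BP, LD_SPACING_BP))     # 28700
print(snps_needed(GENOME_SIZE_BP, BLOCK_SPACING_BP))  # 574000
```

Both figures are lower bounds under the uniformity assumption stated above; regions with faster LD decay would need denser coverage.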
Average haplotype block density correlations from all breeds within the group and outside the group.
Proportions of block boundary discordances and concordances among cattle subgroups
Columns: NR – REC (%), REC – NR (%)
Rows: Beef vs Dairy; Beef vs Indicus; Beef vs Composite; Beef vs African; Dairy vs Indicus; Dairy vs Composite; Dairy vs African; Indicus vs Composite; Indicus vs African; Composite vs African
Normalized proportion of shared haplotypes
In this work we present a high-resolution characterization of haplotype block structure in cattle. The analysis was performed on 101 targeted genomic regions spanning 7.6 Mb with an average density of one SNP every ~4 kb, sampled from 19 worldwide breeds. We studied LD and elucidated the block structure for each specific breed. Consistent with previous analyses in cattle, and in high agreement with observations in humans, we observed that LD declines rapidly, such that r2 averages ~0.1 at 100 kb, and haplotype blocks exhibit an overall mean size of 10.3 kb (varying from 5.7 kb to 15.57 kb across all breeds) with an average of 3.8 markers per block. Estimation of effective population size in previous generations reflects the period of domestication ~12,000 years ago, as well as the current population bottleneck that breeds have experienced worldwide (last ~700 years) as a result of population isolation and selective breeding. In addition, analyses of block density correlations, block boundary discordances, and haplotype sharing across all breeds and between subgroups were consistent in exhibiting a clear differentiation between indicus, African, and composite subgroups, but not between dairy and beef subgroups.
In summary, this work presents the first high-resolution analysis of haplotype block structure in worldwide cattle samples. First, novel results show that cattle and human share a high similarity in LD and haplotype block structure in the scale of 1–100 kb. Second, unexpected similarities in haplotype block structure between dairy and beef breeds make them non-differentiable. Finally, our results suggest that it would be necessary to successfully assay ~30,000 SNPs to construct an LD map for association studies, and ~580,000 SNPs to characterize the haplotype block structure across the entire bovine genome.
Animal samples and data description
The data used for this analysis correspond to the BTA4.0 assembly of the Bovine HapMap consortium database. It includes genotypes from 501 animals on a set of 32,826 markers. Animals were sampled from 19 cattle breeds and two outgroups, Anoa and Water Buffalo (see Additional file 1). All breeds belong to the taurus and indicus subspecies of Bos taurus, and represent several different geographical regions: N'Dama and Sheko are African breeds; Angus, Hereford, and Red Angus are British beef breeds; Charolais, Limousin, Piedmontese, and Romagnola are European beef breeds; Guernsey and Jersey are British dairy breeds; Brown Swiss, Holstein, and Norwegian Red are European dairy breeds; Brahman, Nelore, and Gir are indicus breeds; Beefmaster and Santa Gertrudis are composites of taurine-indicine origin. Individuals were selected to be unrelated for at least 4–5 ancestral generations, with the exception of 44 trios of sire, dam and offspring included to allow quality control of the data and to assist in the determination of allelic phase relationships. The DNA samples were taken from whole blood or cryopreserved semen.
Quality Control filters
To ensure the overall quality of samples and a consistent set of genotypes, QC filters were applied to the initial data. The filters included removal of all genotypes that had >20% missing genotypes, that violated Hardy-Weinberg frequency distribution, or that violated Mendelian inheritance. Data were also removed for all animals with genotype completeness <98%, for markers with estimated genotyping error >5% and at least one breed out of Hardy-Weinberg equilibrium, as well as markers that were monomorphic for all breeds, markers with minor allele frequency <0.05 among all breeds, markers containing >2 discordant trios, and markers assigned to an unknown chromosome. After this QC procedure, the data set contained 31,857 markers from 487 animals, and excluded Anoa and Water Buffalo.
In addition to previous QC filters, we removed monomorphic SNPs breed by breed in order to avoid the analysis of uninformative data.
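Two of the marker-level filters (missing-genotype rate and minor allele frequency) are easy to express in code. The sketch below is illustrative only: it assumes each genotype is coded as the count (0, 1, or 2) of one arbitrarily chosen allele, with `None` for a missing call, and it does not cover the Hardy-Weinberg, Mendelian, or genotyping-error filters. The function names are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as allele counts 0/1/2 (None = missing call)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def passes_marker_qc(genotypes, max_missing=0.20, min_maf=0.05):
    """Keep a marker only if its missing rate and MAF are within the thresholds.

    A monomorphic marker has MAF 0 and is therefore rejected as well.
    """
    if not genotypes:
        return False
    missing = sum(g is None for g in genotypes) / len(genotypes)
    if missing > max_missing:
        return False
    return minor_allele_frequency(genotypes) >= min_maf


print(passes_marker_qc([0, 1, 2, 1, None]))  # polymorphic, 20% missing
print(passes_marker_qc([0, 0, 0, 0, 0]))     # monomorphic
```

Applying `passes_marker_qc` to the genotypes of a single breed would correspond to the breed-by-breed removal of uninformative SNPs described above.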
Selection of high-density regions
In order to facilitate the study of haplotypes extended over multiple markers, we focused on the regions of the bovine genome that had the highest density of markers in the HapMap data set. We focused exclusively on chromosomes 6, 14, and 25, which were selected for additional genotyping due to the presence of known QTL of interest in chromosomes 6 and 14, and the absence of known QTL on chromosome 25.
Chromosome 25 therefore served as a control for studies focusing on high-density regions. For this study, we defined high-density regions as non-overlapping genomic windows of 100 kb containing 10 or more markers and a maximum gap between markers of 20 kb. This definition identified 101 high-density regions containing a total of 1,981 markers, yielding an average density of 19.61 markers per region. The average distance between adjacent high-density regions on the same chromosome was 1.46 Mb, but they were not evenly spaced. There were 31 instances in which two adjacent high-density regions were contiguous on the chromosome.
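The window definition can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: windows are placed on a fixed 100 kb grid starting at position 0 (the text does not say how windows were anchored), and the gap criterion is applied to consecutive markers inside a window. The function name is hypothetical.

```python
from collections import defaultdict

WINDOW_BP = 100_000   # window size from the text
MIN_MARKERS = 10      # minimum markers per window
MAX_GAP_BP = 20_000   # maximum gap between consecutive markers


def high_density_windows(positions, window_bp=WINDOW_BP,
                         min_markers=MIN_MARKERS, max_gap_bp=MAX_GAP_BP):
    """Return (start, end) of qualifying fixed-grid windows on one chromosome."""
    by_window = defaultdict(list)
    for pos in sorted(positions):
        by_window[pos // window_bp].append(pos)

    regions = []
    for idx in sorted(by_window):
        markers = by_window[idx]
        if len(markers) < min_markers:
            continue
        gaps = [b - a for a, b in zip(markers, markers[1:])]
        if max(gaps, default=0) <= max_gap_bp:
            regions.append((idx * window_bp, (idx + 1) * window_bp))
    return regions


# One dense window (a marker every 9 kb) and one window with a large gap.
dense = list(range(0, 100_000, 9_000))
gappy = [200_000 + i * 1_000 for i in range(9)] + [299_000]
print(high_density_windows(dense + gappy))  # [(0, 100000)]
```

A fixed grid guarantees non-overlapping windows; a sliding-window search would find more regions but would need an extra rule to resolve overlaps.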
Pairwise LD between two SNPs was measured with r2:
$$r^2 = \frac{(p_{11} - p_1 q_1)^2}{p_1 p_2 q_1 q_2}$$
where p1 and p2 are the minor and major allele frequencies at SNP 1 respectively, q1 and q2 are the minor and major allele frequencies at SNP 2 respectively, and p11 is the frequency with which both minor alleles are observed together across the whole population.
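As an illustration of these quantities, the following sketch computes r2 for one SNP pair from phased haplotypes. It assumes the standard definition r2 = (p11 − p1 q1)^2 / (p1 p2 q1 q2), that haplotypes are coded as sequences of 0/1 with 1 marking the minor allele, and that p11 is counted per haplotype; the function name is hypothetical.

```python
def r_squared(haplotypes, i, j):
    """Pairwise r^2 between SNP columns i and j of phased 0/1 haplotypes."""
    n = len(haplotypes)
    p1 = sum(h[i] for h in haplotypes) / n   # minor allele frequency, SNP 1
    q1 = sum(h[j] for h in haplotypes) / n   # minor allele frequency, SNP 2
    p11 = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    p2, q2 = 1.0 - p1, 1.0 - q1              # major allele frequencies
    denom = p1 * p2 * q1 * q2
    if denom == 0:
        return 0.0  # a monomorphic SNP: r^2 is undefined, reported as 0 here
    d = p11 - p1 * q1                        # disequilibrium coefficient D
    return d * d / denom


linked = [(1, 1), (1, 1), (0, 0), (0, 0)]    # alleles always travel together
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]  # alleles are independent
print(r_squared(linked, 0, 1), r_squared(unlinked, 0, 1))  # 1.0 0.0
```

Because r2 is symmetric in the allele labels, coding the major allele as 1 instead would give the same value.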
Effective population size estimation
Effective population size was estimated from the approximate expectation of r2 in a population of constant size:
$$E(r^2) \approx \frac{1}{4Nc + 1}$$
where N is the effective population size 1/(2c) generations in the past, E(r2) is the average of r2 values for all SNPs within a specified range, and c is the median of the range in Morgans [19–22]. To compute N for each breed, the number of previous generations was first selected. Then, c was computed in Morgans and taken as the median of the range (using a range of 10 kb and an approximation of 1 cM ≈ 1 Mb). The adjusted r2 values were averaged for all SNP pairs within the range across all 29 autosomal chromosomes. We estimated N for 10 to 10,000 previous generations by using the complete set of SNPs (31,857 SNPs), since the set comprising just the targeted high-density regions only permitted the estimation of N for 5,000 to 10,000 previous generations.
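The estimation step amounts to inverting the expectation of r2 for N. The sketch below assumes Sved's approximation E(r2) ≈ 1/(4Nc + 1), which gives N ≈ (1/E(r2) − 1)/(4c), together with the 1 cM ≈ 1 Mb conversion stated in the text. Function names and the example inputs are illustrative only.

```python
def distance_to_morgans(distance_bp, cm_per_mb=1.0):
    """Convert a physical distance to Morgans, assuming 1 cM ~ 1 Mb by default."""
    return distance_bp / 1_000_000 * cm_per_mb / 100.0


def generations_ago(c_morgans):
    """LD at recombination distance c reflects N roughly 1/(2c) generations ago."""
    return 1.0 / (2.0 * c_morgans)


def effective_population_size(mean_r2, c_morgans):
    """Invert E(r^2) ~ 1 / (4 N c + 1) for N."""
    if not 0.0 < mean_r2 <= 1.0:
        raise ValueError("mean_r2 must lie in (0, 1]")
    return (1.0 / mean_r2 - 1.0) / (4.0 * c_morgans)


# Hypothetical input: an adjusted mean r^2 of 0.1 for SNP pairs ~100 kb apart.
c = distance_to_morgans(100_000)
print(generations_ago(c), effective_population_size(0.1, c))
```

With these hypothetical inputs the estimate is an N of about 2,250 at about 500 generations in the past; short distances probe the distant past and long distances the recent past, which is why the dense regions alone only reach back to 5,000 to 10,000 generations.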
Haplotype block estimation
Haplotype blocks were defined by the following algorithm: (i) Begin a block by selecting the pair of adjacent SNPs with the highest r2 value (no less than α = 0.4); (ii) Repeatedly extend the block if the average r2 value between an adjacent marker and current block members is at least β (= 0.3) and all the pairwise r2 values within the block are at least γ (= 0.1).
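The two steps can be sketched as a greedy search over a precomputed r2 matrix. This is one possible reading of the algorithm, not the software used in the study: it assumes that seeding and extension are repeated on the remaining markers until no adjacent pair reaches α, that a block may grow in either direction, and that markers already assigned to a block are not reused. The function name is hypothetical.

```python
ALPHA, BETA, GAMMA = 0.4, 0.3, 0.1  # thresholds quoted in the text


def find_blocks(r2, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Greedy block search; r2 is a symmetric matrix over position-ordered SNPs.

    Returns a sorted list of (first_index, last_index) blocks, inclusive.
    """
    n = len(r2)
    used = [False] * n
    blocks = []
    while True:
        # (i) Seed: the unused adjacent pair with the highest r^2 (>= alpha).
        seeds = [(r2[i][i + 1], i) for i in range(n - 1)
                 if not used[i] and not used[i + 1] and r2[i][i + 1] >= alpha]
        if not seeds:
            break
        _, lo = max(seeds)
        hi = lo + 1
        # (ii) Extend while a flanking marker satisfies both criteria.
        extended = True
        while extended:
            extended = False
            for cand in (lo - 1, hi + 1):
                if cand < 0 or cand >= n or used[cand]:
                    continue
                vals = [r2[cand][m] for m in range(lo, hi + 1)]
                # Existing members already pass gamma, so only new pairs are checked.
                if sum(vals) / len(vals) >= beta and min(vals) >= gamma:
                    lo, hi = min(lo, cand), max(hi, cand)
                    extended = True
        for m in range(lo, hi + 1):
            used[m] = True
        blocks.append((lo, hi))
    return sorted(blocks)


r2_example = [
    [1.00, 0.90, 0.50, 0.00, 0.00],
    [0.90, 1.00, 0.60, 0.05, 0.00],
    [0.50, 0.60, 1.00, 0.05, 0.00],
    [0.00, 0.05, 0.05, 1.00, 0.20],
    [0.00, 0.00, 0.00, 0.20, 1.00],
]
print(find_blocks(r2_example))  # [(0, 2)]
```

In the example, SNPs 0 and 1 seed the block, SNP 2 joins because its average r2 with the block is 0.55, and SNP 3 is rejected because its average r2 with the block falls below β.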
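Confidence intervals for the mean block size were computed as
$$\bar{x} \pm t_{n-1,\,1-\alpha/2} \, \frac{s}{\sqrt{n}}$$
where $\bar{x}$ denotes the sample mean block size, s denotes the sample standard deviation, n denotes the sample size, and $t_{n-1,\,1-\alpha/2}$ denotes the percentile of a t distribution with n-1 degrees of freedom.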
Comparing Haplotype Block Structure Across Breeds
To determine if the haplotype block structure in high-density regions is conserved among breeds, we counted the number of haplotype blocks occurring in each of the 101 high-density regions for each breed, producing a 101-element vector for each breed.
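The correlation between the block-count vectors of two breeds was then computed as
$$r_{ij} = \frac{\sum_{k} (x_{i,k} - \bar{x}_i)(y_{j,k} - \bar{y}_j)}{\sqrt{\sum_{k} (x_{i,k} - \bar{x}_i)^2 \, \sum_{k} (y_{j,k} - \bar{y}_j)^2}}$$
where i and j represent two breeds, k represents a high-density region, $x_{i,k}$ and $y_{j,k}$ represent the number of haplotype blocks found in region k for breeds i and j respectively, and $\bar{x}_i$ and $\bar{y}_j$ represent the mean number of haplotype blocks found across all regions for breeds i and j respectively.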
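A minimal sketch of this comparison, assuming the measure is the standard Pearson correlation between the per-region block counts of two breeds (the function name and the toy vectors are illustrative):

```python
from math import sqrt


def block_density_correlation(x, y):
    """Pearson correlation between two equal-length block-count vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector; 0 by convention here
    return cov / sqrt(var_x * var_y)


# Toy block counts for two breeds over five regions (real vectors have 101 entries).
breed_a = [3, 1, 4, 0, 2]
breed_b = [2, 1, 5, 0, 2]
print(round(block_density_correlation(breed_a, breed_b), 3))
```

Averaging this value over all breeds inside and outside a subgroup gives the within-group and outside-group summaries referred to in the Results.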
In order to assess the consistency of block boundaries across breeds, we examined adjacent pairs of SNPs with intermarker distances up to 10 kb. For each breed, it was determined whether the pair was assigned to a single block or not. Then, for a given pair of breeds, a SNP pair was termed concordant if the assignment was the same in both breeds and discordant if the assignments disagreed. We performed this analysis for all pairs of breeds. In addition, we computed concordances and discordances between beef and dairy groups, and between dairy and indicus groups as well.
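The concordance count can be sketched as below. The sketch assumes both breeds are described over the same position-ordered SNP list, with each SNP labelled by a block identifier or `None` when it lies outside any block; this data layout and the function names are assumptions made for illustration.

```python
def same_block_flags(positions, block_ids, max_dist_bp=10_000):
    """Map each adjacent SNP pair within max_dist_bp to True if both share a block."""
    flags = {}
    for k in range(len(positions) - 1):
        if positions[k + 1] - positions[k] <= max_dist_bp:
            a, b = block_ids[k], block_ids[k + 1]
            flags[k] = a is not None and a == b
    return flags


def boundary_concordance(positions, blocks_a, blocks_b, max_dist_bp=10_000):
    """Return (concordant, discordant) counts of adjacent SNP pairs for two breeds."""
    flags_a = same_block_flags(positions, blocks_a, max_dist_bp)
    flags_b = same_block_flags(positions, blocks_b, max_dist_bp)
    concordant = sum(1 for k in flags_a if flags_a[k] == flags_b[k])
    return concordant, len(flags_a) - concordant


positions = [0, 4_000, 8_000, 30_000, 34_000]
breed_a = [0, 0, 0, None, None]   # one block covering the first three SNPs
breed_b = [0, 0, None, 1, 1]      # two shorter blocks
print(boundary_concordance(positions, breed_a, breed_b))  # (1, 2)
```

The pair spanning 8 kb to 30 kb is skipped because it exceeds the 10 kb limit, so only three adjacent pairs are scored in the example.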
S'(P1, P2, k) has value 1.0 if the proportion of shared haplotypes between populations P1 and P2 at locus k is equal to the average of the proportions of shared haplotypes within the two populations P1 and P2. If S'(P1, P2, k) << 1.0, then the proportion of shared haplotypes between the two populations is much less than the average within the two populations.
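The behaviour described for S' can be illustrated with a small sketch. Two assumptions are made here that the text does not confirm: the proportion of shared haplotypes between two populations is taken as the probability that one haplotype drawn from each is identical (the sum over haplotypes of the product of their frequencies), and S' is that between-population value divided by the mean of the two within-population values. Function names are hypothetical.

```python
from collections import Counter


def haplotype_frequencies(haps):
    """Relative frequency of each distinct haplotype string at one locus."""
    counts = Counter(haps)
    total = len(haps)
    return {h: c / total for h, c in counts.items()}


def sharing(freq_p, freq_q):
    """Probability that a haplotype drawn from P equals one drawn from Q."""
    return sum(f * freq_q.get(h, 0.0) for h, f in freq_p.items())


def normalized_sharing(haps_p1, haps_p2):
    """Between-population sharing divided by the mean within-population sharing."""
    f1 = haplotype_frequencies(haps_p1)
    f2 = haplotype_frequencies(haps_p2)
    within = (sharing(f1, f1) + sharing(f2, f2)) / 2.0
    return sharing(f1, f2) / within


same = ["ACG", "ACG", "TTG", "TTG"]
other = ["GGA", "GGA", "CCA", "CCA"]
print(normalized_sharing(same, same), normalized_sharing(same, other))  # 1.0 0.0
```

Under these assumptions, two populations with identical haplotype frequencies score 1.0 and two populations with no haplotypes in common score 0.0, matching the interpretation given above.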
Clustering based on Shared Haplotypes
where u is the number of loci. This is related to common measurements for genetic distance between two individuals [28–30]. D'(P1, P2) has value 0 if breeds P1 and P2 share the same proportion of haplotypes as are shared by the individuals within each individual breed.
Clustering based on Principal Components Analysis
Vectors resulting from the computation of haplotype block boundary discordances for each breed compared to the remaining breeds were used to perform a Principal Component Analysis (PCA) and look for differentiation between cattle subgroups. We used R software to perform this analysis. The central idea of PCA is to reduce the dimensionality of a data set which consists of a large number of interrelated variables, while retaining as much as possible of the variation present in the data set. This is achieved by transforming to a new set of variables, the principal components (PCs), which are uncorrelated, and which are ordered so that the first few retain most of the variation present in all the original variables.
Formally, PCA is defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. PCA is theoretically the optimum transform for given data in least-squares terms. The procedure for obtaining the principal components can be summarized as follows:
Select d eigenvectors to represent the n variables, d < n. Then P1, P2,..., Pd are called the principal components.
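For illustration, the selection of the leading d eigenvectors of the covariance matrix can be sketched in plain Python. The analysis itself was run in R; the sketch below substitutes power iteration with deflation for a full eigendecomposition, which is a simplification chosen to keep the example dependency-free, and all function names are hypothetical.

```python
import random
from math import sqrt


def covariance_matrix(rows):
    """Sample covariance matrix and mean-centred data for rows of observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    return cov, centered


def leading_eigenpair(matrix, iterations=500, seed=0):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    p = len(matrix)
    rng = random.Random(seed)  # random start avoids an accidentally orthogonal guess
    vec = [rng.uniform(-1.0, 1.0) for _ in range(p)]
    value = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
        value = norm
    return value, vec


def principal_components(rows, d):
    """Return the first d (eigenvalue, eigenvector) pairs and the projected scores."""
    cov, centered = covariance_matrix(rows)
    p = len(cov)
    components = []
    for _ in range(d):
        value, vec = leading_eigenpair(cov)
        components.append((value, vec))
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[a][b] - value * vec[a] * vec[b] for b in range(p)]
               for a in range(p)]
    scores = [[sum(c[j] * vec[j] for j in range(p)) for _, vec in components]
              for c in centered]
    return components, scores


data = [[0, 0], [1, 2], [2, 4], [3, 6]]  # toy data lying on the line y = 2x
components, scores = principal_components(data, d=1)
print(components[0])
```

For the toy data, all variance lies along the direction (1, 2), so the first component is proportional to that vector (up to sign) and the remaining eigenvalue is effectively zero.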
This project was supported by National Research Initiative Grant no. 2007-35604-17870 from the USDA Cooperative State Research, Education, and Extension Service Animal Genome program. RV was supported in part by a Fulbright Scholarship.
- Craig DW, Stephan DA: Applications of whole-genome high-density SNP genotyping. Expert review of molecular diagnostics. 2005, 5 (2): 159-170. 10.1586/1473718.104.22.168.
- Hyten DL, Song Q, Choi IY, Yoon MS, Specht JE, Matukumalli LK, Nelson RL, Shoemaker RC, Young ND, Cregan PB: High-throughput genotyping with the GoldenGate assay in the complex genome of soybean. TAG Theoretical and applied genetics. 2008, 116: 945-52. 10.1007/s00122-008-0726-2.
- Matukumalli LK, Grefenstette JJ, Hyten DL, Choi IY, Cregan PB, Van Tassell CP: Application of machine learning in SNP discovery. BMC Bioinformatics. 2006, 7: 4. 10.1186/1471-2105-7-4.
- Matukumalli LK, Grefenstette JJ, Hyten DL, Choi IY, Cregan PB, Van Tassell CP: SNP-PHAGE – High throughput SNP discovery pipeline. BMC Bioinformatics. 2006, 7: 468. 10.1186/1471-2105-7-468.
- Van Tassell CP, Smith TP, Matukumalli LK, Taylor JF, Schnabel RD, Lawley CT, Haudenschild CD, Moore SS, Warren WC, Sonstegard TS: SNP discovery and allele frequency estimation by deep sequencing of reduced representation libraries. Nat Methods. 2008, 5 (3): 247-252. 10.1038/nmeth.1185.
- Jakobsson M, Scholz SW, Scheet P, Gibbs JR, VanLiere JM, Fung HC, Szpiech ZA, Degnan JH, Wang K, Guerreiro R, et al: Genotype, haplotype and copy-number variation in worldwide human populations. Nature. 2008, 451 (7181): 998-1003. 10.1038/nature06742.
- McKay SD, Schnabel RD, Murdoch BM, Matukumalli LK, Aerts J, Coppieters W, Crews D, Dias Neto E, Gill CA, Gao C, et al: Whole genome linkage disequilibrium maps in cattle. BMC Genet. 2007, 8: 74. 10.1186/1471-2156-8-74.
- Wang X, Korstanje R, Higgins D, Paigen B: Haplotype analysis in multiple crosses to identify a QTL gene. Genome Res. 2004, 14 (9): 1767-1772. 10.1101/gr.2668204.
- Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES: High-resolution haplotype structure in the human genome. Nat Genet. 2001, 29 (2): 229-232. 10.1038/ng1001-229.
- Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, Higgins J, DeFelice M, Lochner A, Faggart M, et al: The structure of haplotype blocks in the human genome. Science. 2002, 296 (5576): 2225-2229. 10.1126/science.1069424.
- Guryev V, Smits BM, Belt van de J, Verheul M, Hubner N, Cuppen E: Haplotype block structure is conserved across mammals. PLoS Genet. 2006, 2 (7): e121. 10.1371/journal.pgen.0020121.
- Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al: A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007, 449 (7164): 851-861. 10.1038/nature06258.
- Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, et al: Genome-wide detection and characterization of positive selection in human populations. Nature. 2007, 449 (7164): 913-918. 10.1038/nature06250.
- Khatkar MS, Zenger KR, Hobbs M, Hawken RJ, Cavanagh JA, Barris W, McClintock AE, McClintock S, Thomson PC, Tier B, et al: A primary assembly of a bovine haplotype block map based on a 15,036-single-nucleotide polymorphism panel genotyped in holstein-friesian cattle. Genetics. 2007, 176 (2): 763-772. 10.1534/genetics.106.069369.
- Gautier M, Faraut T, Moazami-Goudarzi K, Navratil V, Foglio M, Grohs C, Boland A, Garnier JG, Boichard D, Lathrop GM, et al: Genetic and haplotypic structure in 14 European and African cattle breeds. Genetics. 2007, 177 (2): 1059-1070. 10.1534/genetics.107.075804.
- The Bovine HapMap Consortium: Genome-Wide Survey of SNP Variation Uncovers the Genetic Structure of Cattle Breeds. Science. 2009, 324 (5926): 528-532. 10.1126/science.1167936.
- Scheet P, Stephens M: A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 2006, 78 (4): 629-644. 10.1086/502802.
- Nilsen H, Hayes B, Berg PR, Roseth A, Sundsaasen KK, Nilsen K, Lien S: Construction of a dense SNP map for bovine chromosome 6 to assist the assembly of the bovine genome sequence. Anim Genet. 2008, 39: 97-104. 10.1111/j.1365-2052.2007.01686.x.
- Tenesa A, Navarro P, Hayes BJ, Duffy DL, Clarke GM, Goddard ME, Visscher PM: Recent human effective population size estimated from linkage disequilibrium. Genome Res. 2007, 17 (4): 520-526. 10.1101/gr.6023607.
- Hayes BJ, Visscher PM, McPartlan HC, Goddard ME: Novel multilocus measure of linkage disequilibrium to estimate past effective population size. Genome Res. 2003, 13 (4): 635-643. 10.1101/gr.387103.
- Hayes BJ, Lien S, Nilsen H, Olsen HG, Berg P, Maceachern S, Potter S, Meuwissen TH: The origin of selection signatures on bovine chromosome 6. Anim Genet. 2008, 39: 105-11. 10.1111/j.1365-2052.2007.01683.x.
- Sved JA: Linkage disequilibrium and homozygosity of chromosome segments in finite populations. Theoretical population biology. 1971, 2 (2): 125-141. 10.1016/0040-5809(71)90011-6.
- Kershaw I: The Great Famine and Agrarian Crisis in England 1315–1322. Past and Present. 1973, 59 (1): 3-50. 10.1093/past/59.1.3.
- Gu S, Pakstis AJ, Kidd KK: HAPLOT: a graphical comparison of haplotype blocks, tagSNP sets and SNP variation for multiple populations. Bioinformatics. 2005, 21 (20): 3938-3939. 10.1093/bioinformatics/bti649.
- International HapMap Consortium: A haplotype map of the human genome. Nature. 2005, 437 (7063): 1299-1320. 10.1038/nature04226.
- Jolliffe IT: Principal Component Analysis. 2002, Springer, 2nd edition.
- Rosner B: Fundamentals of Biostatistics. 2006, Thomson Brooks/Cole, Sixth edition.
- Bowcock AM, Ruiz-Linares A, Tomfohrde J, Minch E, Kidd JR, Cavalli-Sforza LL: High resolution of human evolutionary trees with polymorphic microsatellites. Nature. 1994, 368 (6470): 455-457. 10.1038/368455a0.
- Mountain JL, Cavalli-Sforza LL: Multilocus genotypes, a tree of individuals, and human evolutionary history. Am J Hum Genet. 1997, 61 (3): 705-718. 10.1086/515510.
- Witherspoon DJ, Wooding S, Rogers AR, Marchani EE, Watkins WS, Batzer MA, Jorde LB: Genetic similarities within and between human populations. Genetics. 2007, 176 (1): 351-359. 10.1534/genetics.106.067355.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.