
Microsatellite-based genetic diversity and population structure of domestic sheep in northern Eurasia



Identification of global livestock diversity hotspots and their importance in diversity maintenance is essential for making global conservation efforts. We screened 52 sheep breeds from the Eurasian subcontinent with 20 microsatellite markers. By estimating and weighting differently within- and between-breed genetic variation our aims were to identify genetic diversity hotspots and prioritize the importance of each breed for conservation, respectively. In addition we estimated how important within-species diversity hotspots are in livestock conservation.


Bayesian clustering analysis revealed three genetic clusters, termed Nordic, Composite and Fat-tailed. Southern breeds from close to the region of sheep domestication were more variable, but less genetically differentiated compared with more northern populations. Decreasing weight for within-breed diversity component led to very high representation of genetic clusters or regions containing more diverged breeds, but did not increase phenotypic diversity among the high ranked breeds. Sampling populations throughout 14 regional groups was suggested for maximized total genetic diversity.


During the initial steps of establishing a livestock conservation program, populations from the diversity hotspot area are the most important ones, but for the full design our results suggested that approximately equal population representation across environments should be considered. Even in this case, higher per-population emphasis in areas of high diversity is appropriate. The analysis was based on neutral data, but we have no reason to think the general trend is limited to this type of data. However, a comprehensive valuation of populations should balance production systems, phenotypic traits and available genetic information, and include consideration of probability of success.


The domestic sheep (Ovis aries) has been an economically and culturally important farm animal species since its domestication in the Near East approximately 9,000 years B.P. [1]. A northern Eurasian sheep stock formed some 6,000 years ago as sheep were brought to the British Isles, northern Europe and Russia after the expansion to the European continent via Danubian and Mediterranean routes [2], and a possible route through Russia [3]. Sheep dispersed across Europe in temporally separate migratory episodes: the most original and a more primitive type of domestic sheep was later replaced by a more developed wool type of sheep. Ancestry from the first immigrant wave seems to have survived only in north-western and northern peripheries of Europe [4].

A similar replacement process is occurring in modern days. Global standardization of production environments and breed competition have led to the disappearance of many native breeds. The Food and Agriculture Organization of the United Nations (FAO) has estimated that 36% of the sheep breeds of known census size are either extinct or endangered [5]. Furthermore, the use of a few high-quality males for intense mating has resulted in the reduction of effective population size (Ne) over time and reduced genetic diversity within breeds [6]. These processes will lead to a decrease in the effective population size of the entire species. This could restrict breeding options and the genetic gain of breeding programs to the extent that unpredictable future requirements might not be met [6–8]. Breed conservation aims to maintain these options, but limited resources, e.g. financial limitations, might not allow conservation of all the breeds.

One can argue that the breeds originating from or close to the domestication centers, such as the Near Eastern region, should be particularly prioritized in conservation programs. Microsatellite studies in cattle (Bos taurus) [9–11], goat (Capra hircus) [12] and sheep (Ovis aries) [13] suggested that the breeds located close to the putative domestication centers are the most variable. These breeds might possess allelic variations retained from the wild ancestors that never reached areas further from the center of origin. Although one cannot easily differentiate these primary diversity hotspots from the secondary hotspots created by more recent crossbreeding, continent-wide mapping of the regions of exceptional livestock diversity (genetic diversity hotspots) has been suggested as a means of targeting conservation efforts for livestock species [10, 14]. DNA marker data can be used to calculate molecular coancestries within and between breeds and determine contributions of each breed to a pool of animals that would maximize genetic diversity of the pool, i.e. minimize average molecular coancestry [15]. These calculations can provide critical information when breeds need to be prioritized for conservation of the diversity of domestic animal species. Using this conservation approach, it would be possible to maximize Ne of the subdivided species and thus minimize the depleting effect of genetic drift on genetic variation.

There have been a few quite comprehensive gene diversity studies in sheep [13, 16–18]. However, none of these focused on breed prioritization to describe general trends in the conservation of genetic diversity in sheep. Though genome-wide Single Nucleotide Polymorphism (SNP) data are becoming the standard for livestock genetics [18], they can suffer from ascertainment bias originating from SNP discovery protocols [19]. Though the problem can be alleviated by using haplotypic measures [20] or through bias corrections [21], the established baseline trend using low-bias markers such as microsatellites remains an important benchmark. We used a representative set of sheep types across northern Eurasia to explore the diversity patterns and inferred conservation priorities based on microsatellites. For breed ranking we applied the method of Caballero and Toro [15], which is based on the minimization of molecular coancestry in a subdivided population. We tested the effect of weighting differently the two components of maximum genetic diversity, within-breed and between-breed variation, when setting breed priorities. Based on the common statement that populations from diversity hotspot regions are more important [9, 10, 14], we expected a large number of breeds from a hotspot region to be highly prioritized.


Genetic diversity

In total, 342 alleles were detected at the 20 microsatellite loci analyzed (Additional file 1: Table S1). A summary of the genetic diversity parameters computed for 16 regional groups is presented in Table 1, and the breed-wise values, based on an average of 32 sheep per breed, are given in Additional file 2: Table S2. The total genetic diversity (HT) varied from 0.651 in the Danish regional group to 0.807 in the Ukrainian regional group. The area having regional groups with HT values above 0.8 (Ukraine, southeast Europe, Kazakhstan and east of the Caspian Sea, Buryatia and the southern Caucasus) was termed a diversity hotspot. Among breeds, the unbiased expected heterozygosity (HS) ranged from 0.613 (the Norwegian Cheviot) to 0.806 (the Russian Karakul), with an average value of 0.759. Allelic richness varied in a similar pattern to the other within-population diversity measures (e.g. HT and HS) across the breeds (Table 1). The overall estimate of f [22] was 0.011. The breed-wise f estimates were significantly (P < 0.05) greater than zero only for the Norwegian Rygja Sheep and the Swedish Rya Sheep, suggesting that most breeds are quite uniform (Additional file 2: Table S2).

Table 1 Genetic diversity within 16 regional groups

Genetic cluster analysis

A model-based clustering was applied to resolve the population genetic structure. At K = 3, one cluster was constituted by the breeds descending mainly from the northernmost edge of the studied distribution (termed the 'Nordic cluster'), while the fat-tailed breeds, originating mainly from the Caucasus and Caspian basin areas, geographically close to the Near Eastern domestication center, formed the second cluster (termed the 'Fat-tailed cluster'). A third cluster mainly contained the composite sheep breeds from central Eurasia (termed the 'Composite cluster') (Figure 1). The mean similarity coefficient (SC) across 10 runs was 0.984 at K = 3. At K = 4, a split within the Nordic cluster was observed, but the drop of SC to 0.534 indicated variable assignments for breeds across runs and lack of additional strong high-level substructure among the populations. Therefore, separating the entire dataset into three clusters was chosen as the final global configuration.

Figure 1

Clustering of 52 sheep breeds. Individuals are presented as vertical lines divided into K colors, representing constructed populations. The lowest row represents further clustering of 3 groups, identified at K = 3, separately. The Nordic group is divided into 7 subclusters, while the Composite (in the middle) and the Fat-tailed groups each split into 3 subclusters.

To dissect the genetic structure within the three clusters, STRUCTURE analysis was further applied to each of them separately. In the Nordic cluster, the most consistent grouping of 11 north European sheep breeds was achieved at K = 7 (SC = 0.641), with the mean SC ranging from 0.250 to 0.314 at K other than seven. Breeds originating from the same country (e.g. Finnsheep and Finnish Grey Landrace) or from the neighboring regions (e.g. the Icelandic Sheep and the Faeroe Island Sheep) tended to cluster together (Figure 1).

The Fat-tailed sheep cluster was composed mainly of the coarse-wool native breeds from the Caucasus and steppes of the Caspian basin and Kazakhstan. Surprisingly, the northern short-tailed coarse-wool Romanov Sheep was also assigned into this cluster. The breed's estimated fraction of the Fat-tailed cluster was 0.59. However, the most consistent subclustering of the Fat-tailed cluster was obtained at K = 3 (SC = 0.865), with the Romanov sheep forming a distinct subcluster (Figure 1). The Andi and the Karakul type sheep breeds anchored the remaining two subclusters. Eight out of 14 breeds showed partial and varying memberships of the two subclusters, indicating their admixed origin (Figure 1).

The Composite cluster hosted the remaining 26 synthetic semi- and fine-wool sheep breeds that were split into three genetic subclusters (SC = 0.932). The three subclusters identified followed a pattern of geographical separation: long-wool Marsh and Texel type breeds from the north grouped into subcluster I (light blue); fine-wool breeds from the Caucasus, Kazakhstan and Buryatia formed subcluster II (light green) and the southern European Zackel type breeds grouped into subcluster III (pink, Figure 1). Other nine European, Caucasian and Asian sheep breeds had partial membership of multiple clusters, which represents more diverse ancestries in the process of breed development (Figure 1).

The PCoA results were largely in accordance with the STRUCTURE results. The breeds from the above-mentioned Nordic cluster were separated from the other breeds on Axis I, which explained 48% of the distance matrix (Figure 2). On Axis II, which explained an additional 7%, breeds from the Fat-tailed cluster were separated from the Composite cluster breeds. A notable exception was the Romanov Sheep, whose yellow circle in Figure 2 (at -0.085, -0.004) suggested the breed's clustering with the Nordic rather than the Fat-tailed breeds. This matched our prior expectations on the basis of phenotypic characters better than the STRUCTURE result.

Figure 2

Principal coordinate plot of breeds based on Chord distance. Axis I explains 48% of the variation, axis II explains 7% of the variation. Breeds from the Nordic cluster (based on STRUCTURE) are marked with red, breeds from the Fat-tailed cluster are marked with yellow and breeds from the Composite cluster are marked with black circles.

The proportions of Nordic, Fat-tailed and Composite genetic ancestries within each of the regional groups studied are presented in Figure 3. The highest proportion of Fat-tailed ancestry was recorded at the southern periphery of the studied distribution, which gradually decreased northwards and was the smallest in the northern regional groups. The proportion of Nordic type ancestry mirrored this pattern and was the largest in the northern regional groups and decreased southwards. The 16 regional groups had similar proportions of Composite ancestry, with the exception of Stavropol and Caspian depression regional groups, where the proportion of Composite ancestry was highest, and the northernmost and southernmost regional groups, where the Composite ancestry proportion was least.

Figure 3

Distribution of three inferred genetic clusters in the study regions. Slices in the pie diagrams represent Fat-tailed (yellow), Composite (black) and Nordic (red) clusters. The Caucasus area is represented by four regions: South Caucasus (1), North Caucasus (2), Stavropol (3) and the Caspian depression (4). The Asian region is represented by three regions: Kazakhstan and east of the Caspian Sea region (5), the Altai region (6) and the Buryatia region (7). The remaining groups belong to eastern fringe of Europe: the Volga region (8), West Russia (9), Ukraine (10), Southeast Europe (11) Poland (12), Finland (13), Scandinavia (14), Denmark (15) and Iceland and the Faeroe Islands (16).

Geographical patterns in genetic diversity

To reduce the effect of possible recent breed-specific factors on the overall geographical distribution of genetic diversity, a synthetic map of genetic diversity was based on the total gene diversity (HT) for triplets of neighboring breeds (Figure 4). The highest diversity was found in the southern region of the studied area: Buryatia (south Siberia), Caspian Sea and Black Sea basins. It decreased gradually in Central and northern Europe and the lowest HT values were recorded for southern Scandinavia (Figure 4). The trend can be observed also based on within-breed estimates (Additional file 3: Figure S1). A significant but weak positive correlation (r = 0.382, P < 0.05) was calculated between the expected heterozygosity and the level of admixture based on global STRUCTURE results (K = 3) for the 52 sheep breeds studied, suggesting that admixture does not explain the presence of diversity hotspots, though it can contribute to it in some areas.

Figure 4

Contour synthetic map of total genetic diversity (HT) calculated for triplets of neighboring breeds. Darker shading indicates higher levels of diversity.

Analysis of molecular variance (AMOVA)

We tested the extent of population differentiation using AMOVA in the whole dataset, as well as grouping breeds according to geographical regions (the Caucasus, Asia, or the eastern fringe of Europe), and according to the 15 regional groups (excluding the Danish group represented by a single breed) (Table 2). As expected, most genetic variation (> 86%) was retained within the breeds, whereas only 0.41% to 0.95% (P < 0.001) of the variation could be explained by geographical partitioning (Table 2). The between-breed variation within each of the three genetic subclusters was significant (P < 0.001), ranging from 2.48% (Fat-tailed subcluster) to 13.71% (Nordic subcluster) (Table 2). In our data, using genetic clustering in AMOVA gives higher between groups variance than using geographical categorizations.

Table 2 Analysis of molecular variance

Core-set analysis

Of the 52 sheep breeds 24 had contributions to the core-set when the 4 weightings (λ = 0, 0.2, 0.5, 1) of within-breed diversity were considered. The 24 breeds represented all the 16 regional groups, except those of the Altai and Buryatia regions. The distribution of breeds was relatively even, with 1 to 2 sheep breeds per region, the exception being the Scandinavian regional group, which contributed 5 sheep breeds to this accumulated core-set (Additional file 4: Table S3). Looking at the four core-sets separately, the number of contributing breeds increased from 8 to 17 when weight of within-breed variability increased from 0 to 1 (Table 3). Results of analysis based on genetic clustering are comparable to those based on geographic regions (Additional file 5: Table S4).

Table 3 Distribution of core-set contributions

Every tested scenario with reduced weight for within-breed variation (λ < 1) gave a significantly higher number of breeds with non-zero contribution from the areas outside the hotspot regions (all the two-tailed P values < 0.02 using Fisher's exact test for independence; Table 3, Additional file 4: Table S3). In terms of contribution to the core-set, these non-hotspot region populations comprised > 90% of the set. However, when optimizing for global diversity (λ = 1), there is no significant difference (two-tailed P = 0.77) in the proportion of breeds included between hotspot and non-hotspot regions. Very distinctively, the core-set now consists of 65% of the breeds from the hotspot regions because the included hotspot region breeds make a significantly larger mean contribution (each ~11%) than the included breeds from the non-hotspot regions (each ~3%) (Welch two-sample t-test P = 0.009). Thus the diversity hotspot areas were important for conserving total genetic diversity in terms of the effort per conserved population rather than the proportion or number of breeds to be conserved.

Conservation programs might be initiated with limited information. In cases where resources allow keeping only a small number of breeds and when there is no aim to differentiate between their contributions to the core-set (assuming equal contributions), the maximum amount of genetic diversity would be maintained by giving priority to breeds from the diversity hotspot regions. In the scenario of 5 breeds, four of them are from the hotspot areas (Additional file 6: Table S5). The proportion of hotspot breeds was reduced from 80% to 45% when assuming resources to keep 20 breeds. This latter set is similar to the 17 breeds identified as contributors to the core-set when λ = 1 (Additional file 4: Table S3), but also includes three fat-tailed populations from the Caucasus (Bozakh, Tushin and Lezgian). These results agree with the idea of having the initial conservation focus on hotspot regions.


We present here a comprehensive genetic analysis of sheep populations originating from a broad geographical area of the Eurasian subcontinent. Our results detected the presence of a sheep genetic diversity hotspot located close to the Near East, the assumed sheep domestication center, and highlight the importance of such an area in conservation planning. The results correspond well with the geographical pattern of genetic diversity distribution reported for cattle (B. taurus) [23] and goat (Capra hircus) [12] as well as a previous study of European sheep [13] which focused on more southern breeds. The congruence across studies suggests the pattern to be genuine, though a larger number of markers would be desirable. Based on the observed allele number, we can expect the reliability to be approximately similar to that of a study of 300-400 unbiased bi-allelic SNPs [24]. However, since studies of humans do not suggest great discrepancies across nuclear marker types as long as ascertainment bias can be avoided [19, 20, 25], we expect the presented general diversity patterns to be robust. Since in our analysis conservation optimization was based on the same data used to define the diversity hotspot, our general recommendation for considering hotspot regions ought to be sound.

Livestock genetic diversity hotspots have been suggested to be very important for conservation because the domestic animal stocks associated with them might possess allelic variation from wild ancestors, which, due to a sequence of founding events, was lost during the dispersion of animals towards the northern parts of the continent [14]. However, to the best of our knowledge this question has not been directly addressed previously. Our results provide additional evidence for the importance of these regions, while indicating an important refinement for the conservation goal. Our results do not suggest that a larger proportion of populations from these areas needs to be conserved, but rather suggest that more emphasis be placed on each conserved diversity hotspot population. This distinction is highly relevant for domestic species, where management units are in most cases clearly definable as breeds. Furthermore, the results support directing the first conservation resources to work on hotspot regions.

Of the three identified Northern Eurasian genetic clusters, the Nordic cluster was represented by native and old commercial sheep breeds adapted to live under cold and wet northern European climatic conditions. This group includes breeds such as Gute, Icelandic Sheep and Finnsheep, which descended from the sheep stock in the first dispersion event to Europe [4]. Strict breed boundaries over a long period and geographical isolation, particularly for insular breeds (the Icelandic Sheep and the Faeroe Island Sheep), are characteristic of the group and have resulted in a unique and genetically highly heterogeneous pool of Nordic sheep populations (Table 2).

The large Composite cluster with partial ancestry from improved western breeds contains genetically variable fine- and semi-fine-wool sheep breeds of admixed origin with moderate differentiation between the breeds. The presence of substructure within the cluster reflects the differences in the breeding trends within the former Soviet Union that took place in the middle of the last century. The sheep in the western part of Russia and Volga regions have Marsh-Texel type composite ancestry resulting from crossing local populations with British type long-wool sheep (Figure 1). The second subcluster within the Composite group includes the breeds prevalent in the Caucasus, the Stavropol region and the Caspian basin, another geographical center of purposeful crossbreeding, with a significant genetic component of the Merino type sheep. The third subcluster within the Composite cluster is anchored by two Zackel type mountain sheep populations, Pramenka and Kuchugur, and reflects a common ancestry for the majority of breeds within the subcluster. The grouping of Tsigai in the same subcluster confirms the assumption that this breed was strongly influenced by Zackel (e.g. see [17]). Most of the populations of the Composite cluster also represent genetic diversity of local origins as the upgrading was performed on the basis of local sheep populations, mixing them with a number of improved breeds of foreign ancestry to combine desired production and robustness characteristics.

The Fat-tailed cluster hosted very variable native coarse-wool populations, living under a variety of climatic conditions, ranging from semi-desert and steppe regions around the Caspian Sea and Central Asia to Caucasian mountain terrains. The differentiation of fat-tailed sheep from the others indicates restricted gene flow between steppe or mountain environments in central Eurasia and cooler and moister northern areas of the continent. The gene pool of the fat-tailed sheep divided into the mountain type sheep (e.g. Andi and Lezgian) and steppe-desert types (e.g. Gala and Karakul). However, the majority of fat-tailed breeds have their ancestries in both of these subclusters (Figure 1), which together with low differentiation estimates indicates substantial gene flow between them. This agrees with the traditional sheep breeding practices in the Caucasus, which promote gene flow through the long-distance nomadic pasturing of animals. Grouping of Romanov sheep within the Fat-tailed cluster (Figure 1) should be regarded cautiously.

Decisions on adaptation conservation should largely be based on reliable phenotypic evaluations. In humans, genetic and phenotypic diversity agree [26], but selection might affect phenotypes reducing correlation between phenotypic divergence and general genomic relatedness [27, 28]. This is particularly true for livestock which would imply need for testing (ecological) exchangeability (as in [29]). Unfortunately this is very difficult. A large proportion of the necessary phenotypic information exists only as informal knowledge of local breeders. Even the more rigidly collected data is rarely comparable between environments.

Molecular data can have a role in pointing out potential conservation gaps when phenotypic knowledge is limited. The usability of approaches based on molecular marker data in setting conservation priorities can be greatly improved by genome-wide surveys of molecular variation [30]. For example, scanning tens of thousands of SNP markers has the potential to identify selected loci [31] and allow comparison of the conservation values of several populations, both in the neutral and non-neutral context [30]. However, even with full genome sequences, valuation of populations can prove to be difficult due to incomplete understanding of the biology of the organisms and poorly definable conservation goals.

We used neutral molecular data for a specific set of populations and applied the method of Caballero and Toro [15] to calculate optimal contributions of Eurasian sheep breeds to the core set, which would minimize the mean kinship in the set and maximize Ne and genetic diversity of the species. While giving more emphasis to divergence has theoretical appeal, it did not increase ecological or phenotypic heterogeneity in the preferred set of breeds compared with the maximization of global diversity (and Ne). Maximization of global diversity prioritized a more diverse set of breeds originating from a range of biogeographic environments and having different genetic histories. Though the set looks reasonable, we acknowledge that it is based on incomplete data and we are hesitant to conclude that this particular design is optimal.


Neutral variation suggested a general rule of thumb to favour breeds from the diversity hotspot regions in the first phase of in situ and ex situ conservation actions. In the final design, however, approximately equal population representation across environments is recommended, with higher per-population emphasis still suggested in areas of high diversity. A comprehensive valuation of breeds, particularly within each physical environment, should consider production systems, important biological characteristics and available genetic information, as well as the probability of success and the extinction risk of breeds.


Biological samples

In total, 1675 animals representing 52 sheep breeds were studied (Additional file 2: Table S2). Sheep were sampled from three geographical regions: the Caucasus, Asia, and the eastern fringe of Europe, including central and western Russia. Each geographical region was further subdivided into regional groups. The Caucasian area was composed of the southern Caucasus (the following breeds were sampled: Azerbaijan Mountain Merino, Bozakh, Gala, Karabakh, Mazekh, Tushin), northern Caucasus (Andi, Dagestan local, Dagestan Mountain Merino, Karachai, Lezgian), Stavropol (Caucasian, North Caucasian Mutton-Wool, Stavropol), and the Caspian depression (Aksaraisk type of Soviet Mutton-Wool, Grozny, Volgograd). The Asian area was subdivided into the Kazakhstan and east of the Caspian Sea group (Degeres Mutton-Wool, Kazakh Arkhar-Merino, Kazakh Edilbai, Kazakh Finewool, Russian Edilbai, Russian Karakul), Altay (Gorno-Altay local, Kulunda), and the Buryatia group (Baidarak, Transbaikal Finewool). The remaining nine groups covered the eastern fringe of Europe: the Volga region (Kuibyshev, Oparin), western Russia (Kuchugur, Romanov, Russian Romney Marsh), Ukraine (Carpathian Mountain, Sokolsk), southeast Europe (Moldavian Karakul, Moldavian Tsigai, Pramenka, Russian Tsigai), Poland (Olkuska, Swiniarka, Wrzosowka), Finland (Finnsheep, Finnish Grey Landrace), Scandinavia (Swedish Rya Sheep, Swedish Gottland Sheep, Swedish Gute Sheep, Norwegian Rygja Sheep, Norwegian Cheviot, Norwegian Feral Sheep), Denmark (Danish Texel), and Iceland and the Faeroe Islands (Icelandic Sheep, Faeroe Island Sheep). Unrelated animals were sampled based on pedigree records (two previous generations) or farmers' knowledge.

Genomic DNA was extracted from blood as described in [32], or from skin samples using DNeasy Tissue Kit (Qiagen, Crawley, West Sussex, UK). Prior to DNA extraction, skin samples stored in ethanol were washed twice with phosphate buffered saline to remove fixatives.

Genetic loci

The polymerase chain reactions (PCR) for 20 microsatellites (Additional file 1: Table S1) were performed as described in [33] and genotyped using the MegaBACE™ 500 DNA Sequencer (Amersham Biosciences). Fragment sizing was performed using the MegaBACE™ Genetic Profiler 2.2 or Fragment Profiler 1.2 (Amersham Biosciences). Genotypes for 20 microsatellites were available in the earlier studies for the Romanov sheep [34] and for the 11 breeds from Finland, Scandinavia, Denmark, Iceland and the Faeroe Islands [16].

Statistical analysis

The microsatellite loci were characterized by the total number of alleles, expected heterozygosity or total gene diversity [35], sample-size-corrected allelic richness [36], corresponding here to the expected allele number in a sample of nine diploid individuals, and F-statistics using FSTAT v2.93 [37]. F-statistics were estimated using the method of Weir and Cockerham [22], where f and θ correspond to Wright's coefficients FIS and FST, respectively. The genetic relationships among breeds were analyzed using principal coordinate analysis (PCoA) as implemented in PAST v1.73 [38] using the Chord distance [39].
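As an illustration of the within-population diversity measure, the sketch below computes a sample-size-corrected expected heterozygosity at a single locus, using the common estimator 2N/(2N - 1) × (1 - Σp²). This is an assumption about the general form only: the exact estimator implemented in FSTAT may differ in detail, and the function name and genotypes are hypothetical.

```python
from collections import Counter


def unbiased_expected_heterozygosity(genotypes):
    """Sample-size-corrected gene diversity at one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid animal.
    """
    alleles = [allele for pair in genotypes for allele in pair]
    n_genes = len(alleles)  # 2N gene copies in the sample
    if n_genes < 2:
        raise ValueError("need at least one diploid genotype")
    counts = Counter(alleles)
    # Sum of squared allele frequencies (expected homozygosity)
    sum_sq = sum((c / n_genes) ** 2 for c in counts.values())
    # Small-sample correction factor 2N / (2N - 1)
    return n_genes / (n_genes - 1) * (1.0 - sum_sq)


# Hypothetical allele sizes (bp) for four sheep at one microsatellite locus
example_genotypes = [(120, 124), (120, 120), (124, 128), (120, 128)]
h_example = unbiased_expected_heterozygosity(example_genotypes)
```

In practice the per-locus values would be averaged over the 20 loci to obtain a breed-level estimate.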

A model-based Bayesian clustering analysis, as implemented in STRUCTURE v2.2 [40], was used to infer population structure and the level of admixture in the sheep breeds. The STRUCTURE algorithm assumes K populations, each of which is in Hardy-Weinberg and linkage equilibrium and characterized by a set of allele frequencies at each locus. Analysis was performed with a burn-in length of 20,000 followed by 100,000 Markov chain Monte Carlo iterations for each of K = 1 to 10, with ten replicate runs for each K using independent allele frequencies and an admixture model. Results across ten runs at each K were compared based on similarity coefficients (SC) as previously described in [41]. The breeds were assigned to wide clusters based on major ancestry and submitted to a second round of STRUCTURE analysis performed within each wide cluster.

A linear regression analysis was performed to study the influence of breed ancestry diversity (admixture) on the level of genetic diversity. Ancestry diversity for each breed was calculated as $1 - \sum_{k} q_{k}^{2}$, where $q_{k}$ is the average fraction of the breed's genetic ancestry from genetic cluster k at the optimal K identified in the STRUCTURE analysis. To examine the significance of mixed ancestries as sources of within-breed diversity, the obtained ancestry diversity values were compared with the unbiased expected heterozygosity estimates.
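The ancestry diversity index described above is simple to compute from a breed's average cluster memberships. The following minimal sketch uses hypothetical membership vectors (not values from the study) to show its range: 0 for a breed with a single ancestry, and 1 - 1/K for a breed split evenly among K clusters.

```python
def ancestry_diversity(q):
    """Return 1 - sum(q_k^2) for a vector of average cluster memberships."""
    if abs(sum(q) - 1.0) > 1e-6:
        raise ValueError("ancestry fractions must sum to 1")
    return 1.0 - sum(x * x for x in q)


# Hypothetical membership vectors at K = 3 (illustrative only)
pure_breed = ancestry_diversity([1.0, 0.0, 0.0])        # single ancestry
even_breed = ancestry_diversity([1 / 3, 1 / 3, 1 / 3])  # fully mixed
mixed_breed = ancestry_diversity([0.6, 0.3, 0.1])       # partly admixed
```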

For the geographical plotting of genetic diversity parameters, latitude and longitude values for each breed were obtained from the center of the sample distribution. ArcView GIS v9.1 (Environmental Systems Research Institute, ESRI, Redlands, CA, USA) was used to map the allelic richness and expected heterozygosity for each breed, and the surface was extrapolated to a full rectangle. This was based on the Inverse Distance Weighted interpolation method [42], which assumes each input point to have a local influence that diminishes with distance. A synthetic map for the distribution of local total gene diversity (HT) and θ calculated for the geographically neighboring triplets of populations was produced similarly. Population triplets were formed using the Delaunay triangulation method implemented in the program Triangle [43].
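The idea behind Inverse Distance Weighted interpolation can be sketched in a few lines. This is not the ArcView implementation: the power parameter of 2, the planar Euclidean distance on raw coordinates and the sample points are all assumptions made for illustration.

```python
import math


def idw_interpolate(points, target, power=2.0):
    """Interpolate a value at `target` from known (x, y, value) points.

    Each known point is weighted by 1 / distance**power, so nearby
    points dominate and distant points contribute little.
    """
    if not points:
        raise ValueError("at least one known point is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in points:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            return value  # target coincides with a sampled location
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical breed locations with diversity values (illustrative only)
known_points = [(0.0, 0.0, 0.80), (2.0, 0.0, 0.60)]
midpoint_value = idw_interpolate(known_points, (1.0, 0.0))
near_first_value = idw_interpolate(known_points, (0.5, 0.0))
```

Evaluating the function over a regular grid of target coordinates would give a surface comparable in spirit to the contour maps, although a real analysis would work in a projected coordinate system.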

Components of within- and between-breed genetic diversity were calculated based on the molecular coancestry for populations, following the method described by Caballero and Toro [15]. The molecular coancestry between two individuals is the probability that two alleles at a locus, taken at random one from each individual, are alike in state. In a structured population with n breeds, the molecular coancestry between breeds i and j ($f_{ij}$) is the average across loci and across individuals. Defining the within-breed average coancestry as $\tilde{f} = \sum_{i} f_{ii} / n$, the total population coancestry as $\bar{f} = \sum_{i,j} f_{ij} / n^{2}$, Nei's minimum distance as $D_{ij} = (f_{ii} + f_{jj})/2 - f_{ij}$ and the average Nei's minimum distance as $\bar{D} = \sum_{i,j} D_{ij} / n^{2}$, the total gene diversity or expected heterozygosity ($GD_{T} = 1 - \bar{f}$) is partitioned into a within-breed component ($GD_{WS} = 1 - \tilde{f}$) and a between-breed component ($GD_{BS} = \tilde{f} - \bar{f} = \bar{D}$).
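
The partition can be checked numerically with a short sketch. The function names and the 2 x 2 coancestry matrix are hypothetical; the per-locus coancestry function assumes diploid genotypes given as allele pairs.

```python
def molecular_coancestry(genotype_a, genotype_b):
    """Probability that one random allele from each diploid individual
    is alike in state at a single locus (average of four comparisons)."""
    matches = sum(1 for a in genotype_a for b in genotype_b if a == b)
    return matches / 4.0


def partition_diversity(f):
    """Partition total gene diversity from a breed-by-breed coancestry matrix.

    f[i][j] is the average molecular coancestry between breeds i and j.
    Returns a dict with GD_T, GD_WS, GD_BS and the mean minimum distance.
    """
    n = len(f)
    f_within = sum(f[i][i] for i in range(n)) / n                        # f tilde
    f_total = sum(f[i][j] for i in range(n) for j in range(n)) / n ** 2  # f bar
    d_mean = sum(
        (f[i][i] + f[j][j]) / 2.0 - f[i][j]
        for i in range(n) for j in range(n)
    ) / n ** 2                                                           # D bar
    return {
        "GD_T": 1.0 - f_total,
        "GD_WS": 1.0 - f_within,
        "GD_BS": f_within - f_total,
        "D_mean": d_mean,
    }


# Hypothetical coancestry matrix for two breeds
f_example = [
    [0.50, 0.20],
    [0.20, 0.40],
]
result = partition_diversity(f_example)
```

In this example the within-breed and between-breed components sum to the total gene diversity, and the between-breed component equals the average Nei's minimum distance, as the definitions require.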

The importance of different breeds has been calculated based on the contribution of each breed to a pool of animals, or core set, that would maximize its genetic diversity (e.g. [15, 44]). In the present study, the core set refers to the smallest set of sheep breeds that still encompasses the neutral genetic diversity in the species, using the coancestry measure detailed above. These optimal contributions can also be obtained with a weighted (λ) combination of the within- and between-breed components of gene diversity, $\lambda (1 - \tilde{f}) + \bar{D}$. Maximizing global diversity is achieved by giving equal weights to within- and between-breed diversity (λ = 1), while maximizing between-breed variation is achieved by ignoring within-breed diversity (λ = 0). Two intermediate λ values were recommended in earlier studies. Piyasatian and Kinghorn [45] suggested giving the between-breed variation five times the weight of the within-breed variation (λ = 0.2), reflecting the speed with which genetic change can be made across populations compared with selection within one large mixed population. Bennewitz and Meuwissen [46] proposed a weighting based on maximizing the total genetic variance of a hypothetical quantitative trait, which is equivalent to using a weighting factor of λ = 0.5. These four λ values were applied in estimating the optimal contributions using a simulated annealing algorithm [47].
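
A minimal sketch of how optimal contributions might be searched for with simulated annealing is given below. It is not the software used in the study. It assumes that, for unequal contributions $c_i$ summing to 1, the weighted criterion generalizes to $\lambda (1 - \sum_i c_i f_{ii}) + \sum_{i,j} c_i c_j D_{ij}$, which reduces to $\lambda (1 - \tilde{f}) + \bar{D}$ when all contributions are equal. The function names, cooling schedule, step size and coancestry matrix are all hypothetical.

```python
import math
import random


def weighted_diversity(c, f, lam):
    """lam * (within-breed diversity) + (between-breed diversity)
    for contributions c (summing to 1) and coancestry matrix f."""
    n = len(c)
    within = 1.0 - sum(c[i] * f[i][i] for i in range(n))
    between = sum(
        c[i] * c[j] * ((f[i][i] + f[j][j]) / 2.0 - f[i][j])
        for i in range(n) for j in range(n)
    )
    return lam * within + between


def anneal_contributions(f, lam, steps=5000, step_size=0.05, t0=0.01, seed=1):
    """Search for contributions maximizing weighted_diversity.

    Each move shifts a small share from one breed to another, so the
    contributions stay non-negative and keep summing to 1.
    """
    rng = random.Random(seed)
    n = len(f)
    c = [1.0 / n] * n                      # start from equal contributions
    current = weighted_diversity(c, f, lam)
    best_c, best = list(c), current
    for k in range(steps):
        temp = t0 * (1.0 - k / steps) + 1e-12   # linear cooling (assumed)
        i, j = rng.sample(range(n), 2)
        old_i, old_j = c[i], c[j]
        delta = min(old_i, rng.uniform(0.0, step_size))
        c[i], c[j] = old_i - delta, old_j + delta
        candidate = weighted_diversity(c, f, lam)
        accept = candidate >= current or \
            rng.random() < math.exp((candidate - current) / temp)
        if accept:
            current = candidate
            if current > best:
                best_c, best = list(c), current
        else:
            c[i], c[j] = old_i, old_j      # reject: restore previous state
    return best_c, best


# Hypothetical coancestries: breeds 0 and 1 are similar, breed 2 is distinct
f_example = [
    [0.40, 0.35, 0.10],
    [0.35, 0.40, 0.10],
    [0.10, 0.10, 0.45],
]
solutions = {
    lam: anneal_contributions(f_example, lam) for lam in (0.0, 0.2, 0.5, 1.0)
}
```

Running the search for each of the four λ values shows how the weight on within-breed diversity shifts contributions between similar and distinct breeds in a toy data set.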


  1. Peters J, Driesch AV, Helmer D: The upper Euphrates-Tigris basin: cradle of agro-pastoralism. The First Steps of Animal Domestication. 2004

  2. Ryder ML: Domestication, history and breed evolution in sheep. World Animal Science. B8. Genetic Resources of Pig, Sheep and Goat. Edited by: Maijala K. 1991, Amsterdam: Elsevier, 157-177.

  3. Tapio M, Marzanov N, Ozerov M, et al: Sheep mitochondrial DNA variation in European, Caucasian, and Central Asian areas. Molecular Biology and Evolution. 2006, 23: 1776-1783. 10.1093/molbev/msl043.

  4. Chessa B, Pereira F, Arnaud F, et al: Revealing the history of sheep domestication using retrovirus integrations. Science (New York, N.Y.). 2009, 324: 532-536. 10.1126/science.1170587.

  5. FAO: The State of the World's Animal Genetic Resources for Food and Agriculture. 2007, Rome

  6. Kantanen J, Olsaker I, Adalsteinsson S, et al: Temporal changes in genetic variation of north European cattle breeds. Animal Genetics. 1999, 30: 16-27. 10.1046/j.1365-2052.1999.00379.x.

  7. Taberlet P, Valentini A, Rezaei HR, et al: Are cattle, sheep, and goats endangered species?. Molecular Ecology. 2008, 17: 275-284. 10.1111/j.1365-294X.2007.03475.x.

  8. Vasquez CG, Bohren BB: Population size as a factor in response to selection for eight-week body weight in White Leghorns. Poultry Science. 1982, 1273-1278.

  9. Loftus RT, Ertugrul O, Harba AH, et al: A microsatellite survey of cattle from a centre of origin: the Near East. Molecular Ecology. 1999, 8: 2015-2022. 10.1046/j.1365-294x.1999.00805.x.

  10. Freeman AR, Bradley DG, Nagda S, Gibson JP, Hanotte O: Combination of multiple microsatellite data sets to investigate genetic diversity and admixture of domestic cattle. Animal Genetics. 2005, 37: 1-9. 10.1111/j.1365-2052.2005.01363.x.

  11. Li M, Tapio I, Vilkki J, et al: The genetic structure of cattle populations (Bos taurus) in northern Eurasia and the neighbouring Near Eastern regions: implications for breeding strategies and conservation. Molecular Ecology. 2007, 16: 3839-3853. 10.1111/j.1365-294X.2007.03437.x.

  12. Cañón J, García D, García-Atance MA, et al: Geographical partitioning of goat diversity in Europe and the Middle East. Animal Genetics. 2006, 37: 327-334. 10.1111/j.1365-2052.2006.01461.x.

  13. Peter C, Bruford M, Perez T, et al: Genetic diversity and subdivision of 57 European and Middle-Eastern sheep breeds. Animal Genetics. 2007, 38: 37-44. 10.1111/j.1365-2052.2007.01561.x.

  14. Bruford MW, Bradley DG, Luikart G: DNA markers reveal the complexity of livestock domestication. Nature Reviews. Genetics. 2003, 4: 900-910. 10.1038/nrg1203.

  15. Caballero A, Toro MA: Analysis of genetic diversity for the management of conserved subdivided populations. Conservation Genetics. 2002, 3: 289-299. 10.1023/A:1019956205473.

  16. Tapio M, Tapio I, Grislis Z, et al: Native breeds demonstrate high contributions to the molecular variation in northern European sheep. Molecular Ecology. 2005, 14: 3951-3963. 10.1111/j.1365-294X.2005.02727.x.

  17. Lawson Handley L, Byrne K, Santucci F, et al: Genetic structure of European sheep breeds. Heredity. 2007, 99: 620-631. 10.1038/sj.hdy.6801039.

  18. Kijas JW, Townley D, Dalrymple BP, et al: A genome wide survey of SNP variation reveals the genetic structure of sheep breeds. PloS one. 2009, 4: e4668-10.1371/journal.pone.0004668.

  19. Romero IG, Manica A, Goudet J, Handley LL, Balloux F: How accurate is the current picture of human genetic variation?. Heredity. 2009, 102: 120-126. 10.1038/hdy.2008.89.

  20. Conrad DF, Jakobsson M, Coop G, et al: A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nature Genetics. 2006, 38: 1251-1260. 10.1038/ng1911.

  21. Nielsen R, Hubisz MJ, Clark AG: Reconstituting the frequency spectrum of ascertained single-nucleotide polymorphism data. Genetics. 2004, 168: 2373-2382. 10.1534/genetics.104.031039.

  22. Weir BS, Cockerham CC: Estimating F-statistics for the analysis of population structure. Evolution. 1984, 38: 1358-1370. 10.2307/2408641.

  23. Bradley DG, Magee DA: Genetics and the Origins of Domestic Cattle. Documenting domestication: new genetic and archaeological paradigms. Edited by: Zeder MA, Bradley DG, Emshwiller E, Smith BD. 2006, London: University of California Press Ltd, 317-328.

  24. Kalinowski ST: How many alleles per locus should be used to estimate genetic distances?. Heredity. 2002, 88: 62-65. 10.1038/sj.hdy.6800009.

  25. Jorde LB, Watkins WS, Bamshad MJ, et al: The distribution of human genetic diversity: a comparison of mitochondrial, autosomal, and Y-chromosome data. American Journal of Human Genetics. 2000, 66: 979-988. 10.1086/302825.

  26. Manica A, Amos W, Balloux F, Hanihara T: The effect of ancient population bottlenecks on human phenotypic variation. Nature. 2007, 448: 346-348. 10.1038/nature05951.

  27. McKay J, Latta R: Adaptive population divergence: markers, QTL and traits. Trends in Ecology & Evolution. 2002, 17: 285-291.

  28. Leinonen T, O'Hara RB, Cano JM, Merilä J: Comparative studies of quantitative trait and neutral marker divergence: a meta-analysis. Journal of Evolutionary Biology. 2008, 21: 1-17.

  29. Rader RB, Belk MC, Shiozawa DK, Crandall KA: Empirical tests for ecological exchangeability. Animal Conservation. 2005, 8: 239-247. 10.1017/S1367943005002271.

  30. Bonin A, Nicole F, Pompanon F, Miaud C, Taberlet P: Population adaptive index: a new method to help measure intraspecific genetic diversity and prioritize populations for conservation. Conservation Biology. 2007, 21: 697-708. 10.1111/j.1523-1739.2007.00685.x.

  31. Beaumont MA, Nichols RA: Evaluating Loci for Use in the Genetic Analysis of Population Structure. Proceedings of the Royal Society B: Biological Sciences. 1996, 263: 1619-1626. 10.1098/rspb.1996.0237.

  32. Tapio M, Miceikiené I, Vilkki J, Kantanen J: Comparison of microsatellite and blood protein diversity in sheep: inconsistencies in fragmented breeds. Molecular Ecology. 2003, 12: 2045-2056. 10.1046/j.1365-294X.2003.01893.x.

  33. Tapio I, Tapio M, Grislis Z, et al: Unfolding of population structure in Baltic sheep breeds using microsatellite analysis. Heredity. 2005, 94: 448-456. 10.1038/sj.hdy.6800640.

  34. Tapio M, Ozerov M, Viinalass H, Kiseliova T, Kantanen J: Molecular genetic variation in sheep of the central Volga area inhabited by Finno-Ugric peoples. Agricultural and Food Science. 2007, 16: 157-169. 10.2137/145960607782219346.

  35. Nei M: Molecular Evolutionary Genetics. 1987, New York: Columbia University Press

  36. El Mousadik A, Petit RJ: High level of genetic differentiation for allelic richness among populations of the argan tree [Argania spinosa (L.) Skeels] endemic to Morocco. Theoretical and Applied Genetics. 1996, 92: 832-839. 10.1007/BF00221895.

  37. Goudet J: FSTAT (Version 1.2): A Computer Program to Calculate F-Statistics. Journal of Heredity. 1995, 86-

  38. Hammer Ø, Harper D, Ryan P: PAST: paleontological statistics software package for education and data analysis. Palaeontologia Electronica. 2001, 4: 9-

  39. Cavalli-Sforza LL, Edwards AW: Phylogenetic analysis. Models and estimation procedures. American Journal of Human Genetics. 1967, 19: 233-257.

  40. Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data. Genetics. 2000, 155: 945-959.

  41. Rosenberg NA, Pritchard JK, Weber JL, et al: Genetic structure of human populations. Science (New York, N.Y.). 2002, 298: 2381-2385. 10.1126/science.1078311.

  42. Shepard D: A two-dimensional interpolation function for irregularly-spaced data. Proceedings of the 1968 23rd ACM National Conference. 1968, 517-524.

  43. Shewchuk J: Delaunay refinement algorithms for triangular mesh generation. Computational Geometry. 2002, 22: 21-74. 10.1016/S0925-7721(01)00047-5.

  44. Eding H, Crooijmans RP, Groenen MA, Meuwissen TH: Assessing the contribution of breeds to genetic diversity in conservation schemes. Genetics Selection Evolution. 2002, 34: 613-633. 10.1186/1297-9686-34-5-613.

  45. Piyasatian N, Kinghorn BP: Balancing genetic diversity, genetic merit and population viability in conservation programmes. Journal of Animal Breeding and Genetics. 2003, 120: 137-149. 10.1046/j.1439-0388.2003.00383.x.

  46. Bennewitz J, Meuwissen TH: A novel method for the estimation of the relative importance of breeds in order to conserve the total genetic variance. Genetics Selection Evolution. 2005, 37: 315-337. 10.1186/1297-9686-37-4-315.

  47. Kirkpatrick S: Optimization by simulated annealing: Quantitative studies. Journal of Statistical Physics. 1984, 34: 975-986. 10.1007/BF01009452.



This work was financially supported by the Academy of Finland and the Finnish Ministry of Agriculture and Forestry (the SUNARE and Russia in Flux programs). We thank A. Virta, M. Saura and J. Fernández for technical assistance. We also thank I.A. Kalashnikov, V. Togmitova, and N. Nikolaeva for their help in collecting the Buryatian samples. Comments given by Dr Meng-Hua Li are acknowledged. The International Livestock Research Institute (ILRI) in Nairobi, Kenya is acknowledged for providing office space for IT while working on the project.

Author information



Corresponding author

Correspondence to Juha Kantanen.

Additional information

Authors' contributions

MT supervised the molecular analysis and the consistency of allele calling, coordinated or performed the statistical analyses and wrote the final drafts of the paper. MO did the genotyping and most of the writing and statistical analyses for the first draft. IT contributed significantly to both the statistical analyses and manuscript writing. MAT contributed to the analysis design and the molecular coancestry based analyses. NM, MC, GG, TK and MM collaborated in study design, sampling and interpretation of the results. In addition, MC and TK did part of the molecular analyses. JK was in charge of the overall study, including its design, sample collection, statistical analysis, manuscript writing and coordination of the author contributions. All authors read and approved the final manuscript.

Electronic supplementary material

Additional file 1: Table S1 - Marker diversity parameters. PDF file with list of microsatellites and their chromosomal location, total number of alleles, expected unbiased heterozygosity, and estimates of within-population (f) and among-population (θ) fixation indices. (PDF 57 KB)

Additional file 2: Table S2 - Names of sheep breeds, their origin, demographic status and diversity parameters. PDF file with data on per population sample size, expected heterozygosity, within-breed fixation index (f), allelic richness, and number of private alleles. (PDF 82 KB)

Additional file 3: Figure S1 - Additional synthetic maps. PDF file with synthetic maps for within-breed diversity and breed differentiation. (PDF 15 MB)

Additional file 4: Table S3 - Breed-wise optimal contributions to a core set for different weightings of the within-breed variation. PDF file with detailed data summarized in Table 3. (PDF 56 KB)

Additional file 5: Table S4 - Distribution of core-set contributions using genetic clustering. PDF file with table similar to Table 3, but using genetic clusters instead of regional groups to categorize breeds. (PDF 47 KB)

Additional file 6: Table S5 - Breeds having equal contributions to the core set when the number of breeds conserved is fixed. PDF file with table of included breeds when the number of included breeds is fixed at 5, 10, 15 or 20. (PDF 55 KB)


About this article

Cite this article

Tapio, M., Ozerov, M., Tapio, I. et al. Microsatellite-based genetic diversity and population structure of domestic sheep in northern Eurasia. BMC Genet 11, 76 (2010).



  • Single Nucleotide Polymorphism
  • Regional Group
  • Sheep Breed
  • Caspian Basin
  • Total Gene Diversity