Microsatellite based genetic diversity and population structure of the endangered Spanish Guadarrama goat breed

Background Assessing genetic biodiversity and population structure of minor breeds through the information provided by neutral molecular markers, allows determination of their extinction risk and to design strategies for their management and conservation. Analysis of microsatellite loci is known to be highly informative in the reconstruction of the historical processes underlying the evolution and differentiation of animal populations. Guadarrama goat is a threatened Spanish breed which actual census (2008) consists of 3057 females and 203 males distributed in 22 populations more or less isolated. The aim of this work is to study the genetic status of this breed through the analysis of molecular data from 10 microsatellites typed in historic and actual live animals. Results The mean expected heterozygosity across loci within populations ranged from 0.62 to 0.77. Genetic differentiation measures were moderate, with a mean FST of 0.074, GST of 0.081 and RST of 0.085. Percentages of variation among and within populations were 7.5 and 92.5, respectively. Bayesian clustering analyses pointed out a population subdivision in 16 clusters, however, no correlation between geographical distances and genetic differences was found. Management factors such as the limited exchange of animals between farmers (estimated gene flow Nm = 3.08) mostly due to sanitary and social constraints could be the major causes affecting Guadarrama goat population subdivision. Conclusion Genetic diversity measures revealed a good status of biodiversity in the Guadarrama goat breed. Since diseases are the first cause affecting the census in this breed, population subdivision would be an advantage for its conservation. However, to maintain private alleles present at low frequencies in such small populations minimizing the inbreeding rate, it would necessitate some mating designs of animals carrying such alleles among populations. The systematic use of molecular markers will facilitate the comprehensive management of these populations, which in combination with the actual breeding program to increase milk yield, will constitute a good strategy to preserve the breed.


Background
Before the intensification and industrialisation process of the last decades, European livestock farming was generally extensive and closely linked to the use of farmland. This is still the case for small ruminants and especially for goats, where not only local breeds do not benefit from modern breeding techniques but they also are about to disappear. Thus, the decline of local breeds and their production systems are raising concern about the importance of European agro-ecosystems and cultural landscapes maintenance. The goat is among the earliest species to be domesticated [1]. Goats are distributed over all types of eco-niches, including tropical areas, dry zones and mountain regions. With such a wide distribution and adaptability [2], the goat is expected to have high genetic diversity as a result of both, natural selection for fitness under varied environmental conditions and the artificial selection for milk, meat, fibre and other purposes.
However, in contrast to high productive foreign goat breeds, most local breeds are not subject to breeding programs to improve production traits, which would increase their genetic ability for productivity and consequently their profitability. Due to the extensive conditions of animal management, existing breeding strategies applied to local breeds are constrained by poor pedigree recording. In this way, the lack of pedigree records can lead to both a limited genetic progress for the selected trait and a suboptimal inbreeding control. The use of highly variable molecular genetic markers, such as microsatellites, is one of the most powerful means for studying genetic diversity and pedigree reconstruction because of their high degree of polymorphism, random distribution across the genome and neutrality with respect to selection [3][4][5].
Guadarrama goat, a rustic breed which has been exploited in mountain areas in the centre of Spain since XVIII century, constitutes a good illustrative example. Guadarrama goat is a threatened breed whose actual population consists of 3057 females and 203 males distributed in 22 herds, which correspond to an effective population size of 763 individuals considering only the unequal number of males and females [6]. Other effects not considered here, as unequal parental contributions, overlapping generations, genetic drift, selection etc, can also influence the magnitude of the effective size. Common diseases such as tuberculosis and paratuberculosis are the main causes of the high culling rates (near 25% per year) existing in this breed. Generally, herd's size ranged from 300 to 500 animals.
This breed is mainly used for milk production, although it is also exploited in a local meat industry. Since 1998 Guadarrama breeders association has established a breeding program to increase milk yield. There are a high degree of disconnectedness among the herds of this breed and scarce pedigree information due to the low spread of the artificial insemination. Therefore, genetic evaluations of animals for milk yield are only comparable at the intraherd level. For this reason, a pilot project for pedigree reconstruction based on molecular markers information (10 microsatellite) has been established in 2003 as an alternative to pedigree recording.
The aim of this work is to evaluate the genetic status of the Guadarrama goat breed making use of the molecular information generated in the breeding program analyzing both, their genetic diversity and its population structure or subdivision using clustering methods.

Biological samples
Fresh blood samples were obtained between years 2004-2007 from 6635 goats pertaining to 20 different herds, which constitutes the whole animals of the Guadarrama breed. Figure 1 illustrates geographical locations of these 20 populations. Total genomic DNA was isolated from blood using the Real Biotech Corporation ADN extraction kit (Durbiz).

Statistical analysis
The expected heterozygosity corrected for sampling bias [15], the observed heterozygosity, the polymorphic information content and the estimated null allele frequency were calculated for each locus in the whole population using CERVUS version 3.0.3. [16]. GENEPOP 3.4 package [17] was used to perform the exact test for Hardy-Weinberg equilibrium by microsatellite loci (test multi-population) and by population (test multi-locus) using the Markov chain method with 1000 iterations, and considering the heterozygote deficit as the alternative hypothesis. Wright's F-statistics F IS , F ST , and F IT [18] jackknifing over populations and loci were calculated by FSTAT version 2.9.3.2 [19]. Gene flow (Nm) was estimated by the approximation of Wright [20] F ST ≈ 1/(1+4Nm) assuming genetic markers neutrality and an island model. Heterozygosities, mean number of alleles across populations, F IS within populations and Gst were calculated with GENETIX 4.03 software [21]. Furthermore, F ST values for pairwise comparisons of the 20 Guadarrama goat herds and their significance level for genetic differentiation and Rst were tested with FSTAT. Significance levels were set using the sequential Bonferroni correction (initial k = 190). GENCLASS 2.0 package [22] under a Bayesian approach [23] was used in the assessment of animals to the predefined populations in which their respective genotypes were most likely to occur. The Mantel test [24] was performed with GENETIX 4.03 to test the correlation between the F ST values and the geographic distances between populations. The population structure was analyzed by cluster techniques with the software STRUCTURE 2.1 [25] and BAPS 4.14 [26,27]. Due to the high number of missing data for the BM1818 marker, only nine of the ten loci genotyped were used in these analyses. According to Falush et al. [28], STRUCTURE analysis was performed considering both the admixture model and the correlated allele frequencies between populations. The length of the burn-in and MCMC (Monte Carlo Markov chain) were 10,000 and 100,000, respectively. For the whole data set (6635 animals distributed in 20 original populations) 15 runs were carried out for each value of K, being K the number of clusters. The range of possible Ks tested was from 2 to 23 (the real number of herds plus 3). For each value of K the mean of the log probability of data (L(K)) over 15 runs were calculated. F ST mean values for each cluster were also estimated. BAPS was run setting the maximum number of cluster at 20. Results were based on 50 simulations from the posterior allele frequencies. Finally, locus by locus AMOVA analysis considering groups and populations as sources of variation was assessed by ARLE-QUIN 3.1 software package [29].

Microsatellite markers
A total of 170 alleles were detected at the 10 microsatellite loci assessed in the 6335 goats genotyped. Table 1 shows the genetic variability measures corresponding to these 10 loci. Differences in the number of animals genotyped per microsatellite were due to amplification failures. There were many problems with the amplification of the marker BM1818, which finally was genotyped only in 1371 animals. Except ILSTS005 (0.44), all markers were highly informative (PIC>0.50) which make them useful in genetic diversity studies. The number of alleles per locus ranged from 9 (ILSTS005) to 36 (CSSM66) being 17 the mean number of alleles per locus. Private alleles (UAN in table 1) occurred at very low frequencies (<0.025) for all loci in most populations. The mean observed and expected heterozygosities across loci were 0.70 (SD 0.09) and 0.77 (SD 0.10), respectively. Only the CSSM066 marker was characterized by a fairly high frequency of null alleles (11%). Table 2 shows Wright' F-statistics and gene flow (Nm) for each locus across the 20 herds of Guadarrama goat breed. Mean values of F IS and F ST across loci were 0.023 and 0.074, respectively.
Results of the Fisher's exact test for Hardy-Weinberg (HW) equilibrium across loci and populations, considering the heterozygote deficit as the alternative hypothesis, are shown in Table 3. Highly significant (p < 0.001) multilocus departures from HW proportions were found for most populations and significant (p < 0.05) for populations 2 and 20. Populations 10, 12 and 16, had non significant pvalues for the statistical test. Single locus test across populations to asses departure from HW showed no significant p-values for ILSTS005, ILSTS011, BM6526, BM8125 and MCM53 markers.

Genetic diversity within populations
The mean number of alleles across loci, the mean observed and expected heterozygosities and the F IS estimates within the 20 Guadarrama goats populations, are shown in Table 4. Mean number of alleles across loci was higher than 9 in half of the populations. Heterozygosity Results from GENECLASS assignment test revealed that about 77% of the animals were assigned to the population they were collected from. The higher percentages (91.2% to 97.2%) of individuals assigned to its original population occurred in populations 7, 10, 15, 18 and 20 while the lower correspond to populations 4 (68.2%) and 19  (Nei, 1987). Wright's statistics according to Weir and Cockerham, 1984 Figure 2 shows the log probability of data (L(K)) for the admixture and correlated frequencies model under exhaustive sampling (averaged over the 15 replicates) of the STRUCTURE package. The highest L(K) averaged over replicates running for each value of K (K from 2 to 23), was observed for K = 16 (-175,201.08). For K varying from 2 to 17 and from 20 to 23 the runs reach equilibrium and converged to similar L(K) values. However for K equal to 18 and 19, the system showed more erratic values across replicates. Therefore for these values of K, 10 new replicates were made setting the length of the burning period in 50,000 and of the MCMC in 500,000. Similar but more stable values of L(K) were found in this case.

Population structure
Estimated α values averaged 0.04 for K varying from 2 to 19, indicating that most individuals were essentially from one population or another. However, for K values ranging from 20 to 23, α values varied from 1.50 to 2.50 indicating that most individuals were admixed.
Using BAPS package the highest likelihood was also obtained with K = 16 (-174,885.14). Log probability of data (L(K)) for K values ranging from 2 to 23 for the admixture and correlated frequencies model, under exhaustive sampling (averaged over 15 replicates) for the Guadarrama goat breed (K = number of clusters) Figure 2 Log probability of data (L(K)) for K values ranging from 2 to 23 for the admixture and correlated frequencies model, under exhaustive sampling (averaged over 15 replicates) for the Guadarrama goat breed (K = number of clusters). Length of burn-in 10,000. MCMC 100,000 Vertical bars reflect standard deviations.  Table 5 shows the percentage of membership for each predefined population and the mean value of F ST in each of the 16 inferred clusters, for the high estimate of L(K) (-174,478.30) among the 15 replicates ran for K = 16. Clusters 2, 7, 11 and 12, had moderate to high proportions of members from two of the original populations. Clusters 11 and 12 were essentially a mixture of animals from populations 5 and 14 and populations 7 and 19, respectively. Cluster 5 seems to be the most heterogeneous group, containing moderate proportions of animals from populations 1, 9, 2 and 6. Table 6 shows locus by locus AMOVA analysis which was performed considering groups (16 clusters) and populations (20) as sources of variation. Percentages of variation of the number of alleles (F ST ) and of the allele size (Rst) among groups, among populations within groups and within populations were estimated. In both cases, the highest percentage of variation (92-93%) corresponded to the within population component. Components among groups and among populations within groups showed low and similar magnitudes (3-4%).

Discussion
In order to maintain genetic diversity, breeding strategies that increase effective population size minimizing genetic drift effect should be implemented. Microsatellite markers in combination with recent statistical methodologies represent a useful tool for the conservation and management of endangered breeds.
A breeding program focused on improving Guadarrama goats milk yield has been carried out in Spain since 1998. In the present work, the actual situation concerning genetic diversity and population structure of this breed has been evaluated using the molecular information derived from 10 microsatellites loci and the use of clustering methods.
The total number of alleles per locus in the present study ranged from 9 to 36. This fact suggested that all markers used were appropriated to analyze genetic diversity in this breed. A more appropriate measure of genetic variation within a population was gene diversity (average expected heterozygosity). Gene diversity estimated in this breed was 0.70, which was in the range (0.3 to 0.8) to be useful for measuring genetic variation [30]. This value was similar to those previously reported (0.69) in other goat breeds [31] and in 31 animals from Guadarrama breed using 30 microsatellites [32]. The mean number of alleles found here (17) was higher than those, 7.7, estimated by Cañón et al. [32]. This could be due to the higher sample size used in our study. In assessing diversity estimates from different studies, it should be mentioned that the values are not directly comparable, as different microsatellite have been used. There were two common microsatellites with Cañon et al. [32]. Hence the comparison has only suggestive indication.
Although only a seven percent of the total genetic variability could be attributed to differences among subpopulations, evidences of a moderate genetic subdivision (mean F ST = 0.074) in the Guadarrama goat population were detected. Similar F ST value was found in a large analysis [32] using samples of 45 goat breeds from Europe and Middle Eastern countries. Thus, genetic variability within breeds seems to be as important as genetic variability among them. In the Guadarrama goat breed significant genetic differentiation (p < 0.05) was found in 92 out of the 190 population pairs after sequential Bonferroni correction.
The high genetic diversity observed in a breed could be explained by overlapping generations, mixing of populations from different geographical locations, natural selection favouring heterozygosity or subdivision accompanied by genetic drift [33]. Isolation, founder effects, genetic drift and different selection pressures realized by farmers in each population may have played major role in differentiation of Guadarrama goats. STRUCTURE and BAPS clustering software have the ability of inferring the correct number of subpopulations and assigning individuals appropriately even when genetic differentiation among groups is low (0.02 to 0.05) [34] and using a relative small number of loci (7 microsatellites) [25]. In this case, results derived from both programs provide a strong support of a 16 cluster subdivision. This subdivision seems to be reasonable, since few farmers exchange animals and therefore these populations show more genetic homogeneity. The high average percentage of assignment (77%) of individuals to the population they were collected from, pointed out the existence of clear genetic differences between populations. In addition, AMOVA indicated that 7.5% of the total genetic variation is between populations of this breed while the remaining 92.5% corresponded to differences among individuals.
Genetic differences were not correlated with geographic distances among populations (Mantel test) therefore management factors such as the limited exchange of animals between farmers mostly due to sanitary, social and cultural reasons could constitute the major causes affecting Guadarrama goat population subdivision. In this breed tuberculosis and paratuberculosis are the main causes of mortality and culling. These kinds of diseases have high prevalence in the affected herds. Thus, subdivision would be an advantage preserving the breed from the dissemination of such diseases. Reproductive isolation, consequence of the local use and management of the breed, reduces the effective population size and contributes to the genetic subdivision. Considering Wright' F-statistics results, subdivision processes more than inbreeding (average F IS across loci was 0.022 ± 0.017), could be the cause of the observed genetic differences between populations. Furthermore, populations analyzed were not in HW equilibrium, as it is revealed by the smaller observed than expected heterozygosity. The heterozygote deficiency is probably reflecting a subdivided population structure (Wahlund effect) rather than selection against heterozygotes.

Conclusion
In this work we have demonstrated that Guadarrama goat genetic diversity is still conserved. Management factors such as the limited exchange of animals between farmers (estimated gene flow Nm = 3.08) could be the major causes affecting Guadarrama goat population subdivision. Since diseases are the first cause affecting Guadarrama goat census, population subdivision would be an advantage for the conservation of the breed. In such cases, additional constraints, such as the minimum levels of contribution of each population should be included in the conservation strategy [35]. To maintain private alleles present at low frequencies in such small populations avoiding an increase of the inbreeding rate, it would be necessary to develop some strategies to spread such alleles across populations. Since molecular markers allow inferring genealogical relationships, it would be possible to take measures on the mating scheme to minimize coancestry or kinship in the subdivided population. The systematic use of molecular markers can facilitate the comprehensive management of endangered populations and should be combined with breeding schemes to improve economic traits avoiding the deterioration of the breeds.