Characterization of linkage disequilibrium, consistency of gametic phase and admixture in Australian and Canadian goats

Background Basic understanding of linkage disequilibrium (LD) and population structure, as well as the consistency of gametic phase across breeds is crucial for genome-wide association studies and successful implementation of genomic selection. However, it is still limited in goats. Therefore, the objectives of this research were: (i) to estimate genome-wide levels of LD in goat breeds using data generated with the Illumina Goat SNP50 BeadChip; (ii) to study the consistency of gametic phase across breeds in order to evaluate the possible use of a multi-breed training population for genomic selection and (iii) develop insights concerning the population history of goat breeds. Results Average r2 between adjacent SNP pairs ranged from 0.28 to 0.11 for Boer and Rangeland populations. At the average distance between adjacent SNPs in the current 50 k SNP panel (~0.06 Mb), the breeds LaMancha, Nubian, Toggenburg and Boer exceeded or approached the level of linkage disequilibrium that is useful (r2 > 0.2) for genomic predictions. In all breeds LD decayed rapidly with increasing inter-marker distance. The estimated correlations for all the breed pairs, except Canadian and Australian Boer populations, were lower than 0.70 for all marker distances greater than 0.02 Mb. These results are not high enough to encourage the pooling of breeds in a single training population for genomic selection. The admixture analysis shows that some breeds have distinct genotypes based on SNP50 genotypes, such as the Boer, Cashmere and Nubian populations. The other groups share higher genome proportions with each other, indicating higher admixture and a more diverse genetic composition. Conclusions This work presents results of a diverse collection of breeds, which are of great interest for the implementation of genomic selection in goats. The LD results indicate that, with a large enough training population, genomic selection could potentially be implemented within breed with the current 50 k panel, but some breeds might benefit from a denser panel. For multi-breed genomic evaluation, a denser SNP panel also seems to be required. Electronic supplementary material The online version of this article (doi:10.1186/s12863-015-0220-1) contains supplementary material, which is available to authorized users.


Background
Goats are highly adaptable to different environmental conditions being raised all over the world for milk, meat and fibre production. Although they present reasonable reproductive and productive performance, it is necessary to improve their production efficiency to become more competitive with other livestock industries. In this regard, genetic selection plays a very important role and substantial genetic gain has been achieved using traditional breeding methods. However, there are some important traits that are difficult or expensive to measure (e.g. resistance to diseases, carcass traits, etc.), measured late in life or sex limited (e.g. milk production and composition). The development of genomic technologies means that new methods have become available such as genomic selection (GS) proposed by Meuwissen et al. [1].
GS has been successfully implemented in dairy cattle breeding programs and it is either under development or in the process of being implemented in other animal species. In dairy cattle the main advantage of GS is that it reduces the generation interval increasing the genetic gain per year. In goats, the generation interval is relatively lower than cattle, but could still be reduced. GS could also help to increase the selection intensity, which would increase productivity and reduce costs in breeding programs. As a first step for goat breeders, a 50 K SNP panel [2] has been developed by the International Goat Genome Consortium (IGGC), facilitating both genome wide association studies (GWAS) and the opportunity to implement GS.
One relevant parameter to the implementation of genomic selection in a breeding program is the extent to which linkage disequilibrium (LD) persists across the genome and how it varies between populations. LD is defined as a non-random association of alleles at two or more loci and is influenced by population history breeding system and the pattern of geographic subdivision [3]. The marker density required for successful GWAS and subsequently genomic selection, depends on the extent of LD across the genome [4]. A low LD level would require a higher marker density to enable markers to capture most of the genetic variation in the population. The persistence of LD has been evaluated in a number of domesticated animal species including pigs [5][6][7], horses [8], cattle [9][10][11] and sheep [12,13]. A preliminary evaluation has also been conducted in goats using French dairy breeds [14]. Given the persistence of LD varies considerably between breeds in other species [13], it is important to characterise LD in a diverse collection of goat populations.
In addition to linkage disequilibrium accuracy of genomic selection also depends on the number of records available to estimate marker effects (training population). This may be a limitation factor for implementation of GS in goats because the genotyping costs are still relatively high compared to the economic value of the animals. An alternative to increase the number of animals in the training population is combining data from multi-breed populations. To obtain good accuracies of predictions using multi-breed populations it is required not only high LD between the markers and the quantitative trait loci (QTL) in each breed, but also high consistency of gametic phase between the markers and the QTL across breeds. Consistency of gametic phase is a measure of the degree of agreement of gametic phase for pairs of markers between two populations [6] that is also dependent of the difference on allele frequencies and relatedness between the two populations.
A variety of evolutionary phenomena impact observed allele frequencies distributions and the persistence of linkage disequilibrium. These include forces such as genetic drift migration, natural selection, and mutation rate. Therefore, population history strongly influences the extent of LD, particularly in domestic animal populations which have undergone bottlenecks during both domestication and the subsequent formation of breeds. The strength of these forces is likely to be different across the farm yard animal species, and indeed between breeds within each species. This prompted the investigation, in this study, of aspects of population history including ancestral effective population size, which can be inferred from the observed extent of LD [15][16][17].
There are many goat breeds been raised commercially all over the world and during the years they were characterized by high levels of admixture followed by animal movement. For instance goats were carried by the early explorers to America and Oceania [18] and some African breeds were also introduced more recently, such as South African Boer [19]. In order to better understand how modern goat breeds developed historically and to what degree they may have been mixed in the past, one alternative is to look at their breed composition through an analysis of admixture and/or principal components analysis (PCA).
Basic understanding of LD and population structure as well as the consistency of gametic phase across breeds is crucial for the implementation of genomic selection and is still limited in goats. Therefore, the objectives of this research were to estimate genome-wide levels of LD in Australian and Canadian goat breeds using data generated with the Illumina Goat SNP50 BeadChip to study the consistency of gametic phase between different breeds in order to evaluate the possible use of a multi-breed training population for genomic selection and develop insights concerning the population history of goat breeds.

Methods
The Canadian animals included in this study were managed in accordance with the Recommended Code of Practice for the Care and Handling of Farm Animals -GOATS (Canadian Agri -Food Research Council) [20]. All the samples were collected from commercial farms and the animal owners agreed to be involved in the project through their respective associations i.e. Ontario Goat and Société des éleveurs de chèvres laitières de race du Quebec. Samples were collected by well trained staff following industry best practices. Animal handling and sample collection from Australian animals were performed in accordance with Animal Ethics, CSIRO Brisbane Animal Ethics Committee.

Animals
The data analyzed in this study included genotypes of goats raised for milk meat and fibre production from two sources: i) a set of 976 Canadian goats from six breeds (Alpine, Boer, LaMancha, Nubian, Saanen and Toggenburg) and ii) 175 Australian goats from three breeds (Boer, Cashmere and Rangeland). The total number of genotyped animals for each breed by country is described in Table 1. The Canadian animals were from 25 commercial herds located in the provinces of Ontario and Quebec, two artificial insemination (AI) centres, and the Agriculture and Agri-Food Canada (AAFC) Centre for Animal Genetic Resources (Saskatoon, Saskatchewan). Most of the samples were ear notches (76 %), but also included extracted DNA samples from older animals (13 %), blood samples (9 %) and semen straws (2 %).
The Australian populations and the genotypes derived from them have been described previously [21]. In brief animals were sampled from three different regions: 61 Boer goats from the Yarrabee goat herd in Queensland, 66 Rangeland goats from outback New South Wales and 48 Cashmere goats from Queensland. DNA was extracted from whole blood using the Qiagen Blood and Tissue extraction kit following the manufacturer's instructions.

SNP genotyping and data filters
All the animals were genotyped using the Illumina goat SNP50 BeadChip (Illumina Inc. San Diego, CA) containing 53,347 single nucleotide polymorphisms (SNPs). SNP filtering and quality control conducted on the Australian populations resulted in analysis of a final marker set containing 52,088 loci [21]. The Canadian and Australian datasets were merged and only the 52,088 SNPs present in both datasets were kept for further analysis.
The genotyping quality control was performed within breed to remove SNPs and/or samples that could bias the LD estimates. SNPs with MAF lower than 5 % (for Alpine and Saanen breeds) or 15 % (for other breeds which have a much smaller number of genotyped animals) were removed prior to estimation of LD to prevent monomorphic loci inflating LD. SNPs were also excluded if the call rate was lower than 90 %, if they deviated significantly from Hardy-Weinberg equilibrium (HWE, p < 10 −6 ) or if they presented a heterozygosity excess (>0.15, [22]). Only mapped autosomal SNPs were included for further analyses. Missing SNP genotypes were not imputed due to the limited number of genotyped animals in each breed. Besides the SNPs quality control, we also performed a quality control to animals, where individuals that had SNP call rate < 0.90 were removed. The number of SNPs excluded during the quality control procedure by each criterion is presented in Table 1. The number of SNPs per breed remaining after exclusions ranged from 32,853 to 45,268 out of 52,088 SNPs.

Extent of linkage disequilibrium
The extent of LD between markers was measured using r 2 as proposed by Hill and Robertson [23], which is the squared correlation between alleles at two loci. It can be expressed as: , and f (b), are observed frequencies of haplotype AB and alleles A, a, B, and b, respectively. However, the number of animals genotyped for this study was not enough to reconstruct haplotypes accurately. Thus, a D estimate suggested by Lynch and Walsh [24] was used: where N is the total number of animals, and N AABB , N AABb , N AaBB , and N AaBb are the corresponding number of individuals in each genotypic category (AABB, AABb, AaBB, and AaBb). Another commonly used pair-wise measure of LD is D' [25]. The reason for using r 2 rather than D' is that r 2 is less sensitive to allele frequency and small sample size [26]. Values range from 0 (no LD) to 1 (complete LD) between two markers. If we consider the r 2 between a bi-allelic marker and an (unobserved) bi-allelic QTL, r 2 is the proportion of variation caused by the alleles at a QTL that is explained by the markers [27]. We calculated r 2 for each pair of loci on each chromosome to determine the LD between adjacent SNPs, and the LD decay over different distances. To examine the decay of LD with physical distance, SNP pairs on the autosomes were sorted into bins based on pair-wise marker distance and the average of each bin was calculated. We defined 20 distance bins: lower than 0.02 Mb, from 0.02 until 0.1 defined every 0.01 Mb from 0.1 to 1 Mb defined every 0.1 Mb from 1 to 1.2 Mb and greater than 1.2 Mb.

Consistency of gametic phase
The consistency of gametic phase was defined by the Pearson correlation of signed r values between two breeds. For each marker pair with a measure of r 2 the signed r value was determined by taking the square root of the r 2 value and assigning the appropriate sign based on the calculated disequilibrium (D) value. Data was sorted into bins based on pair-wise marker distance to determine the breakdown in the consistency of gametic phase across distances and to assess the consistency of gametic phase at the smallest distances possible, given the number of genotyped SNPs. For each distance bin, the signed r values were then correlated between all 36 pairs of breeds using the CORR procedure in SAS (SAS Institute Inc., Cary, USA).

Ancestral effective population size
The r 2 measures combined with markers distance can be used to estimate the approximate effective population size (N e ) at a given point in the past time. The N e in each generation was determined based on the expectation of r 2 in different distances and assuming a model without mutation as described by Sved [15]: E r 2 ð Þ ¼ 1 1þ4N e c , in which, c is the distance in Morgans between the SNPs. N e is the effective population size and r 2 is the average r 2 value at a given distance. Each genetic distance (c) corresponds to a value of t generations in the past. This value was calculated as t = 1/(2c) as suggested by Hayes et al. [17].
The ancestral N e was investigated at 21 time points from 5 until 1500 generations in the past. The distances (c) were taken as the middle of a range and the average r 2 value was estimated at that distance. N e was then calculated at each distance using that specific average r 2 .

Admixture analysis
In order to have an insight about the evolutionary history of the breeds included in this study we performed an admixture analysis. The same genotype quality control presented in Table 1 was applied to the admixture analysis. We used the ADMIXTURE software [28] to determine the level of admixture of each animal. This software applies a model based on a clustering algorithm that identifies subgroups that have distinctive allele frequencies. It places individuals into k predefined clusters.
The choice of an appropriate value for k is a notoriously difficult statistical problem. It seems that this choice should be guided by knowledge of a population's history [28]. In this study we evaluated k from 6 to 10 as it would be a more representative value of the expected number of subpopulations in our data set. Two out of nine populations were from the same breed (Australian and Canadian Boer populations). Furthermore, it is known that the Rangeland is a composite breed population. So only results for k = 7 were shown, which have a more reasonable biological interpretation, as suggested by Pritchard et al. [29].

Principal component analysis (PCA)
In order to better assess the breed composition of the animals and for graphically display the results we also performed a principal component analysis. Principal components were calculated from the genomic relationship matrix (G) using prcomp function of R [30]. The G matrix was calculated using the method described by VanRaden [31]: where M is a matrix of counts of the alleles "A" (with dimensions equal to the number of animals by number of SNPs), p i is the frequency of allele "A" of the i th SNP, P is a matrix (with dimensions equal to the number of animals by number of SNPs) with each row containing the p i values, I is the identity matrix (of size equal to the number of animals). Missing values in M were replaced by 2 times the frequency of allele "A" in the breed.

SNP frequency and distribution
The level of genetic diversity present within and between the goat populations can be measured by the number of polymorphic loci and their allele frequencies distributions. Table 1 indicates that the Rangeland Alpine and Saanen breeds had the highest number of loci remaining after filtering based on MAF, HWE and other metrics. Fig. 1  A descriptive summary of chromosomes and SNPs for the Alpine breed (larger sample size) is shown in Table 2. Diploid cells of Capra hircus contain 29 homologous autosomal pairs (CHI) and one pair of sex chromosomes. The total autosomal genome length was 2402.526 Mb with the shortest CHI being 41.478 Mb (CHI25) and the longest CHI being 154.929 Mb (CHI1).
After application of quality control filters to remove low quality data the 50 k SNP panel showed good coverage of the genome with an average gap size between adjacent SNP varying from 0.05 to 0.07 Mb. Additional file 1: Tables S1.a and S1.b shows the largest intervals by chromosome and breed. The largest gaps were observed on CHI12 (0.  (Canadian population), Cashmere, Saanen, LaMancha, Nubian, Rangeland and Toggenburg animals, respectively. The chromosomes that presented larger gaps in most breeds were: CHI12, CHI17 and CHI29. Most of the breeds with a smaller number of animals had the largest average gap size between adjacent SNPs, due to the exclusion of SNPs with minor allele frequency (MAF) lower than 0.15, while for Alpine and Saanen breeds, MAF threshold was 0.05. However, for the Rangeland breed, even considering a MAF threshold of 0.15, the number of excluded SNPs was similar with those from Alpine and Saanen breeds.
Additional file 2: Table S2 presents the distribution of SNPs by chromosome for each breed. The greater range in the number of SNPs/Mb was observed for the Boer breed (Australian population) from 13.62 (CHI13) to 17.31 (CHI28) SNPs/Mb and the shorter range was observed for the Alpine breed and it varied from 18.09 (CHI19) to 19.63 (CHI19) SNPs/Mb.

Extent of linkage disequilibrium within goat breeds
Linkage disequilibrium was estimated separately within each of 9 goat populations using r 2 . The average linkage disequilibrium (r 2 ) between adjacent SNPs by breed and average distance between adjacent SNPs (Mb) are presented in Table 3. Average r 2 between adjacent SNP pairs was highest within the two geographically distinct populations of Boer goats (0.287 and 0.289) and lowest for the Rangeland and Alpine populations (0.110 and 0.144). The average r 2 appears to reflect breed diversity whereby genetically diverse populations have generally lower average LD between adjacent loci. LD was also compared between chromosomes, revealing some variation (Additional file 3). The chromosomes that presented higher levels of LD were not the same for most breeds, except for Canadian and Australian Boer populations that presented more similar LD estimates. LD is expected to decline as the recombination and physical distance between the markers increases. Fig. 2 displays the average LD values at given distance ranges for each breed (see also Additional file 4). High LD values were observed only at small distances between pairs of SNPs. For all the breeds LD decays rapidly as distance between the two SNPs increases. The average r 2 estimates for the Rangeland population were the lowest values across all distances. It was followed by Alpine and Saanen. Alpine and Saanen breeds showed similar pattern of LD, which could be explained by their common ancestral origin [14].
It is important to note that the number of animals varied between groups (Table 1) and this has the potential to influence the observed LD. A correction for sampling   However, the differences were small and all the results presented in this paper are based on non-corrected r 2 estimates. Boer (Australian and Canadian populations) and Nubian had the highest levels of LD across all distances. The r 2 values for Canadian and Australian Boer animals were very similar for short distances bins except for distances up to 0.02 Mb. The r 2 similarities could be indicating that they were managed together until few generations ago (around 5 generations ago). The Australian Boer goats presented higher estimates at long distances compared to Canadian Boer goats.
Trends across distances were very similar (Fig. 2) for all breeds and the LD level decayed at a very similar rate. The extent of LD decreased substantially from the first (up to 0.02 Mb) to the second range of distances (between 0.02 and 0.03 Mb). The number of SNP pairs at distances < 0.02 Mb was quite small though, which is also indicated by the high standard deviation values (Additional file 4: Tables S4.a and S4.b). The mean r 2 decreased more slowly with increasing distance. LD levels were smaller than 0.05 at distances greater than 1.20 Mb for all breeds. The low level of long range LD may indicate that these breeds have not been under intense selection or have had large effective population size in the recent past.

Admixture and principal components analyses
Breed composition for each animal was calculated using the admixture model as described by Alexander et al. [28]. This determines the proportion of a given genome originated from each of k ancestral clusters defined as seven in this study. Fig. 7 and Table 4 show the proportion of each cluster, averaged across individuals within population. Some breeds have distinct genotypes (less clusters) based on SNP50 genotypes, such as the Boer, Cashmere and Nubian populations. The other groups share higher genome proportions with each other, indicating higher admixture and a more diverse genetic composition. The Rangeland population contains the highest rate of admixture, consistent with it being an unmanaged feral population founded by mixing of a number of breeds [21]. The admixture analysis presented here indicates contributions of mainly Cashmere, Nubian and Boer breeds into the Rangeland population. However, there is also a contribution from some dairy breeds. On average, around 13, 27 and 50 % of the Rangeland goat genome was in common to that found in Nubian, Boer and Cashmere, respectively (Table 4). LaMancha breed presents a contribution of Alpine and Nubian breeds (8 and 5 %, respectively). Saanen breed shares a higher proportion of the genome with Alpine (12 %), followed by Toggenburg (6 %) and LaMancha (4 %). The Saanen and Alpine breeds were managed together until few decades ago. However, the Saanen breed appears to be more mixed compared to Alpine breed. Australian and Canadian Boer populations were grouped together with an average of 95 % of their genome in common. Figure 8 presents the first and second principal components calculated based on the G matrix. It shows that some breeds present clear clusters while others are genetically closer to each other. Australian and Canadian Boer were clustered together. The dairy breeds were clustered apart from the dual purpose/fibre/meat breeds. The Rangeland animals were clustered close to Nubian, Cashmere and Boer, what was also observed in the admixture analysis.

Linkage phase
The strength of consistency in gametic phase between breeds has implications for the design of successful genomic prediction programs. Specifically it influences what (if any) breed combinations can be merged to form a  Table 5 presents the Pearson correlations between gametic phase of all breeds over distances smaller than 0.20 Mb (above diagonal) and between 0.02 and 0.03 Mb (below diagonal). The estimates for other distances were not presented as they were small. However, it is shown in the Additional file 6 for all distances and breed pairs. The highest consistency of gametic phase was found between Australian and Canadian Boer. This is expected given the two geographically distinct populations were drawn from the same breed. Other groups that presented higher correlations were: Alpine and Saanen, Alpine and LaMancha, Canadian Boer and Rangeland, Australian Boer and Rangeland, and Cashmere and Rangeland. The estimated correlations for all the breed pairs except Canadian and Australian Boer, were lower than 0.70 for all distances greater than 0.02 Mb. The correlations between Australian and Canadian Boer and Cashmere or Rangeland were very similar, suggesting a high relatedness between Australian and Canadian Boer populations.

Ancestral effective population size estimations
A graphical representation of the N e values at each time point from 1500 to five generations ago is given in Figs. 5 and 6. Looking at the N e in the distant past (1500 generations ago), effective populations were found to be5 325, 3309, 3057, 3030, 2742, 1967, 1803, 1743, and 1741 animals for Rangeland Alpine, Saanen, Cashmere, LaMancha, Toggenburg, Nubian, Australian Boer and Canadian Boer populations, respectively. It corresponds to the closest measured time to the goat domestication, which occurred around 10,000 years ago [32]. Based on an average generation interval of 4 years [33], they would have been domesticated around 2500 generations ago. However, there were no enough SNP pairs to accurately estimate N e for more than 1500 generations ago.
The results suggest that N e has been lower in the recent past compared to the ancient past. The effective population size at five generations ago is calculated to be Table 4 Average breed composition of 9 goat populations given 7 clusters estimated by ADMIXTURE software  The Canadian Boer population presented higher N e than Australian Boer population and it was particularly low at the most recent generations studied.

Genotypic data and levels of LD
For breeds with small number of samples higher MAF threshold was applied and therefore more SNPs were excluded (Table 1). However, for the Rangeland population, even using a 0.15 MAF threshold, it presented a number of excluded SNPs by MAF criteria similar with Alpine and Saanen breeds, indicating high levels of polymorphism in that population. The high diversity level in the Rangeland population was previously discussed by Kijas et al. [21]. From the breeds included in this study, only Alpine, Boer and Saanen were represented within the SNP discovery panel used during the development of the goat SNP50 chip. Even though, all of them presented high levels of polymorphisms. The observed levels of MAF within breeds should provide enough variability for genomic studies such as genome-wide association studies and genomic evaluations. The amount of SNPs remaining for the Alpine and Saanen breeds were slightly smaller than those attained by Carillier et al. [14] for the same breeds. They applied a call rate threshold of 98 % a MAF greater than 0.01 and Hardy-Weinberg equilibrium test (p-value < 10 −6 )  Table 5 Pearson correlations between gametic phase of all breeds for the distances pairs smaller than 0.2 Mb (above diagonal) and between 0.02 and 0.03 Mb (below diagonal)  [34] working with a crossbred population (Alpine, Saanen and Toggenburg), filtered out SNPs that were not in Hard Weinberg equilibrium, had MAF below 0.05, call rate below 0.95 or GC content below 0.6. After the quality control 47,306 markers were available for further analyses.
Despite of the number of SNPs excluded in our study, the 50 k panel showed good coverage of the genome. The number of SNPs excluded due to low SNP call rate (CR) was very similar for the Canadian breeds. The smaller number of SNPs excluded due to low CR for the Australian breeds is due to the pre quality control that was done previously in the Australian dataset in which 1145 markers with call rates lower than 90 % were removed. Greater gaps were observed between SNP pairs in some chromosomes for most of the breeds (e.g. CHI12, CHI17 and CHI29), suggesting that in future development of another SNP panel for goats more SNPs could be included in those chromosomes for better coverage. In the present study the number of genotyped animals differed considerably across breeds, with the largest number of genotyped animals in Alpine (403) and Saanen (318) breeds (Table 1). The differences in average r 2 values for the breeds may be in part due to sampling effects, the low numbers of animals genotyped in some breeds and it could be due to different effective population sizes of those populations, which seems particularly appropriate for some breeds. Bohmanova et al. [26] recommended that for Holstein cattle at least 55 animals should be used to avoid overestimation of r 2 . In this study for three populations (Cashmere, Nubian and Toggenburg) there were fewer animals genotyped than that value. To address this concern, we applied a correction suggested by Hill and Robertson [23]. However, even for the Cashmere breed (smallest sample size) the highest difference between r 2 estimated and corrected were around 0.01 units. Therefore, we decided to present the non-corrected values in the main text.
The average LD estimates in the goat breeds studied were quite variable. For Alpine and Saanen breeds average r 2 values at 50 kb were slightly smaller than the values reported by Carillier et al. [14] (0.17 at 50 kb). In a crossbred population (Alpine, Saanen, and Toggenburg) Mucha et al. [34] observed a mean r 2 at 50 kb of 0.18. For the other breeds, this was the first study done, which did not allow us to compare the results.
For the breeds Alpine Cashmere, Saanen, and Rangeland, the LD levels appear to be lower than that reported in Holstein dairy cattle (from 0.18 to 0.31, [35,9,36,10]) or pigs (0.36 to 0.46, [6,5]). The r 2 estimates for the Saanen breed were similar with those attained for the Churra breed sheep of 0.152 from 40 to 60 kb [12]. Kijas et al. [13] found average r 2 values for five sheep breeds for marker pairs at 70 kb apart varying from 0.08 to 0. 22. There is variation in the published extent of LD because the estimates of LD strongly depend on various factors such as: history and structure of the studied population (evolutionary forces that affected the population) sample size, marker type (microsatellites or SNPs), density and distribution of markers, type of method used for haplotype reconstruction, strictness of SNP filtering (threshold of minor allele frequencies and Hardy-Weinberg equilibrium), use of maternal haplotypes only or both maternal and paternal haplotypes [26].
As pointed out in Hayes et al. [17] LD at small distances reflects N e in the distant past whereas LD at large distances reflects N e in the recent past. The r 2 values for Canadian and Australian Boer animals were very similar for short distances, except for distances up to 0.02 Mb, which could be indicating that they were managed together until few generations ago. The differences observed for distances up to 0.02 Mb could be explained by the small number of SNP pairs used to estimate r 2 for that distance range. The higher r 2 estimates at long distances observed in Australian Boer goats compared to the Canadian Boer population could be due to a smaller effective population size in the more recent past in the Australian Boer population compared to the Canadian one or it could be due to the fact that all Australian Boer animals were sampled in the same region and they could be more related than the average of the Australian Boer population. The standard deviations values (SD) for the r 2 estimates at given distances (Additional file 4) were quite high, mainly for shorter distances, which may be due to the smaller number of SNP pairs available for the r 2 estimations.
The extent of LD decreased substantially from the first (up to 0.02 Mb) to the second range of distances (between 0.02 and 0.03 Mb) (Fig. 2). The low level of long range LD may indicate that these breeds have not been under intense selection or genetic drift.
Alpine and Saanen were the breeds with the largest sample sizes. The higher observed levels of LD at short ranges in some of the other breeds could be due to sampling but they are more likely to be due to smaller effective population size in those breeds, as Rangeland population also presented low r 2 values. Therefore, it would be interesting to confirm the LD results obtained in this investigation using a larger number of genotyped animals.
A higher level of LD is related to a higher accuracy of genomic estimated breeding values. Some studies (e.g. [1,37]) have recommended that an r 2 value greater than 0.2 would be sufficient for genomic selection. At the average distance between adjacent SNPs in the goat 50 k SNP panel (~0.06 Mb) the breeds LaMancha, Nubian, Toggenburg, and Australian and Canadian Boer exceeded or approached this value. This indicates that, with a large enough training population, genomic selection could potentially be implemented with reasonable accuracy using the current 50 k panel within breed, but the other breeds might benefit from a denser panel. For the Rangeland population, the LD levels were very low even for short distances, suggesting that this breed come from a highly heterogeneous population and a higher density panel might be needed to implement genomic selection in this breed.

Admixture and principal component analyses and linkage phase
The results show that a great number of animals have a significant portion of their genotype coming from another cluster (Fig. 7). Boer, Cashmere and Nubian breeds seem to have a smaller level of admixture compared to the other breeds, indicating that there is less remaining from any other ancestral breed that may have interacted with them.
For the animals that had estimates of breed composition more diverse (less than 75 % of their genes coming from a single breed) it can be assumed that a more recent admixture event could have occurred. According to Larmer et al. [38] this may be useful in identifying locations of certain QTL that are present in only one breed. If an animal has a phenotype that is significantly different from other animals in the breed to which it is registered and chunks coming from other breeds can be identified in the genome we could propose that one or more of those chunks have a QTL for that trait on them.
The higher level of admixture seen in the Saanen breed when compared to the Alpine breed implies that a greater degree of admixture has undergone since these breeds diverged historically. Consistently PCA (Fig. 8) also showed this trend. Animals from Alpine and Saanen breeds showed more spread clusters, indicating a higher breed admixture level among those breeds and other dairy breeds such as LaMancha and Toggenburg. LaMancha and Toggenburg showed clear individual clusters and a smaller genetic variation among animals from within those breeds. The larger degree of admixture seen in the Rangeland population is consistent with its evolutionary history, as the Rangeland goats are largely unmanaged feral goats. The results indicate that Boer, Cashmere and Nubian breeds are likely to have contributed to create the feral population. On average, 50 % of the Rangeland genome was in common to that found in the Cashmere breed. It indicates that the Rangeland population may have been formed by the introgression of mainly Boer and Nubian animals in the Cashmere genetics to develop the Rangeland population. PCA (Fig. 8) also confirmed this relationship, where animals from Nubian, Cashmere, Boer and Rangeland were closely clustered compared to the dairy breeds. This sharing of the gene pool may be due to mixing of the breeds as discussed before, especially for Rangeland population.
Canadian and Australian Boer populations seem to share a great proportion of the genome. The small level of admixture coming from other breeds (clusters) indicates that admixture may have taken place on average, in the distant past. The high degree of genotype sharing among both Boer populations is consistent with their evolutionary history, as the Boer breed was developed in South Africa [39] and exported to Canada and Australia a few decades ago. Furthermore, according to Casey and Van Niekerk [39], the Boer breed was formed with infusion of Indian and European blood, which could explain the admixture contribution, even small, from other breeds. The close relationship between Australian and Canadian Boer populations was also confirmed in the PCA plot (Fig. 8), where animals were clustered together based on the first two principal components.
The PCA analysis showed that the Illumina 50 K goat beadchip was able to discriminate most of the breeds. Some of them were more clearly clustered while others were clustered more closely. However this trend is consistent with the breeds history. Huson et al. [40] reported that the Illumina 50 K goat beadchip can effectively distinguish goat populations, specifically indigenous African goat populations. In a comparison of 14 African goat breeds, New Zealand Boer, three Italian Alpine breeds, and six United States of America breeds, the first principal component generated a continental categorization by Italy, United States of America, and Africa with the second principal component distinguishing the Boer breed.
The consistency of gametic phase between breeds indicates whether or not different breeds could potentially be pooled into one common training population to better estimate SNP effects. For goat genomic evaluation this would be very important due to the fact that there is a small number of genotyped animals in the breeds with small population size. The highest consistency of phase was found between Australian and Canadian Boer populations suggesting a greater level of relatedness between these populations. They may be still connected through exchange of genetic material or have diverged a few generations ago. It was also confirmed in the admixture analysis, where they were always grouped together. According to Malan [19], Boer goats were imported to North America directly from South Africa or via Australia or New Zealand, which is another evidence of their close relationship. The correlation values for them were consistent until greater distances, indicating that both populations could be pooled in a single training population. The other groups that presented higher correlations were: Alpine and Saanen, Alpine and LaMancha, Canadian Boer and Rangeland, Australian Boer and Rangeland and Cashmere and Rangeland. Based on the admixture levels observed for some breeds, it was expected a higher consistency of phase among them. However, even for those breed pairs the consistency of gametic phase between adjacent markers was not high enough to support the pooling of breeds in a training population for genomic selection. The estimated correlations for all the breed pairs, except Canadian and Australian Boer, were lower than 0.70 for distances greater than 0.02 Mb. It indicates that markers and QTL phases might not be strongly associated across those breeds.
Carillier et al. [14] found consistency of gametic phases at 50 kb (i.e. average distance between two SNPs) among French Alpine and Saanen breeds of 0.56. According to them, the two goat breeds (Alpine and Saanen) were genetically close until a couple of generations ago. In dairy cattle, de Roos et al. [41] evaluated the effect of combining multiple populations on the reliability of genomic predictions and concluded that the benefits of combining populations in a training set were higher when the populations have diverged for only a few generations ago, when the marker density was high, and when heritability was low. From the simulation studies reported by these authors, populations that had diverged six generations ago presented a correlation of phase higher than 0.8 for distances up to 0.45 Mb. Therefore, for multi-breed genomic evaluation in goats, a denser SNP panel seems to be required. For implementing genomic selection using the 50 k panel in goat breeds, other ways to increase the training population should be sought, such as genotyping more animals in each breed or collaborate with other countries and share genotypes and phenotypes/EBVs for genomic selection.

Ancestral effective population size
We observed an initial pattern of decreasing N e with values of over 1740 for Australian and Canadian Boer populations and 5325 for Rangeland population estimated in the distant past (1500 generations ago) and values closer or even smaller than 100 estimated at 5 generations ago (Figs. 5 and 6 and Additional file 7).