### Sampling and genotyping

Blood samples were obtained from a total of 520 reproductive individuals (220 bucks and 320 does), in 23 different villages located in the 3 environmental areas of Burkina Faso (8 belonging to the Sahel area; 10 to the Sudan-Sahel area; and 5 to the Sudan area; see Table 1 and Additional file 1). From these samples, 113 were previously available [10]. Within each village, from 3 to 10 different herds were sampled. In Burkina Faso, goat herds are usually small. Size of the sampled herds ranged between 15 and 30 individuals. Management of herds is not usually communal. Constant medium-range movement between districts in search of grazing areas leads to genetic exchanges between herds. When possible, sampling within a herd included the 2 older does and the younger buck to avoid close genetic relationships between individuals. Throughout the manuscript, the individuals sampled in a given village will be referred as populations. Eight populations were sampled in the present limits of the Sahel area ( Additional file 1; Table 1), 10 within the Sudan-Sahel area and 5 in the Sudan area. Seven sampled populations were located out of the limits of the Volta basin. Five populations were sampled between the Nakambé and Nazinon rivers and other 7 populations were under direct influence of the Mouhoun river. Nine populations were located on the main road of Burkina Faso which carries traffic from Dori to Ouagadougou and from Ougadogou to the second city of the country, Bobo Dioulasso.

Total DNA was isolated from blood samples following standard procedures [34]. A microsatellite set, including 19 markers (BM6526, BM757, BMS2626, BMS356, CSSM66, McM53, RBP3, BM8125, BMS2461, BMS975, CSRD2111, CSSM31, ILSTS005, INRA26, McM527, OarHH64, SPS115, TGLA53 and LSCV29) previously used in diversity analyses of goat [10] and sheep [1, 35], was analyzed on all the individuals (see Table 1). Genotyping was performed on an Automatic Sequencer ABI 310 (Applied Biosystems, Barcelona). Data deposited in the Dryad repository: http://dx.doi.org/10.5061/dryad.41h46j37.

### Statistical analyses

The following parameters were computed using the program MolKin [36] (version 3.1): expected heterozygosity (H_{e}), Wright’s F-statistics and raw (*A*) and ‘rarefacted’ (*A*
_{
(g)
}) average number of alleles per locus. Here, g was fitted to 6. Using also the program MolKin, the between-populations Reynolds’ distance (*D*
_{
R
}) and molecular coancestry (*f*
_{
ij
}). As based on a pure drift model, the *D*
_{
R
} has been shown to be an appropriate measure for livestock populations with short-term divergence [37] while molecular coancestry can be interpreted as a measure of the between-populations genetic identity [21, 38]. To avoid bias because of unequal sample sizes, statistical significance of the obtained values the bootstrapping method recommended by Simianer [39, 40] was applied using 1000 samples with exactly 23 (the average population size) individuals per sampled population. See the User’s Guide of the program MolKin (freely available at http://www.ucm.es/info/prodanim/html/JP_Web.htm) for a detailed description of the methodologies used. In any case, statistical analyses were re-run after removing the five smallest populations (4, 5, 14, 15, and 16) from the dataset to avoid any bias due to differences in sample size (Additional file 3: Figure S1). Results were highly consistent with those obtained with the full dataset. For descriptive purposes, multidimensional scaling analysis was carried out on the genetic distance matrices using the Proc MDS of SAS/STAT^{TM} (SAS Institute Inc, Cary NC).

The program Barrier version 2.2 [23] was used to identify possible genetic discontinuities in our dataset. This program uses a Delaunay triangulation to connect the geographic locations of the sampled populations on the map and then apply a Monmonier’s Maximum-difference algorithm [22], for the identification of genetic discontinuities as follows: a) by selecting the edge of the network with the largest allocated genetic distance and using it as the starting point of the barrier perpendicular to the network boundary; and b) selecting the edge which is directly connected with the growing barrier with the largest genetic distance for the continuation of the barrier. The robustness of the computed genetic discontinuities was assessed calculating 19 between-populations Reynolds’ distance matrices using jackknifing over microsatellites.

Genetic information was also summarised via computing a Principal Component Analysis (PCA) from the correlation matrix among allelic frequencies using the Proc Factor of SAS/STAT^{TM} according to the recommendations by Cavalli-Sforza et al. [41].

The relative contributions of the Sahelian and Djallonké goat to each sampled population were assessed using the programs LEADMIX [24] and [25] LEA. Parental populations were formed pooling the individuals belonging to populations from 1 to 5 for the Sahelian population and those belonging to populations 19, 21, 22 and 23 for the Djallonké breed. The program LEADMIX [24] is a maximum likelihood method that takes into account the genetic differentiation between parental populations in the admixture calculation. LEADMIX is based on a simple model where two or more parental populations diverge from an older ancestor, and then meet during an admixture event to create a third ‘hybrid’ population. In this way, the method aims to avoid falsely assuming independent allele frequency distributions of the parental populations and any resultant bias in the admixture calculation. The program LEA [25] is based on a different demographic model where the two parental populations are assumed to be at demographic equilibrium and the allele frequencies prior to admixture are sampled from independent uninformative prior probability distributions. It also accounts for genetic drift, which is estimated through the scaled parameters t1 = T/N1, t2 = T/N2, th = T/Nh, where T is the time since the admixture event (in generations), and Ni is the effective size of population I (with I = 1, 2, h). LEA implements a full-likelihood Bayesian method, and hence provides posterior distributions for the parameters of the model, rather than point estimators. Either two or three independent runs were performed for each population, using different starting values in the parameter space, to determine whether equilibrium had been reached [42]. Each run had at least 500,000 steps together with a thinning interval of five. Also, a few longer runs (up to 1 × 10^{6} steps) were used to check for convergence.

The PCA, LEADMIX and LEA scores obtained for each population were used to construct interpolation maps drawn using the Spatial Analyst Extension of ArcView, available at: http://www.esri.com/software/arcview/. The Inverse Distance Weighted (IDW) option with a power of two was selected for the interpolation of the surface. IDW assumes that each input point has a local influence that diminishes with distance. The area of sampling of each population was used as geographic coordinates, and the six nearest neighbors were used for the calculation. Interpolation surfaces were divided into seven equal classes.

Population Graph analysis [43, 44] was performed, using the program GENETICSTUDIO [45], to infer which populations may have in the past or are still experiencing gene flow. The genetic distances between populations in the network were then regressed on Euclidean distance (using a Mantel approach at the α = 0.05 significance level) to estimate isolation-by-graph-distance (IBGD) which provides an indication of two different categories of spatial genetic discontinuities. The first category consists of populations that are spatially closer than expected given their genetic covariance. In a Population Graph, the edges connecting these populations are “compressed” indicating potential locations of vicariance [26]. The second category of spatial genetic discontinuity are those populations who are spatially much further apart than expected given their genetic covariance. Here, the edges in the Population Graph are “extended” and are consistent with a scenario of long distance dispersal [26]. Data were visualised by exporting the graph topology to the freeware program GoogleEarth.

Partial Mantel test [46, 47] was carried out using the program Arlequin 3.5.1.2 [48] to check for the statistical significance of the main landscape boundaries identified in Burkina Faso. In partial Mantel test, purely spatial effects on genetic differentiation are accounted for before assessing landscape effects. Here, correlation among the between-populations Reynolds’ distance matrix and different matrices defining landscape boundaries is assessed while controlling for effects of a matrix giving information on the environmental areas of Burkina Faso (Sahel, Sudan-Sahel or Sudan) in which a given population is located. Reynolds’ distance was selected due to its nice linear behaviour [37] which fits well with the Mantel test expectations [46, 47] and its assumption of a pure drift model which fits well with poorly differentiated livestock breeds [37]. Following Balkenhol et al. [49], the presence of landscape boundaries was assessed via creating dummy matrices that indicated whether a population pair was separated by a boundary or not (noted in the matrix as 1 or 0, respectively; Table S2). Also following Balkenhol et al. [49], a matrix characterising environmental areas was created using a dummy variable that placed a population in a certain area not bisected by landscape boundaries (i.e. the population received a “1” in the column belonging to that area, and a “0” in all other columns; Additional file 4: Table S2). The statistical significance of the following boundaries were assessed: *i*) the present limit of the Sahel area; *ii*) the present limit of the Sudan area; *iii*) the present Northern tsetse limit in Burkina Faso [11]; and *iv*) like boundary *iii* but including in the tsetse free area those populations sampled on the road from Ouagadougou to Bobo Dioulasso (13, 17 and 18). These four analyses are expected to characterise if differentiation between the Sahelian, Mossi and Djallonké goat has genetic support (matrices *i* and *ii*), the genetic differentiation due to the presence or absence of trypanosome vectors (matrix *iii*) and the combined effect of the presence of trypanosome vectors and the human action (matrix *iv*). In all cases, p-values were obtained using 10,000 permutations. Statistical significance level of the four hypothesis tested on our dataset was assessed at α = 0.05 after Bonferroni correction.