Mitochondrial DNA haplogroup H structure in North Africa

Background: The Strait of Gibraltar, which separates the Iberian Peninsula from North Africa, is thought to be a stronger barrier to gene flow for male than for female lineages. However, the recent subdivision of mitochondrial DNA (mtDNA) haplogroup H has revealed greater genetic differentiation among geographic regions than previously detected. Dissecting mtDNA haplogroup H in North Africa and comparing it with the Iberian Peninsula and Near East profiles would help clarify the relative affinities among these regions. Results: As in the Iberian Peninsula, the dominant mtDNA haplogroup H subgroups in North Africa are H1 (42%) and H3 (13%). The similarity between these regions is strongest at the northwestern edge, mainly affecting Moroccan Arabs, West Saharans and Mauritanians, and decreases eastwards, probably owing to gene flow from the Near East, as attested by the higher frequencies of the H4, H5, H7, H8 and H11 subgroups. Moroccan Berbers show stronger affinities with Tunisians and Tunisian Berbers than with Moroccan Arabs. Coalescence ages for H1 (11 ± 2 ky) and H3 (11 ± 4 ky) in North Africa suggest a possible late Palaeolithic settlement for these lineages, similar to that found for other mtDNA haplogroups. Total and partial mtDNA genome sequencing revealed stronger mtDNA differentiation among regions than previously found with HVSI-based analyses. Conclusion: The subdivision of mtDNA haplogroup H in North Africa has confirmed that the genetic differentiation found among Western and Eastern populations is mainly due to geographical rather than cultural barriers. It also shows that the historical Arab influence on the region was more cultural than demic. Whole mtDNA sequencing of apparently identical H haplotypes, based on HVSI and RFLP information, has unveiled additional mtDNA differences between North African and Iberian Peninsula lineages, pointing to older mtDNA gene flow between the regions than previously thought. Based on this new information, it seems that the Strait of Gibraltar barrier affected male and female gene flow in a similar fashion.


Background
Bounded by the Mediterranean Sea to the North and the Sahara desert to the South, North Africa has behaved as an anthropological island within the African continent. Historically, only the Strait of Gibraltar in the West and the Suez Isthmus in the East connected this region to Europe and the Near East, respectively. These two passageways were the most probable routes of human intercontinental dispersals since prehistoric times. In addition, periodic wetter climatic conditions allowed contacts with sub-Saharan African peoples across the Sahara desert, and seafaring brought numerous Mediterranean and Atlantic cultures to the North African shores [1]. Archaeological evidence points to modern human occupation of this area since 45,000 years ago (ya), as attested by the Aterian industry [2]. However, there is no consensus about the degree of human continuity since that time, as some of the later Palaeolithic industries (Iberomaurusian, Dabban) show no clear cultural connections with the earlier Aterian. Furthermore, the demic impact of the Near Eastern Neolithic on the autochthonous Capsian Neolithic of Northwest Africa is also controversial [1]. Even more tenuous are the putative connections of the Aterian with the Solutrean of Iberia, or of the Capsian with the Mediterranean Neolithic [3]. Nonetheless, it is agreed that the historical penetration of the area by Pharaonic and classical Mediterranean cultures, ending with Islamic domination, imposed strong cultural influences with only a minor demic impact [4,5]. Population genetic studies using classical markers pointed to a sizeable Upper Palaeolithic component in Northwest African populations [6], whereas the Neolithic diffusion in that region was more a cultural than a demic process [7].
More recently, the haploid nature of uniparental genetic markers has allowed the successful application of phylogenetic and phylogeographic approaches to population genetics. Thus, mitochondrial DNA (mtDNA) phylogeographic analyses have exploited this maternally inherited, nonrecombining marker to detect human migrations at continental [8-10] and regional [11-14] scales. Focusing on North Africa, several mtDNA studies have shown that, in spite of an important sub-Saharan African contribution, the majority of the lineages detected in this region belong to, or share common roots with, Eurasian haplogroups [15-23]. Some of these haplogroups, including X1 [12], U6 [11,13] and M1 [13,14], although of West Asian origin, have Palaeolithic coalescence ages in North Africa. Others seem to have been acquired more recently as a result of European (U5, V [24-26]) or Middle Eastern influences (R0a, J1b, U3 [17,27-29]). In agreement with the classical-marker and mtDNA results, an early analysis of Northwest African populations using paternal Y-chromosome variation proposed that the main haplogroups defined by the M78 and M81 binary markers could be the paternal counterparts of the Palaeolithic components detected with classical and maternal markers [30]. However, more recent studies in which those and other markers were further subdivided suggested a predominantly Neolithic origin for the Y-chromosome variation in North Africa [20,22-34].
The discrepancies between the uniparental marker results could be due to real differences in male and female demographic histories [35]. However, a lack of mtDNA haplogroup resolution could also be responsible. Recently, the accurate dissection of haplogroup H, the most frequent Western Eurasian haplogroup, into several monophyletic subhaplogroups [10,36-40] changed a rather uniform genetic landscape into one with several regional peaks and clinal variations. Thus, the frequencies of subhaplogroups H1 and H3 are highest in Western Europe and decrease gradually to the East. In contrast, H2 occurs more frequently in Eastern than in Western Europe. In addition, some subhaplogroups characterize other regions, such as H6 and H8 in Central Asia, H13 in the Near East, H20 and H21 in the Caucasus, and H18 in the Arabian Peninsula. Haplogroup H is also the most frequent clade in North Africa. Overall frequencies are highest in the Northwest, where haplogroup H represents 37% and 34% of the mtDNA lineages in Berber- and Arabic-speaking Moroccans, respectively [15], 24% [41] and 32% [19] in Berber Algerian samples, and 26% in Tunisian Berbers [20]. Frequencies drop slightly southwards (24% in Saharans [15] and 23% in Mauritanians [42]) and eastwards (21% [16] and 14% [21] in Egyptian samples). The aims of this paper are: 1) to subdivide the North African haplogroup H lineages into the known subhaplogroups, 2) to establish the phylogenetic and phylogeographic patterns of these subhaplogroups in the region, and 3) to compare them with those present in Europe and the Near East, in order to assess the strength of human migrations from both continents into North Africa in spatial and temporal dimensions.

Results
As described previously, total frequencies of haplogroup H decline towards both the East and the South (Table 1). Haplogroup H represents 44% of the mtDNA variation in the Iberian Peninsula but only 22% in the Near East. Likewise, it still reaches 25% in North Africa but drops to only 9% in the Arabian Peninsula. The distribution of haplogroup H subclades also differs markedly among regions. Subhaplogroups H1 and H3 are the dominant subgroups in the Iberian Peninsula (45% and 16%, respectively) and North Africa (42% and 13%, respectively), whereas unclassified H haplotypes (H*) account for 40-50% of the H diversity in the Arabian Peninsula and the Near East. Furthermore, while H1 (12%) is still the most frequent subgroup in the Near East, followed by H5 (8%), the modal subclades in the Ara-
The relative affinities among regions are based on subhaplogroup frequencies, which take into account neither the differences between haplotypes assigned to the same subgroup nor the fact that haplotypic matches are identified only from partial HVSI sequences. In addition, half of the H lineages detected in North Africa are not shared with other regions, and this percentage is even greater in the putative source regions of the Near East (70%) and the Iberian Peninsula (76%). These facts point to a higher differentiation among regions and populations than previously observed. Indeed, complete or nearly complete sequencing of some apparently identical samples indicates that the real genetic heterogeneity among regions is greater than estimated above (Figure 2). To begin with, the HVSI motif 16093-16189 that characterizes subgroup H1f was found, also on an H1 background, in a Moroccan individual (Mor 2047; Figure 2). This subgroup is particularly abundant in, and mainly restricted to, Finland and the surrounding populations [36].
At first sight, this coincidence would seem to point to a new link between North European and North African populations, like that found previously for U5b1b [26]. However, further analysis of the coding region of the North African sample revealed that it lacks the three coding-region mutations that additionally characterize the Finnish H1f subgroup [38] (Figure 2). This lack of identity between haplotypes assigned to the same subgroup and sharing the same or a similar HVSI motif extends to other cases. For instance, a group of H sequences shares the 16145-16222 HVSI motif, consistently found in Northwestern Africa, the Sahara and several Western Sahelian populations [15]. Complete sequencing of a Mauritanian sample (Mau 2027) allowed this type to be assigned to subhaplogroup H1 (Figure 2). A direct connection between this motif and a German sequence was previously suggested [15]. However, the additional presence of transitions 16304 and 456, in the HVSI and HVSII regions respectively, in that German haplotype [43] indicates that it belongs to subgroup H5 rather than H1, which does not support a direct link between these regions. In contrast, the two 16145-16222 haplotypes sporadically detected in the Iberian Peninsula ([44] and unpublished results) belonged to the North African subgroup, as they shared the coding-region mutation 10257, in addition to the H1 diagnostic transition 3010, with the completely sequenced Mauritanian sample (Figure 2). The 10257 transition thus seems to define a new subgroup within H1. This points to a possible, although not recent, North African demic influence on the Iberian gene pool. Another interesting group of H1 sequences in North Africa is characterized by the 16172-16311 motif, which we [15] and others [19] have found mainly in Saharan samples.
Haplotypes with, or including, this HVSI motif have also been detected in European [8,43,45-49] and Asian [50-54] samples, but not yet in the Iberian Peninsula (see Additional file 1). However, direct phylogenetic links between such distant regions are very unlikely, because all the individuals further classified belong to the H5 subgroup or the HV haplogroup in Europe [48,49], or to the HV or R2 haplogroups in the Middle East [53,54], which strongly points to yet another case of HVSI convergence on distinct coding-region backgrounds. In addition to the CRS, the 16189 and 16311 HVSI motifs are quite abundant in North Africa (see Additional file 1). However, when these samples were screened for the coding-region positions observed in completely sequenced European or Middle Eastern individuals carrying the same HVSI motifs (Figure 2), none of these positions appeared in the North African samples. This lack of homogeneity again strongly points to different monophyletic coding-region backgrounds despite the HVSI matches, a finding repeatedly reported in other studies [38]. This study also revealed instances of molecular convergence in the coding region. Sequences How 73H and Jor 843 share the 12236 transition, although they belong to the H* and H5 subgroups, respectively (Figure 2). The 12358 transition is another such case, shared by four sequences (Her 127, Ach 28, MM H2 and Mau 2027) belonging to different H subgroups (Figure 2).
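The convergence problem described above can be made concrete with a small sketch: two haplotypes that share the 16145-16222 HVSI motif end up in different subgroups once HVSII and coding-region positions are checked. The marker table below is hypothetical and uses only the positions named in this section (3010 for H1, 10257 for the proposed H1 branch, 16304 and 456 for H5); the helper names and the variant sets are illustrative assumptions, and real assignments rely on the RFLP scheme and nomenclature cited in the Methods.

```python
# HYPOTHETICAL marker table built only from positions named in the text;
# it is not a real classification panel.
DIAGNOSTIC_SITES = {
    "H1": {3010},                         # H1 diagnostic transition
    "H1 (10257 branch)": {3010, 10257},   # proposed new H1 subgroup
    "H5": {456, 16304},                   # transitions placing the German haplotype in H5
}

HVS1_RANGE = (16024, 16365)  # HVSI positions used for comparisons in the Methods


def hvs1_motif(variants):
    """Keep only the variant positions that fall inside HVSI."""
    start, end = HVS1_RANGE
    return {pos for pos in variants if start <= pos <= end}


def assign_subgroup(variants):
    """Return the most specific subgroup whose diagnostic sites are all present.

    Specificity is the number of diagnostic sites; samples matching no
    entry are reported as H*, following the convention in the Methods.
    """
    matches = [name for name, sites in DIAGNOSTIC_SITES.items() if sites <= variants]
    if not matches:
        return "H*"
    return max(matches, key=lambda name: len(DIAGNOSTIC_SITES[name]))


# Hypothetical variant sets (positions relative to the rCRS) modelled on the
# Mauritanian (Mau 2027) and German haplotypes discussed above.
mauritanian = {16145, 16222, 3010, 10257}
german = {16145, 16222, 16304, 456}

shared_motif = {16145, 16222}
print(shared_motif <= hvs1_motif(mauritanian), shared_motif <= hvs1_motif(german))
print(assign_subgroup(mauritanian), assign_subgroup(german))
```

Both haplotypes contain the shared HVSI motif, yet the first is placed in the H1 branch carrying 10257 and the second in H5, which is the point made in the text: an HVSI match alone does not establish a common coding-region background.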

Discussion
The dissection of mtDNA haplogroup H in North Africa has confirmed several genetic features of its populations. First, there is a significant genetic differentiation between Northwestern, Central and Eastern populations, already detected in the first genetic studies carried out in North Africa using classical genetic polymorphisms [6]. This differentiation has also been found by later molecular analyses using Y-chromosome markers [31-34] or X-chromosome SNPs [55]. Second, as Arab and Berber communities are present in both areas, geographic isolation, rather than cultural barriers, seems to be the main cause of this genetic differentiation. This has been consistently reported in all previous studies using autosomal short tandem repeats [4], autosomal Alu insertion polymorphisms [56,57], high-resolution Y-chromosome analyses [30,58,59] and mtDNA polymorphisms [19,20,22,23]. As a consequence, it has been proposed that the North African gene pool received Palaeolithic and Neolithic influences from the East, but that the impact of historical invasions, such as the Arab expansion, was more cultural than demic. The lack of exclusive haplotypic matches between North Africa and the Arabian Peninsula found here agrees with that hypothesis. Third, the southward clinal decline of haplogroup H frequencies at the mitochondrial level is well explained as the counterpart of the northward clinal decline of sub-Saharan maternal gene flow [5,15,19]. Fourth, the genetic heterogeneity detected between North African and Iberian Peninsula populations has been attributed both to the physical barrier imposed by the Strait of Gibraltar and to strong cultural differences. However, some gene flow has been detected between the two areas, and its strength depends mainly on the type of marker used. The strongest barrier effect has been detected in analyses based on Y-chromosome polymorphisms [30].
Figure 1. Graphical relationships among the studied populations. Codes are as in Table 1. MDS plots based on FST haplogroup (a) and haplotypic (b) frequency distances.
The levels of gene flow detected in autosomal studies have varied more widely [4,56] and, in some cases, seem to depend on the population samples used, as with the CD4/Alu microsatellite haplotypes [60,61]. In contrast, high female permeability has been inferred from several mitochondrial studies that pointed to an important maternal Iberian input into North Africa [15,19]. Although there is no archaeological evidence for such a demic flow from Iberia to North Africa, the presence in North Africa of several mitochondrial haplogroups such as V, H1, H3 and U5b1b is thought, on the basis of their phylogeographic range, comparative gene diversity and ages [25,26,37], to result from a southward expansion of Palaeolithic hunter-gatherers from the Franco-Cantabrian refuge after the Last Glacial Maximum. In fact, the coalescence ages for the H1 and H3 subclades estimated in this study agree well with those previously published and are congruent with these expansions. Thus, our HVSI-based coalescence ages in the Iberian Peninsula for H1 (14.2 ± 3.0 ky) and H3 (10.3 ± 2.6 ky) are very close to those published by Pereira et al. [40] for H1 in the same area (14.0 ± 3.0 ky) and for H3 in Europe as a whole (11.0 ± 3.0 ky). Furthermore, striking similarities are observed when these ages are compared with coding-region estimates for similar geographic ranges obtained with the Mishmar et al. calibration [62]. Thus, H1 coalescence ages for Iberia (13.0 ± 6.0 ky; [40]) and Southwest Europe (12.8 ± 2.4 ky; [37]) are very similar to each other and not significantly different from the HVSI-based estimates. Likewise, coding-region coalescence ages for H3 in Europe as a whole (9.0 ± 3.0 ky; [40]) and in Southwest Europe (10.3 ± 2.4 ky; [37]) are also very similar to those based only on the HVSI.
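HVSI-based coalescence ages such as those compared above rest on the rho statistic and the calibration given in the Methods (one transition within np 16090-16365 per 20,180 years). The sketch below shows only the arithmetic, under two assumptions: the clade's reduced median network has already been built and summarized as the number of sampled sequences descending from each mutation, and the standard error uses a commonly applied variance estimator (attributed to Saillard et al., 2000), sigma² = Σ n_k² / n². The helper name `rho_age` and the toy clade are hypothetical.

```python
import math

# Calibration quoted in the Methods: one transition within np 16090-16365
# corresponds to 20,180 years.
YEARS_PER_TRANSITION = 20_180


def rho_age(descendant_counts, n):
    """Hypothetical helper: rho-based age estimate for one clade.

    descendant_counts: for every mutation in the clade's tree (below the root
        haplotype), the number of sampled sequences that carry it.
    n: number of sampled sequences in the clade.
    Returns (rho, sigma, age_years, se_years).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if any(c < 1 or c > n for c in descendant_counts):
        raise ValueError("each mutation must have between 1 and n descendants")
    # rho is the mean number of mutations between the root and each sequence;
    # summing descendant counts over mutations counts every root-to-tip step once.
    rho = sum(descendant_counts) / n
    # Assumed variance estimator: sigma^2 = sum(n_k^2) / n^2.
    sigma = math.sqrt(sum(c * c for c in descendant_counts)) / n
    return rho, sigma, rho * YEARS_PER_TRANSITION, sigma * YEARS_PER_TRANSITION


# Toy clade of 5 sequences: two carry the root haplotype, one mutation is
# shared by three sequences, and one of those three carries a further mutation.
rho, sigma, age, se = rho_age([3, 1], n=5)
print(f"rho = {rho:.2f}, age = {age / 1000:.1f} ± {se / 1000:.1f} ky")
```

Summing descendant counts per mutation gives the same total as summing root-to-tip distances per sequence (here 0 + 0 + 1 + 1 + 2 = 4), which is why rho can be read directly off the network without enumerating paths.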
A Palaeolithic expansion from the Franco-Cantabrian refuge would explain the notable presence of H1 and H3, mainly in the most northwestern populations of North Africa, and the decrease in their frequency eastwards. If this hypothesis held, however, the comparatively high diversity of H1 and H3 in North Africa would point to an important Palaeolithic gene flow from the Iberian Peninsula to North Africa across the Strait of Gibraltar. In contrast, there is consensus on a Near Eastern origin for the bulk of the North African Y-chromosome and mtDNA lineages, although discrepancies still exist regarding when these settlements most probably occurred. In the pioneering Y-chromosome studies of the region, a Palaeolithic settlement was hypothesized for the autochthonous E-M81 clade, in accordance with the age proposed on the basis of classical markers [30]. However, later studies have assigned to this clade, and to other subclades derived from E-M78 that are particularly abundant in North Africa, a Neolithic or even historical settlement age and a Near Eastern or Northeast African source [31-34,63]. On the other hand, for the mtDNA haplogroups prominent in North Africa that have been analyzed at deep genomic and phylogeographic levels, such as U6 and M1, a Palaeolithic settlement and Middle Eastern roots have been proposed [11,13,14]. Our data likewise suggest that H1 and H3 in North Africa have expansion times similar to those in Europe and, therefore, a late Palaeolithic settlement in the region. Finally, the different levels of gene flow across the Strait of Gibraltar detected with Y-chromosome and mtDNA polymorphisms have been attributed to sex-specific migratory differences, with females showing more permeability than males owing to patrilocality and polygyny [5,19,60], and to genetic drift affecting the two sexes differently [22,59].
However, the first explanation is not consistent with the demographic flows known to have occurred between Morocco and Iberia across the Strait of Gibraltar. Historically, the main human movement from Northwest Africa to the Iberian Peninsula was the Islamic invasion. Because it was a military enterprise, this North African gene flow into Iberia is believed to have been mainly male. Had it been genetically important, it would have homogenized the male lineages of Iberia and North Africa to a greater extent than the female lineages, contrary to the experimental results. Little is known about prehistoric contacts between these two areas, but repeated crossings of the Strait of Gibraltar to establish patrilocal residence seem improbable. The lack of deep sequence identity for several mtDNA haplotypes assigned to the same H subgroup and considered haplotypic matches between North Africa and the Iberian Peninsula clearly points to a higher mtDNA heterogeneity between these two regions than suggested by previous studies. If the greater level of differentiation established for H in the present study extended to other mitochondrial haplogroups, the levels of female gene flow between both areas would approximately match those of males. Further mtDNA studies at the genomic level are needed to test this hypothesis.

Conclusion
The subdivision of mtDNA haplogroup H in North Africa has confirmed that the genetic differentiation found among Western and Eastern populations is mainly due to geographical rather than cultural barriers. It also appears that the historical Arab influence on the region was more cultural than demic. Whole mtDNA sequencing of apparently identical H haplotypes, based on HVSI and RFLP information, has unveiled additional mtDNA differences between North Africa and the Iberian Peninsula, pointing to the Strait of Gibraltar barrier as affecting male and female gene flow in a similar fashion.

Methods
Analyzed individuals that could not be assigned to any of the known subgroups were classified as H* types. In addition, complete or nearly complete mtDNA sequencing was carried out on 6 individuals with haplotypes found only in well-defined geographic areas or with HVSI haplotypic matches between very distant regions, with the aim of assessing whether these matches also held for their coding regions (Figure 2, [9,10,36,37,49,62,64-67]). Furthermore, to find additional subdivisions, individuals presenting HVSI matches with already published complete haplogroup H sequences were screened for all the coding-region positions carried by those published sequences (Figure 2). DNA extraction, primers, PCR amplification conditions and total or partial sequencing have been published previously [9,29]. RFLP analyses and subhaplogroup H nomenclature (see Additional file 3) were as in Loogväli et al. [38] and Roostalu et al. [68]. Haplogroup and haplotype diversities (h), as well as molecular genetic diversities (π), were calculated according to Nei et al. [69]. Only HVSI positions 16,024 to 16,365 were used for genetic comparisons of partial sequences with other published data. Phylogenetic relationships among HVSI and genomic mtDNA sequences were established using the reduced median network algorithm [70].
Ages of clades were estimated using the rho statistic [71], with a calibration in which one transition within np 16090-16365 corresponds to 20,180 years [72] for HVSI sequences.
For population comparisons, FST distances were calculated from haplogroup and haplotype frequencies using Arlequin 2.0 [73]. To reduce the strong influence of common haplotypes on FST distances, an additional measure of haplotypic identity, I_HT = HT_XY / (HT_X · HT_Y), was used, where HT_XY is the number of haplotypes shared by populations X and Y, and HT_X and HT_Y are the numbers of different haplotypes in populations X and Y, respectively. Multidimensional scaling (MDS) plots were obtained from FST distances, and principal component analysis (PCA) was performed on haplogroup frequencies, using SPSS version 13.0 (SPSS Inc., Chicago, Illinois).
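The two simplest summaries mentioned in the Methods can be sketched in a few lines: Nei's unbiased haplotype diversity, h = n/(n-1) · (1 - Σ x_i²), where x_i is the frequency of the i-th haplotype among n sequences, and the haplotypic identity index I_HT, read here as the number of shared distinct haplotypes divided by the product of the numbers of distinct haplotypes in each population. The function names and the toy samples (haplotypes labelled by their HVSI motifs) are hypothetical; FST distances, MDS and PCA are left to the software cited above.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: h = n/(n-1) * (1 - sum(x_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(x * x for x in freqs))


def haplotypic_identity(pop_x, pop_y):
    """I_HT = HT_XY / (HT_X * HT_Y), counting distinct haplotypes only."""
    ht_x, ht_y = set(pop_x), set(pop_y)
    if not ht_x or not ht_y:
        raise ValueError("both populations need at least one haplotype")
    return len(ht_x & ht_y) / (len(ht_x) * len(ht_y))


# Hypothetical samples; "CRS" marks sequences with no HVSI difference.
pop_x = ["CRS", "CRS", "16189", "16311"]
pop_y = ["CRS", "16189", "16162"]
print(round(haplotype_diversity(pop_x), 3))         # 0.833
print(round(haplotypic_identity(pop_x, pop_y), 3))  # 0.222
```

Because I_HT counts each distinct haplotype once, a very common shared type such as the CRS contributes no more than a rare shared type, which is the property the Methods invoke to temper the weight of common haplotypes in FST.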