Saudi Arabian Y-Chromosome diversity and its relationship with nearby regions

Background: Human origins and migration models proposing the Horn of Africa as a prehistoric exit route to Asia have stimulated molecular genetic studies in the region using uniparental loci. However, from a Y-chromosome perspective, Saudi Arabia, the largest country of the region, has not yet been surveyed. To address this gap, a sample of 157 Saudi males was analyzed at high resolution using 67 Y-chromosome binary markers. In addition, haplotypic diversity for its most prominent J1-M267 lineage was estimated using a set of 17 Y-specific STR loci. Results: Saudi Arabia differs from other Arabian Peninsula countries by a higher presence of J2-M172 lineages. It is significantly different from Yemen mainly due to a comparative reduction of sub-Saharan Africa E1-M123 and Levantine J1-M267 male lineages. Around 14% of the Saudi Arabian Y-chromosome pool is typical of African biogeographic ancestry, 17% arrived in the area from the East across Iran, while the remaining 69% could be considered of direct or indirect Levantine ascription. Interestingly, basal E-M96* (n = 2) and J-M304* (n = 3) lineages have been detected, for the first time, in the Arabian Peninsula. Coalescence time for the most prominent J1-M267 haplogroup in Saudi Arabia (11.6 ± 1.9 ky) is similar to that obtained previously for Yemen (11.3 ± 2) but significantly older than those estimated for Qatar (7.3 ± 1.8) and UAE (6.8 ± 1.5). Conclusion: The Y-chromosome genetic structure of the Arabian Peninsula seems to be mainly modulated by geography. The data confirm that this area has mainly been a recipient of gene flow from its African and Asian surrounding areas, probably mainly since the Last Glacial Maximum onwards. Although rare deep rooting lineages for Y-chromosome haplogroups E and J have been detected, more basal clades supportive of the southern exit route of modern humans to Eurasia were not found.


Background
The arid Arabian Peninsula can be viewed as a geographic cul-de-sac and passive recipient of Near East cultural and demic expansions since the Bronze Age. However, southern Arabian late Neolithic excavations revealing sorghum and date palm cultivation attest to earlier influences from East Africa and southwestern Asia [1]. Moreover, since the southern dispersal route of modern humans across the Bab el Mandeb Strait was proposed [2] and further developed [3], multidisciplinary interest in this region has dramatically increased in the search for traces of such putative early southern dispersals across the Arabian threshold. From an archaeological perspective, Lower Paleolithic Oldowan industries found in the Arabian Peninsula indicate that, presumably, H. erectus or H. ergaster may have been making forays into this region [4]. There is also evidence of Middle Paleolithic Mousterian and Aterian technologies in Arabia, suggesting the possibility of an expanded southern border for H. neanderthalensis and potential links with populations from Northern Africa [5]. More recent field work has sampled new Palaeolithic sites in Oman, uncovering lithic assemblages with typological affinities to industries in the Levant, India and the Horn of Africa, suggesting there was a series of hunter-gatherer range expansions into southern Arabia from all three refugia over the last quarter of a million years [6]. However, there are few reliable age estimates for these industries. Furthermore, the passage of these hominids to Arabia could have been either through an overland route or across a waterway.
From a population genetics perspective, the most recent studies on the Arabian Peninsula have been carried out using mainly uniparental markers: mitochondrial DNA (mtDNA) and the Y-chromosome. Analysis based on maternal lineages in the region has been interpreted to represent traces of late Palaeolithic, Neolithic and more recent inputs from nearby regions into Arabia as well as signatures of autochthonous expansions [7][8][9][10][11][12]. However, the lack of deep rooting M and N sequences in the contemporary Arabian mtDNA pool leaves the proposed southern coastal route without empirical support. Although Y-chromosome basal lineages remain undetected, phylogeographic patterns indicate that the Levant was an important bidirectional corridor of human migrations [13,14]. Moreover, the Levant appeared as the main source of male lineages to the Arabian Peninsula [15]. However, Saudi Arabia, a country that occupies about 80% of the Arabian Peninsula, was not directly included in those surveys. In order to fill this void, we performed a high resolution Y-chromosome SNP analysis of 157 Saudi Arabian males and an STR-based analysis of J1-M267, the most frequent Y-chromosome haplogroup in Saudi Arabia. Comparisons to other published Arabian Peninsula populations and nearby regions were made to further explore paternal traces of the modern human transit across Arabia.

Sample and typing
All sample collecting and genotyping tasks were performed by the Saudi Arabian collaborators of this study.
Buccal swabs or peripheral blood were obtained from 157 paternally unrelated Saudi Arab males whose known paternal ancestors, for at least two generations, were all of Saudi Arabian origin. Because of its moderate size, the sample was not subdivided by region. Informed consent was obtained from all the participants. This research followed the tenets of the Declaration of Helsinki. DNA was extracted using the Nucleon™ BACC Genomic DNA Extraction Kit (GE Healthcare, Piscataway, NJ, USA). Sixty-seven Y-chromosome binary genetic markers were genotyped hierarchically. Primers, polymorphic positions and haplogroup nomenclature were as recently updated [16]. After amplification and purification, haplogroup typing was carried out in all cases by direct sequencing on the ABI 3130xl Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). The following 17 Y-STR loci: DYS19, DYS385 a/b, DYS389I/II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635 and Y-GATA H4 were amplified in a GeneAmp PCR System 2700 (Applied Biosystems) using the AmpFlSTR Yfiler Amplification Kit (Applied Biosystems) following the manufacturer's instructions. DNA fragment separation was carried out in an ABI Prism 3100 Genetic Analyzer (Applied Biosystems). STR alleles were identified by comparison to a commercial allelic ladder using Genotyper 3.7 NT software.

Statistical analysis
To make comparisons reliable, haplogroup frequencies were normalized to the phylogenetic level of the published study with the coarsest haplogroup resolution. Analysis of molecular variance (AMOVA) and pairwise FST genetic distances based on haplogroup frequencies [24] were computed using the ARLEQUIN 2000 package [25]. Principal component (PC) analysis and two-dimensional graphics were carried out using the SPSS statistical package 11.5 (SPSS, Inc). Times to most recent common ancestor (TMRCA) for the J1-M267 clade were calculated from STR variances and by coalescence methods (see Additional File 1). Mean STR variance was estimated as proposed by Kayser and others [26] and converted into divergence time using a mean STR mutation rate of 0.00069 per generation of 25 years [27]. For coalescence age estimations, STR loci were weighted by their respective variances as described in Qamar and others [28]. Then, median-joining networks were constructed after processing the data with the reduced median network method using Network version 4.5.1.0 [29]. The rho-statistic was estimated with Network and converted into time using the above-mentioned mutation rate of Zhivotovsky and others [27]. For these estimations, the DYS385 loci were excluded and the repeats of DYS389I were subtracted from those of DYS389II so that its diversity was not counted twice.
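The variance-based dating step can be illustrated with a minimal Python sketch. It assumes that each haplotype is a mapping from locus name to repeat count, that the per-locus variance is the ordinary sample variance, and that the mean variance is divided by the mutation rate and multiplied by the generation length. The function names and the toy haplotypes are hypothetical, and the estimator of [26] may differ in detail.

```python
from statistics import mean, variance

MUTATION_RATE = 0.00069   # mean STR mutation rate per locus per generation [27]
GENERATION_YEARS = 25     # generation length used in the text

def prepare_haplotype(hap):
    """Drop the DYS385 loci and subtract DYS389I from DYS389II."""
    out = {locus: n for locus, n in hap.items() if not locus.startswith("DYS385")}
    if "DYS389I" in out and "DYS389II" in out:
        out["DYS389II"] = out["DYS389II"] - out["DYS389I"]
    return out

def mean_str_variance(haplotypes):
    """Average the per-locus sample variance of repeat counts over all loci."""
    prepared = [prepare_haplotype(h) for h in haplotypes]
    loci = sorted(prepared[0])
    return mean(variance([h[locus] for h in prepared]) for locus in loci)

def divergence_time_years(haplotypes):
    """Convert mean STR variance into years: variance / rate * generation length."""
    return mean_str_variance(haplotypes) / MUTATION_RATE * GENERATION_YEARS

# Hypothetical toy haplotypes (not data from this study), reduced to a few loci.
toy_haplotypes = [
    {"DYS19": 14, "DYS385a": 13, "DYS385b": 18, "DYS389I": 13, "DYS389II": 30, "DYS390": 23},
    {"DYS19": 14, "DYS385a": 13, "DYS385b": 19, "DYS389I": 13, "DYS389II": 29, "DYS390": 23},
    {"DYS19": 15, "DYS385a": 12, "DYS385b": 18, "DYS389I": 12, "DYS389II": 29, "DYS390": 24},
]

if __name__ == "__main__":
    print(mean_str_variance(toy_haplotypes))
    print(divergence_time_years(toy_haplotypes))
```

Subtracting DYS389I before computing the variance keeps the shared repeat block from contributing to two loci at once, which is the reason given in the text for the correction.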

Y-SNP analysis
Only 27 of the 67 binary markers analyzed were informative in defining the haplogroup census of the Saudi Arabian sample (Table 1). When the AMOVA analysis compared all regions as unitary groups except the Arabian Peninsula, the partitions of variance within populations (91%) and among groups (6.8%) were highly significant (p < 0.0001 in both cases), but the variance among the samples within the Arabian Peninsula (2.2%) did not reach statistical significance (p = 0.10 ± 0.01). The same trend is also reflected in the FST-based pairwise distances (Table 2): all the distances not involving pairs of Arabian samples were statistically highly significant (p < 0.0001), whereas those within the Arabian Peninsula showed lower levels of significance, except those involving Yemen (p < 0.0001), which is the most divergent population of Arabia, although its smallest distance is to Qatar (p = 0.048). Specifically, in addition to Yemen, Saudi Arabia has significant haplogroup frequency differences only with Qatar (p = 0.009). Quantitatively, the main peculiarity of the Saudi male pool with respect to the rest of the Arabian samples is a significantly higher presence of the J2-M172 related haplotypes (p = 0.001). On the other hand, the high divergence of Yemen is mainly due to a significant excess of J1-M267 types (p < 0.0001), a nearly significant excess of J2-M67 types (p = 0.062), and a lack of R1-M17 representatives (p = 0.04). In addition, southern Arabia, represented by Yemen and Oman, shows a greater E1-M123 frequency than northern areas (p = 0.006). As the Y-chromosome phylogeography is well structured [30,31], it is possible to roughly quantify the different male inputs from surrounding regions into Arabia.
When the frequencies of haplogroups A, B, E-M96, E1-P2, E1-M2, and E1-M35 are assumed to be representative of the African contribution, and the frequencies of C, F, G, H, L, O, Q and R1-M17 to represent arrivals, across Iran, from Central, southern, and southeastern Asia, inputs of 13.4% and 16.6% from these two areas, respectively, are estimated for Saudi Arabia. UAE is the region with the least African (5.5%) and the greatest Iranian influence (36.8%). If the global inputs for the Arabian Peninsula are estimated to approximate 10% from sub-Saharan Africa and 22% from Iran, [...]
[Figure 1: Bidimensional plots based on Y-chromosome haplogroup frequency data in the Arabian Peninsula and adjacent populations.]
[...] (Figure 1a) and as the two first components of a PC analysis (Figure 1b). In the first analysis, congruently with its pairwise distances, Somalia appears as an outlier, and the close relationship found between Qatar and Yemen is also reflected in the plot. [...] components only capture 26% and 17% of the variance, respectively, and haplogroup E1-M78 has a minor contribution in the first and second component matrix, Somalia, which has the highest E1-M78 frequency of all the samples, does not behave as an outlier in the bidimensional PC representation. Finally, it deserves mention that, qualitatively, Saudi Arabia is also peculiar because of the presence, at low frequencies, of two underived lineages, E-M96* (1.3%) and J-M304* (1.9%), that were not detected in other surveys of the Arabian Peninsula [14,15].
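The rough quantification of African and eastern inputs described in this section amounts to summing haplogroup frequencies by presumed source. The Python sketch below illustrates that bookkeeping. The two haplogroup sets follow the lists given in the text, while the function name, the haplogroup labels used as dictionary keys, and the toy counts are hypothetical and do not reproduce Table 1.

```python
# Haplogroups taken in the text as markers of each source region.
AFRICAN = {"A", "B", "E-M96", "E1-P2", "E1-M2", "E1-M35"}
EASTERN = {"C", "F", "G", "H", "L", "O", "Q", "R1-M17"}  # arrivals across Iran

def regional_inputs(counts):
    """Return the percentage of chromosomes assigned to each source.

    `counts` maps a haplogroup label to the number of males carrying it.
    Labels are assumed to be already normalized to the level used in the sets.
    """
    total = sum(counts.values())
    african = 100 * sum(n for hg, n in counts.items() if hg in AFRICAN) / total
    eastern = 100 * sum(n for hg, n in counts.items() if hg in EASTERN) / total
    return {"African": african, "Eastern": eastern, "Remainder": 100 - african - eastern}

# Hypothetical toy counts (not the Saudi Arabian data).
toy_counts = {"J1-M267": 45, "J2-M172": 30, "E1-M2": 8, "A": 2, "R1-M17": 5, "G": 10}

if __name__ == "__main__":
    print(regional_inputs(toy_counts))
```

Any haplogroup that falls in neither set is counted in the remainder, which corresponds to the share the abstract describes as of direct or indirect Levantine ascription.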

Y-STR analysis
Y-chromosome STR diversity was used to obtain age estimates in Saudi Arabia for the most frequent haplogroup, J1-M267. In order to compare these estimates with other Arabian Peninsula regions, we recalculated the J1-M267 ages in UAE, Qatar and Yemen using the STR data already published for these countries [15]. STR haplotypes found in Saudi Arabia are listed in Additional file 1. Divergence times obtained using mean variance and coalescence methods are presented in Table 3. Values obtained using coalescence were always larger than those based on variance.
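For orientation, the conversion of a rho value into time can be sketched in Python under strong simplifying assumptions. The sketch assumes a star-like genealogy with a known root haplotype, single-step mutations, and equal weight for every locus. The Network software instead derives rho from the network topology, and the study additionally weighted loci by their variances, so this is not the procedure actually run. All names and the toy data are hypothetical.

```python
MUTATION_RATE = 0.00069   # per locus per generation [27]
GENERATION_YEARS = 25

def rho(root, haplotypes):
    """Mean number of repeat-unit steps separating each haplotype from the root."""
    steps = [sum(abs(h[locus] - root[locus]) for locus in root) for h in haplotypes]
    return sum(steps) / len(steps)

def rho_to_years(rho_value, n_loci):
    """Divide rho by the summed mutation rate over loci, then scale to years."""
    return rho_value / (MUTATION_RATE * n_loci) * GENERATION_YEARS

# Hypothetical toy data (not from this study): a root and three sampled haplotypes.
toy_root = {"DYS19": 14, "DYS390": 23}
toy_sample = [
    {"DYS19": 14, "DYS390": 23},  # identical to the root: 0 steps
    {"DYS19": 15, "DYS390": 23},  # 1 step
    {"DYS19": 14, "DYS390": 25},  # 2 steps
]

if __name__ == "__main__":
    r = rho(toy_root, toy_sample)
    print(r, rho_to_years(r, n_loci=len(toy_root)))
```

Because rho counts steps summed over all loci of a haplotype, the per-locus rate is multiplied by the number of loci before the division.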

Discussion
Consistent with the rest of the Arabian Peninsula, haplogroup J is the most abundant component in Saudi Arabia, accounting for 58% of its Y-chromosomes. Its two main sub- [...] [21,34,35].
In a similar way, it is possible that J2-M47 signals a more recent expansion from the Levant that also affected the Arabian Peninsula. The peculiar distribution of J2-M67 in Arabia could be explained by assuming maritime contacts with classical Mediterranean cultures. The presence in Saudi Arabia of three males harbouring underived J-M304* chromosomes is intriguing. They could have arrived together with the J1-M267 or J2-M172 expansive waves, or they could represent the remnants of an old and geographically widespread Palaeolithic substrate. This type of underived chromosome has rarely been detected in Turkey [21], in Oman and in the eastern Mediterranean area [34]. However, as the critical Levantine region has not yet been adequately dissected for J1, it seems premature to favor any of these hypotheses. The geographic pattern and most probable origin of the Y-chromosome haplogroup J in Arabia faithfully mirror those found for the most prevalent J and R0a mtDNA haplogroups in the same region [7][8][9][12]. In addition, the J1-M267 divergence ages calculated for Saudi Arabia (11.6 ± 2 kya) and Yemen (11.3 ± 2 kya) also closely match those calculated for the mtDNA haplogroups J1b (11.1 ± 8.4 kya) and R0a1 (9.6 ± 2.9 kya) in Saudi Arabia [7,8]. It is worth mentioning that J1-M267 ages in Saudi Arabia and Yemen are significantly older than those obtained for UAE and Qatar (Table 3) [15] and for Oman [14], pointing to a terrestrial rather than a maritime colonization. It has been suggested that Yemen could be a center of expansion for mtDNA haplogroup R0a [9].
Although the comparatively high divergence calculated for J1-M267 Y-chromosomes in Yemen [15] could support primary or, at least, secondary human expansions from southern Arabia, it could also be explained as the result of successive arrivals of J1 chromosomes from different source regions. Furthermore, the J1-M267 diversities found here for Saudi Arabia are in the same range as those in Yemen, and both are smaller than the one estimated for Turkey [21]. This southwards decreasing trend is more compatible with a Neolithic arrival in Arabia via the Levant, as proposed by others [15]. In fact, the coalescence ages calculated in Saudi Arabia for its most prominent mtDNA and Y-chromosome lineages converge around 10 kya. This period coincides with an improvement of the climatic conditions in the area that facilitated the spread of Natufian and Neolithic cultures from the Fertile Crescent north and southwards. The haplogroup E1-M123/M34 has an extended but sparse geographic distribution in Eastern Africa, the Middle East and the Mediterranean basin [14,32,35]. However, its frequency rises considerably in some populations, most probably because of isolation and genetic drift effects. This would explain frequencies as high as 11% found in some Ethiopian samples [35] or the highest (31%) found, until now, in the Dead Sea region of Jordan (Flores et al. 2005). In the Arabian Peninsula this clade is, in general, well represented but reaches significantly higher frequencies in the southern countries of Yemen and Oman compared with northern areas. An East African origin of this lineage, with subsequent spread to the Near East through the Levantine corridor, was proposed based on the much larger variance of this clade in Egypt (0.5) versus Oman (0.14) [14]. In contrast, other authors have suggested that E1-M123 may have originated in the Near East because of its generalized implantation there [21] compared to its presence in Eastern Africa, mainly localized in Ethiopia [32].
Recent E1-M123 haplogroup variances calculated for Yemen (0.14) and UAE (0.25) were also lower than the Egyptian one [15]. In addition, haplotypic differences found for those two Arabian countries indicated that they do not share a common ancestry [15]. So, from an Arabian Peninsula perspective, E1-M123 could have come from Ethiopia, across the Horn of Africa, or from the Levant, or even from both sources, forming independent isolates. Global male inputs into the Arabian Peninsula from sub-Saharan Africa and from Asia across Iran, not the Levant, have been estimated in this study as 13.4% and 16.6%, respectively. Recent mtDNA studies on the same Arabian Peninsula countries [7][8][9][12] have confirmed a notable female-driven sub-Saharan African input with a mean value around 15% for all the Peninsula, although frequencies as high as 60% have been detected in Hadramawt populations of Yemen [9]. Curiously, the Iranian female flow (18%) was also rather similar to that calculated for Africa. Although a slight excess of sub-Saharan African female versus male gene flow is detected (ratio 1.12), we do not find the strong sex bias proposed by other authors for Arabian populations and attributed to the peculiarities of the recent slave trade [12,36]. Without dismissing the role played by slavery, the geographical distribution of these sub-Saharan African lineages in the Arabian Peninsula seems to indicate a prehistoric entrance of a noticeable portion of these lineages, which participated in the building of the primitive Arabian population [8,9]. [...] [37]. Again, this hypothesis has its mtDNA counterpart, as it is well documented that, in the Palaeolithic, at least three clades (X1, U6, M1) derived respectively from the three main Eurasian macrohaplogroups (N, R, M) came back to North Africa from Asia [38][39][40][41][42].
Within this frame, it should be expected that E-M96* types appear in Africa, although their presence in the Arabian Peninsula instead of Eastern Africa would not compromise the last proposed model. It could be suggested that these E-M96* Saudi lineages have a sub-Saharan African ancestry. However, at least for one of them, all known male ancestors belong to a large Shammar Arab tribe that ruled much of central and northern Arabia from Riyadh to the frontiers of Syria and northern Iraq. In addition, E-M96* might be present in Lebanon [18]. However, as the authors did not type more markers derived within the E-M96 background, such as P147, P177, P2, P75 or M329, more comprehensive phylogenetic resolution of YAP-derived Y-chromosomes in the Middle East and North and East Africa is necessary to explore the topic further. In any case, the presence of E-M96* in Saudi Arabia should not be taken as support for the southern exit of modern humans across the Strait of Bab el Mandeb, as neither D nor CF underived lineages have yet been found in this area.

Conclusion
The Y-chromosome genetic structure of the Arabian Peninsula seems to be mainly modulated by geography. The data confirm that this area has mainly been a recipient of gene flow from its African and Asian surrounding areas, probably mainly since the Last Glacial Maximum onwards. Although rare deep rooting lineages for Y-chromosome haplogroups E and J have been detected, more basal clades supportive of the southern exit route of modern humans to Eurasia were not found.