Culture creates genetic structure in the Caucasus: Autosomal, mitochondrial, and Y-chromosomal variation in Daghestan

Background: Near the junction of three major continents, the Caucasus region has been an important thoroughfare for human migration. While the Caucasus Mountains have diverted human traffic to the few lowland regions that provide a gateway from north to south between the Caspian and Black Seas, highland populations have been isolated by their remote geographic location and their practice of patrilocal endogamy. We investigate how these cultural and historical differences between highland and lowland populations have affected patterns of genetic diversity. We test 1) whether the highland practice of patrilocal endogamy has generated sex-specific population relationships, and 2) whether the history of migration and military conquest associated with the lowland populations has left Central Asian genes in the Caucasus, by comparing genetic diversity and pairwise population relationships between Daghestani populations and reference populations throughout Europe and Asia for autosomal, mitochondrial, and Y-chromosomal markers.
Results: We found that the highland Daghestani populations had contrasting histories for the mitochondrial DNA and Y-chromosome data sets. Y-chromosomal haplogroup diversity was reduced among highland Daghestani populations when compared to other populations and to highland Daghestani mitochondrial DNA haplogroup diversity. Lowland Daghestani populations showed Turkish and Central Asian affinities for both mitochondrial and Y-chromosomal data sets. Autosomal population histories are strongly correlated to the pattern observed for the mitochondrial DNA data set, while the correlation between the mitochondrial DNA and Y-chromosome distance matrices was weak and not significant.
Conclusion: The reduced Y-chromosomal diversity exhibited by highland Daghestani populations is consistent with genetic drift caused by patrilocal endogamy. Mitochondrial and Y-chromosomal phylogeographic comparisons indicate a common Near Eastern origin of highland populations. Lowland Daghestani populations show varying influence from Near Eastern and Central Asian populations.


Background
The populations of the Caucasus region have complex histories of isolation and gene flow. The region as a whole has served as a gateway between continents, with waves of human migration leaving rich cultural and linguistic diversity in their wake [1,2]. The Caucasus Mountains have shaped the routes of migrating populations and military invasions, diverting these travellers away from the remote highlands and into the more easily accessible lowlands. Differences between highland and lowland populations are exaggerated by the marriage practices of highland populations: wives move to the home of their husbands, while husbands remain in the land of their forefathers for generations [3,4].
We have identified five populations from Daghestan that have been influenced by both physical and cultural barriers to gene flow. Three are highland isolates, while two lowland populations represent admixed groups influenced by Turkic and Mongolian migrants. We investigate whether the geographic barrier of altitude, the cultural barrier of patrilocal endogamy, or the introduction of migrants from a great distance has left detectable patterns in the genetic diversity of these populations. Specifically, we ask 1) whether geographic isolation and patrilocal endogamy have caused more genetic drift in highland than lowland populations, and 2) whether lowland populations show evidence of admixture from Turkic and Mongolian migrants.
We test these hypotheses by comparing mitochondrial DNA (mtDNA) and Y-chromosome (non-recombining, NRY) haplogroup frequencies among Daghestani, Near Eastern, Central Asian, Central/Northern European, and East Asian populations, as well as autosomal variation in 100 polymorphic Alu insertions among Daghestani, Central/Northern European, and East Asian populations. We compare measures of haplogroup diversity and pairwise distance for the mtDNA and NRY markers, then compare these to genetic distances from the autosomal data, looking for evidence of genetic drift and shared origins. Our results demonstrate that the cultural practice of patrilocality and historic population movements have shaped genetic variation in these Caucasus populations.

Populations
The Avar, Dargin, and Kubachi populations sampled in our study live in the highlands of Daghestan (Figure 1). They each speak different languages belonging to the Northeast Caucasian language family. The Avars and Dargins have traditionally been agriculturalists and pastoralists, while the Kubachi have specialized in jewelry making [5]. These highland populations are isolated due to their remote location, their linguistic variation, and their practice of strict patrilocal endogamy [3,4]. This marriage practice controls the inheritance of property and restricts male gene flow [3]. These populations are thought to be indigenous to the region but, like other native peoples of the Caucasus region, their exact origins are unclear [6]. Previous genetic studies have revealed that populations within the Caucasus region do not share the genetic variation believed by some researchers to be a signature of the Neolithic expansion through Europe, leading some to infer that these populations are remnants of a more ancient Eurasian population [3,7,8]. Others suggest that the Caucasus region is instead inhabited by a collection of peoples who represent those who have travelled through or invaded the region in the historic past [4,9,10].
The Nogai and Kumiks of the Daghestani lowlands have a history of admixture. These populations speak different languages belonging to the Kipchak division of the Turkic language family and are more exogamous than the highland populations [11,12]. The Kumik population represents a mixture of native peoples of the Caucasus with Turkic migrants from the 4th to the 15th centuries and may be descended from the Kipchaks, an ancient Turkic population [10,12]. The Kumiks currently practice a flexible form of virilocality and frequently exchange mates with other villages. The Nogai are descended from the Nogai khanate of the Mongol empire, established in the 12th century, which arrived in Daghestan in the 13th and 14th centuries [5,11]. Although administrated by Mongolians, this khanate was peopled by many native Caucasian ethnic groups [11,12], suggesting that the Nogai are an admixed population. The Nogai practice dual exogamy, prohibiting marriage within one's kin group or patronymic, a practice that encourages gene flow.
Overall, the five ethnic groups sampled for our study are representative of the genetic variation to be found in Daghestan, as these groups represent approximately one-half of the country's population [3]. DNA samples were obtained following voluntary consent procedures developed and approved by the Daghestan IRB at the Institute of History, Archeology and Ethnology of Daghestan Center, of the Russian Academy of Sciences.
We are unable to include these Near Eastern and Central Asian populations in our analyses of autosomal Alu insertion variation, as comparable data have not been published. We therefore include Central/Northern European and East Asian populations in our study of mtDNA, NRY, and Alu insertion variation. Our Central/Northern European sample includes 15 French and 57 Utahns of Central/Northern European ancestry, while the East Asian population includes 9 Cambodians, 13 Han Chinese, 16 Japanese, 5 Malay, 3 Taiwanese, and 9 Vietnamese individuals. These populations are geographically and genetically more distant from the Daghestani populations than the Near Eastern and Central Asian populations. As such, they act as extreme values in pairwise distance calculations and principal components analyses. These populations place the observed differences between Daghestani and other populations into a broader perspective. Distances between Daghestani populations and the Central/Northern European and East Asian populations serve as a yardstick against which distances within Daghestan, and between Daghestani populations and populations from the Near East and Central Asia, may be measured.

Y-chromosome Haplogroup Variation
In a similar fashion, we have genotyped the males of the above sample ( [2,15]. We follow the phylogeny and nomenclature established by the Y Chromosome Consortium [21], with the combination of SNPs used to define each haplogroup listed in additional file 2: NRYdefinitions.xls.

Autosomal Alu Insertion Variation
Genotypes for 100 Alu insertions have been previously published [22] for the Dargin, Kubachi, Kumik, Nogai, Central/Northern European and East Asian individuals genotyped for the mtDNA SNPs.

Analytical Methods
Haplogroup diversity within Daghestani populations was calculated using Nei's h statistic [23]. The pairwise genetic distances between populations were calculated using population pairwise F_ST. Genetic distances were tested for significant differences from zero with 10,000 permutations in a randomization test (Arlequin 2.0; [24]). Principal components analyses were performed using MATLAB R2007b in order to visualize the genetic relationships between populations. Mantel tests, using 10,000 random permutations in Arlequin 2.0, were performed to assess the correlations among the NRY, mtDNA, and Alu distance matrices.
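The two statistics named above can be illustrated with a minimal Python sketch. It is not the Arlequin 2.0 implementation used in this study. It assumes the commonly used unbiased form of Nei's gene diversity, h = n/(n-1) * (1 - sum of squared haplogroup frequencies), and a one-sided Mantel test based on the Pearson correlation between the upper triangles of two distance matrices. The function names (nei_h, mantel) and the default seed are hypothetical choices made for illustration only.

```python
import random
from math import sqrt


def nei_h(counts):
    """Unbiased gene diversity from haplogroup counts (assumed form of Nei's h)."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq)


def _upper(matrix):
    """Flatten the upper triangle (i < j) of a square matrix into a list."""
    k = len(matrix)
    return [matrix[i][j] for i in range(k) for j in range(i + 1, k)]


def _pearson(x, y):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(d1, d2, permutations=10000, seed=0):
    """One-sided Mantel test; returns (r, p) for two symmetric distance matrices."""
    rng = random.Random(seed)
    k = len(d1)
    x = _upper(d1)
    r_obs = _pearson(x, _upper(d2))
    hits = 0
    order = list(range(k))
    for _ in range(permutations):
        # Permute population labels of d2: rows and columns move together.
        rng.shuffle(order)
        permuted = [[d2[order[i]][order[j]] for j in range(k)] for i in range(k)]
        if _pearson(x, _upper(permuted)) >= r_obs:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)
```

Under these assumptions, nei_h would be called with one list of haplogroup counts per population, and mantel with two square distance matrices whose rows and columns list the populations in the same order (for example, the mtDNA and NRY pairwise distances). Shuffling population labels rather than individual matrix cells keeps the dependence among distances that share a population, which is the reason a Mantel test is used instead of an ordinary correlation test.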

Haplogroup Frequencies and Diversity
Mitochondrial haplogroup frequencies and diversity values are presented in Table 1. The Daghestani populations resemble our samples from Central/Northern Europe and Turkey in their frequencies of common haplogroups HV, T, and U5. The highland Kubachi are an exception, with more than 75% of the sample possessing subtypes of haplogroup U, most commonly observed in Central/Northern Europe and Russia [25].
The autosomal pairwise genetic distances calculated from the 100 Alu loci (Table 4) are similar to those observed for the mtDNA data but not the NRY data. Mantel tests of correlation between the mtDNA, NRY, and autosomal distance matrices (Table 5) demonstrate a strong and significant correlation between the mtDNA and Alu distances, a weaker correlation between the NRY and Alu distances, and no significant correlation between the NRY and mtDNA distance matrices.
Figure 2 presents the principal components analysis results for the mtDNA haplogroup frequency data. The first principal component, explaining 41% of the variance, identifies the Central/Northern European and East Asian populations as the extremes. Populations fall along the first principal components axis in a rough west-to-east gradient, with the exception of the Daghestani populations. The highland Kubachi are isolated by the second principal component, explaining 23% of the variance. This is likely due to genetic drift leading to the observed

Discussion
Cultural restrictions on marriage practices have proved to be powerful barriers to gene flow, as evidenced by the historical caste system in India [26] and patrilocal marriage practices throughout the world [27-29]. Because highland Daghestani populations practice patrilocal endogamy, we would expect them to exhibit reduced genetic diversity and larger genetic distances when compared to other populations with respect to the NRY but not mtDNA. Our observations are consistent with these predictions. We see a reduction of genetic diversity in the NRY among highland populations compared to that in other populations and to their own mtDNA diversity. This pattern of reduction is not observed for the lowland Daghestani populations.
We see no significant correlation between the mtDNA and NRY distance matrices. Y chromosomes of highland Daghestanis appear to have undergone genetic drift independent of the population history of mitochondrial and autosomal loci, suggesting that restrictions on marriage, and not geographic isolation, are the causal agents for NRY drift among highland populations. This is consistent with previous research [30], which found that the mountain range had not acted as a significant barrier to gene flow between populations located north and south of the Caucasus Mountains.
The highland Kubachi form an endogamous isolate of nearly 3,000 individuals [5], restricting both male and female gene flow. Our results indicate genome-wide signs of genetic drift for this population. For each set of markers, the Kubachi population has high pairwise distance values to lowland Daghestani populations as well as to reference populations. These patterns of genetic isolation are reflected in the mtDNA and autosomal principal components analyses.
The Nogai population is defined by its relationship to the administration of the Mongol Empire in the Caucasus, when Mongolians ruled over the native peoples of the Caucasus region [11,12]. We might expect that this historical event would have mainly introduced Central Asian males to the native populations of the Caucasus, and would have subsequently introduced more Central Asian Y chromosomes than mtDNA variants to the Caucasian gene pool [36]. If instead the Nogai represent a population of both males and females that migrated to the Caucasus region, we would expect the mtDNA and NRY histories to be similar to each other. In fact, this is the case for the Kalmyks, a Russian population who migrated from Mongolia approximately 300 years ago [37].

Conclusion
This study describes the effects of culture, geography, and gene flow on genetic diversity among Daghestani populations. Highland populations show reduced Y-chromosome diversity, but autosomal and mtDNA variation is not reduced, reflecting the effects of a patrilocal mating system. Mitochondrial and Y-chromosomal phylogeographic inferences suggest a Near Eastern or Caucasus-region origin of the highland populations. Our results, including haplogroup sharing, genetic diversity, and patterns of population pairwise distance, lead us to confirm that the lowland Kumik and Nogai populations have been influenced by gene flow from local and migrant Central Asian populations, as suggested by history. Overall, our results demonstrate that population history, geographic isolation, and patrilocality have all left detectable signatures on the genetic landscape of the Caucasus.

Authors' contributions
EEM designed the study, genotyped the Daghestani individuals, performed the analysis, and drafted the manuscript. WSW designed the multiplex technology used to genotype the mitochondrial and Y-chromosome haplogroups and genotyped the Central/Northern European and East Asian individuals. KB collected the DNA samples from Daghestani populations. HCH and LBJ participated in study design, supervision, and revision of the manuscript.