Haplotype frequencies at the DRD2 locus in populations of the East European Plain

Background It was demonstrated previously that the three-locus RFLP haplotype, TaqI B-TaqI D-TaqI A (B-D-A), at the DRD2 locus constitutes a powerful genetic marker and probably reflects the most ancient dispersal of anatomically modern humans. Results We investigated TaqI B, BclI, MboI, TaqI D, and TaqI A RFLPs in 17 contemporary populations of the East European Plain and Siberia. Most of these populations belong to the Indo-European or Uralic language families. We identified three common haplotypes, which occurred in more than 90% of chromosomes investigated. The frequencies of the haplotypes differed according to linguistic and geographical affiliation. Conclusion Populations in the northwestern (Byelorussians from Mjadel'), northern (Russians from Mezen' and Oshevensk), and eastern (Russians from Puchezh) parts of the East European Plain had relatively high frequencies of haplotype B2-D2-A2, which may reflect admixture with Uralic-speaking populations that inhabited all of these regions in the Early Middle Ages.


Background
The DRD2 gene is located on chromosome 11 and encodes the neuronal dopamine receptor D2, which plays a role in movement, emotional memory, and appetitive behavior [1]. The DRD2 locus has been the subject of numerous genetic association studies [2][3][4][5], and the most extensively studied polymorphism is a TaqI A RFLP (rs1800497; in the vicinity of the DRD2 gene), which has been associated with the pathology of psychoses (schizophrenia and manic-depressive disorder), Parkinson's disease, and various substance abuse syndromes. It has been proposed that TaqI A might be in linkage disequilibrium with some unidentified polymorphisms within the exons or regulatory regions of the DRD2 gene, but recently it has been mapped to the last exon of the ANKK1 (ankyrin repeat and kinase domain containing 1) gene, where it results in a Glu-Lys substitution [6]. Other frequently studied RFLPs, for example, TaqI B and D (rs1079597 and rs1800498, respectively), are located in the introns of the DRD2 gene and, most probably, have no functional significance.
TaqI B, TaqI D, and TaqI A polymorphisms have also been studied on a worldwide scale ([7][8][9][10][11]; the ALFRED database, http://alfred.med.yale.edu/alfred/index.asp), and centers of dispersal, which probably reflect the most ancient dispersal of anatomically modern humans, have been proposed for their three-locus haplotypes [7]. It has been shown that the B2, D2, and A1 alleles are ancestral alleles common to other hominoids [12][13][14]. Kidd et al. [7] proposed the following evolutionary sequence for the most common haplotypes: evolution of B2-D2-A1 to B2-D2-A2 and B1-D2-A1 and evolution of B2-D2-A2 to B2-D1-A2. The other less frequent haplotypes probably arose by recombination. Three-locus haplotypes exhibit pronounced geographical differentiation. With the exception of some tribal populations of India [11], the ancestral haplotype B2-D2-A1 is mainly confined to African groups. The singly derived haplotype B1-D2-A1 is most widespread among people of East Asian descent, including Native Americans. Haplotype B2-D2-A2 is present among all populations but is least prevalent in Western European and American populations. The doubly derived haplotype B2-D1-A2 is common in Europe and rare in East Asia. The other haplotypes are extremely rare and have sporadic distribution.
Here we provide data on the variability of DRD2 haplotypes in a previously uninvestigated region, the East European Plain. We investigated 14 contemporary populations of the East European Plain and Siberia that belong to the Indo-European and Uralic language families. In addition, two populations of the Altaic language family (Yakuts and Kalmyks) and a population of the North Caucasian language family (Adygeis) were included as reference groups. We also performed an updated global analysis of B-D-A haplotype frequencies using our data and data from the ALFRED database.

Results
We studied five RFLP loci, TaqI B, BclI, MboI, TaqI D, and TaqI A. The locations of these polymorphisms are shown on the gene map (Figure 1), and allele frequencies in the populations studied are presented in Table 1. Allele 1 (the restriction site was absent) at the TaqI B locus was always present with allele 1 at the BclI locus; allele 2 (the restriction site was present) at the former locus was always found with allele 2 at the latter locus. The same tight linkage was observed for the MboI and TaqI D loci, and there were no exceptions among the 3198 chromosomes studied. In most populations, all loci exhibited Hardy-Weinberg equilibrium (assessed by the exact test using a Markov chain, P > 0.05). Significant deviations from Hardy-Weinberg equilibrium were observed only for the MboI and TaqI D loci in Khants (P = 0.044) and the TaqI A locus in Yakuts (P = 0.023).
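The Hardy-Weinberg check described above can be illustrated for a single biallelic locus. The following is a minimal sketch, not the Arlequin procedure: instead of approximating the exact test with a Markov chain, it enumerates every heterozygote count compatible with the observed allele counts, which is feasible for two alleles. The function name and the genotype counts in the usage lines are hypothetical.

```python
from math import exp, lgamma, log


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact Hardy-Weinberg P-value for one biallelic locus.

    Arguments are observed genotype counts (homozygote 1, heterozygote,
    homozygote 2). The P-value is the total probability of all genotype
    configurations that are no more likely than the observed one, given
    the observed allele counts.
    """
    n = n_aa + n_ab + n_bb          # number of individuals
    n_a = 2 * n_aa + n_ab           # copies of allele 1
    n_b = 2 * n_bb + n_ab           # copies of allele 2

    def log_prob(het):
        # Probability of 'het' heterozygotes conditional on allele counts.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (lgamma(n + 1) - lgamma(hom_a + 1) - lgamma(het + 1)
                - lgamma(hom_b + 1) + het * log(2)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    # Heterozygote counts must share the parity of the allele counts.
    probs = {het: exp(log_prob(het))
             for het in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    p_obs = probs[n_ab]
    total = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical genotype counts, for illustration only.
print(hwe_exact_p(30, 45, 25))
print(hwe_exact_p(50, 0, 50))
```

Full enumeration is used here only because it is short and exact for two alleles; for loci with many alleles a Markov chain approximation such as the one used in the study becomes necessary.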
Alleles 1 at the TaqI B, BclI, and TaqI A loci were more frequent in the populations of Russians 1 and 5, Veps, Komi 2, Khants, Nenets, Yakuts, and especially Kalmyks, than in the other populations (Table 1). Allele 1 at the MboI locus and allele 2 at the TaqI D locus were the most frequent alleles in Asian populations, i.e. Khants, Nenets, Yakuts, and Kalmyks. It is notable that combinations of allele 2 at the TaqI B locus with allele 1 at the BclI locus and allele 2 at the MboI locus with allele 2 at the TaqI D locus, found in the sequenced chimpanzee genome [15], were absent in human populations (Table 1).
Pairwise linkage disequilibrium (D') was strong in most cases (Table 2). However, disequilibrium values were lower for the MboI-TaqI A and TaqI D-TaqI A pairs in some populations, which is similar to the findings of Kidd et al. [7]. In Yakuts, disequilibrium was very low, except for the TaqI B-TaqI A and BclI-TaqI A pairs. Only two of the SNPs studied have been involved in the international HapMap project http://www.hapmap.org: TaqI A (rs1800497) and TaqI B (rs1079597). Linkage disequilibrium between these SNPs is high and significant in all 11 populations included in the HapMap project phase 3. The MboI and TaqI D loci are located between SNPs rs2587548 and rs2734836 investigated in the HapMap project. Linkage disequilibrium between SNP rs2587548 and the TaqI A locus is low in the Utah population of European ancestry and high but nonsignificant in the Chinese and Japanese populations (HapMap project phase 2), which is in agreement with our results.
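For readers unfamiliar with the D' statistic reported above, the calculation for a pair of biallelic loci can be sketched as follows. This is a minimal illustration of Lewontin's normalization, assuming that the two-locus haplotype frequency and the two allele frequencies are already known; the function and argument names are hypothetical, and the sign of D' is kept, although published tables often report the absolute value.

```python
def lewontin_d_prime(p_ab, p_a, p_b):
    """Lewontin's D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus carries no disequilibrium signal
    return d / d_max


# Two alleles that always occur together (as reported for TaqI B and BclI)
# give D' equal to 1, up to floating-point rounding.
print(lewontin_d_prime(0.3, 0.3, 0.3))
```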
Taking into consideration perfect linkage in the TaqI B-BclI and MboI-TaqI D pairs, the BclI and MboI RFLPs were redundant for inter-population comparison, and we focused on the distribution of three-locus haplotypes TaqI B-TaqI D-TaqI A. Their frequencies are shown in Table 3. Only three haplotypes were common in the populations studied: B2-D1-A2, B2-D2-A2, and B1-D2-A1 (haplotypes termed according to [7]). The other haplotypes were extremely rare. Haplotype frequencies were used for calculation of FST-based genetic distances. The resulting distance matrix was visualized using multidimensional scaling (MDS) (Figure 2). Two large population groups can be distinguished: Asian (Khants, Nenets, Kalmyks, and Yakuts) and European (the other populations). According to UPGA cluster analysis, the European and Asian groups might be further subdivided into subclusters European-1 and European-2, Khants-Nenets group, Kalmyks, and Yakuts.
All populations of the European cluster had significant genetic distances from the populations of the Asian cluster. The European-1 and European-2 clusters were quite homogeneous; most distances were non-significant within the former, and all were non-significant within the latter. Only Russians 1 occupied a distinct position within the European-1 cluster. In contrast, most pairwise genetic distances between the two subclusters were significant. Exceptions included Veps, which had no significant distances from the populations of the European-2 cluster, and Byelorussians 2, which had a small number of significant distances from the populations of the European-1 cluster.
The matrix of haplotype-based genetic distances was also compared with matrices of great circle geographical distances (data not shown) to assess the possible effect of isolation by distance. The matrices including all 17 populations were highly correlated according to the Mantel test (r = 0.749, P-value = 0.0001). However, correlation of geographical and genetic distances was not observed among the populations of the European cluster, i.e., after exclusion of Khants, Nenets, Kalmyks (which migrated from East Asia in historical times), and Yakuts (r = 0.121, P-value = 0.2999). The use of more realistic distance measurements around geographical barriers, such as the Azov Sea, the Caucasus, and some parts of the Ural Mountains, had little effect on the results of the test (the correlation coefficients for 17 and 13 populations were 0.754 and 0.117, respectively; P-values were 0.0001 and 0.3109, respectively). Thus, isolation by distance is not a likely cause of the genetic variation observed in the East European Plain.
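The logic of the matrix comparison above can be sketched with a simple permutation version of the Mantel test. This is an illustrative implementation under stated assumptions, not the XLSTAT routine used in the study: it uses the Pearson correlation of the upper triangles, a one-sided P-value, and a fixed random seed; the function names and the toy coordinates are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant vector
    return sxy / sqrt(sxx * syy)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """One-sided Mantel test for two symmetric distance matrices.

    Rows and columns of dist_b are permuted together, so the dependence
    among distances that share a population is preserved.
    """
    n = len(dist_a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    a = [dist_a[i][j] for i, j in pairs]
    r_obs = pearson(a, [dist_b[i][j] for i, j in pairs])

    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        b_perm = [dist_b[order[i]][order[j]] for i, j in pairs]
        if pearson(a, b_perm) >= r_obs:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)


# Toy example: "genetic" distance proportional to "geographical" distance.
coords = [0.0, 1.0, 3.0, 7.0, 12.0, 20.0]
geo = [[abs(p - q) for q in coords] for p in coords]
gen = [[0.01 * d for d in row] for row in geo]
r, p = mantel(geo, gen)
print(r, p)
```

Permuting population labels rather than individual matrix cells is the essential design choice: the cells of a distance matrix are not independent observations, so an ordinary significance test of the correlation would be too liberal.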
We also compared our results with those of previous authors to examine population relationships in greater detail. To do this, we analyzed haplotype frequencies in the populations studied and in 38 populations from various continents (data from [7,9], and the ALFRED database) using MDS of FST-based genetic distances (Figure 3 and Additional file 1). A good fit between the two-dimensional plot and the source data was obtained, demonstrated by the low stress value (0.062). Cluster analysis by the UPGA algorithm was performed using the same distance matrix and enabled us to define four large population clusters with two subclusters each, i.e., eight population groups in total. All clustering methods that we used (complete linkage, unweighted and weighted pair-group average, unweighted and weighted pair-group centroid, and Ward's method) revealed identical clusters at the level of eight groups. However, at 'higher' and 'lower' clustering levels, some methods gave different results. To identify haplotypes responsible for the observed intercluster differences, haplotype frequencies for various clusters were compared using the Kolmogorov-Smirnov test (Table 4). [...] to the Kolmogorov-Smirnov test < 0.0001 and 0.001, respectively). This is also evident from Figure 3: on the MDS plot, both European subclusters occupy almost identical positions in the B1-D2-A1 frequency gradient (Figure 3, upward arrow) but have different positions in the B2-D1-A2 and B2-D2-A2 frequency gradients (Figure 3, arrows). A low but highly significant level of population differentiation was observed between the European-1 and -2 subclusters (FCT 0.018, P < 0.00001, Table 5); 59% of genetic distances between populations of the two subclusters were significant, but only 14% and 22% of the intracluster distances were significant. Other close pairs of subclusters such as Intermediate-1 and -2 demonstrated higher levels of differentiation (FCT 0.027-0.045, Table 5). About 60% of intercluster genetic distances were significant in all instances.

Table 1 Geographical and linguistic affiliations of populations sampled in this study, and allele frequencies in the studied populations. SNP accession numbers and corresponding RFLP names are presented for the polymorphisms studied. Alleles without restriction sites are named "allele 1", alleles with restriction sites "allele 2".
a only A and G were detected at this SNP locus in a sample of 2450 chromosomes from 17 populations http://www.hapmap.org, http://www.ncbi.nlm.nih.gov/SNP
b only T and C were detected at this SNP locus in a sample of 1204 chromosomes from 14 populations http://www.ncbi.nlm.nih.gov/SNP
c only A and T were detected at this SNP locus in a sample of 142 chromosomes from three populations http://www.ncbi.nlm.nih.gov/SNP
d only T and C were detected at this SNP locus in a sample of 946 chromosomes from 12 populations http://www.ncbi.nlm.nih.gov/SNP
e only T and C were detected at this SNP locus in a sample of 2628 chromosomes from 21 populations http://www.hapmap.org, http://www.ncbi.nlm.nih.gov/SNP
f population initially described in [62]
g population initially described in [38]
h population initially described in [37]
i population initially described in [53]
j population initially described in [39]

Figure 1 Map of polymorphisms in the DRD2 and ANKK1 genes. Restriction polymorphisms, corresponding SNPs, and distances between them are shown. Exons are indicated by gray vertical bars.
The global FST value, 0.11413 (Table 5), falls within the range typical of autosomal markers (0.09-0.14, [16]). This value was estimated using haplotype frequencies without taking into account the extent of molecular differences between haplotypes. Calculation of FST based on numbers of pairwise differences between haplotypes gave a nearly identical value, 0.11383. Thus, the differentiation of populations may be explained by drift only, without any significant influence of mutation.

Discussion
The East European (Russian) Plain is a region in which peoples of the Indo-European and Uralic language families have come into contact over an extended period. Uralic-speaking peoples have the longest validated archaeological record in this region [17]. The most recent large-scale migration to this region involved the movement of Slavs (the Indo-European language family) to the east and northeast of their presumed homeland in Central Europe about 500 AD [18,19]. Slavs were not the first Indo-European-speaking people to arrive in the Russian Plain: in the first millennium BC, Baltic-speaking tribes occupied a large part of the East European Plain [17]. They were later displaced by Slavic tribes. According to the widely accepted hybridization theory of the origin of Eastern Slavs [20], Slavic populations arriving in the East European Plain mixed with indigenous Uralic- and, probably, Baltic-speaking people.
In our study, all populations of the East European Plain (excluding the Kalmyks, which are of East Asian origin) fell into a single large cluster termed European. Many populations within this cluster are indistinguishable with our genetic marker, i.e., genetic distances between them were not significant, which is in agreement with the low FST value for the European cluster (0.013). However, some populations were characterized by a large percentage of significant genetic distances from the other populations of the cluster. Most such populations fell into the so-called European-2 subcluster defined by cluster analysis; the 'core' subcluster was termed European-1, and 59% of genetic distances between populations of the two subclusters were significant. European-1 and European-2 subclusters (Figure 3) are differentiated according to the B2-D2-A2 frequency, but not according to the B1-D2-A1 frequency, which might reflect the degree of Asian admixture. Natural selection probably was not responsible for separation of the two European subclusters as there is no difference in allele frequencies at the TaqI A locus (Table 4), which is considered the most likely candidate for selection in the whole DRD2 region [2][3][4][5][6].
The European-2 subcluster includes two Middle Eastern populations (Jews 2 from Yemen and Druze from Israel), two Uralic-speaking populations (Finns and Komi 3), four Slavic-speaking populations (Byelorussians 2 and Russians 4, 5, and 6), and the Altaic-speaking Chuvash. All these linguistically and geographically distant populations are differentiated to some extent from the core of the European cluster, the European-1 subcluster, because of a relatively high B2-D2-A2 frequency.
The B2-D1-A2 and B1-D2-A1 haplotypes apparently have centers of dispersal in Europe/West Asia and East Asia, respectively [7]. The B2-D2-A2 haplotype may also have a center of dispersal, the most probable location of which is in Africa. B2-D2-A2 was among the first haplotypes that evolved from the ancestral haplotype B2-D2-A1 in Africa [7] and is still the most abundant haplotype in all African populations (Additional file 1). Therefore, the first settlers of Eurasia that migrated to Arabia and the Levant may mostly have carried the B2-D2-A2 haplotype and a small proportion of other haplotypes that were either subsequently eliminated or amplified by genetic drift and/or natural selection in various parts of the world.
Russians 5 (Oshevensk) were most closely associated with Finns and Chuvash according to the MDS results (Figure 3). In the study of Verbenko et al. [38] on polymorphic tandem repeats at the D1S80 locus, the same Oshevensk sample (as well as another northern Russian sample) clustered together with Uralic-speaking Mari, Komi, and Udmurts, whereas other Russian populations clustered with Indo-Europeans and Adygeis. Analysis of other repeat loci, 3' ApoB, DMPK, DRPLA, and SCA1, also demonstrated remoteness of some northern Russian populations (including Russians 5, Oshevensk) from the core of the European cluster [38]. Similar results were obtained using haplotypes at the TP53 locus [39]: the Oshevensk population tended to form a cluster with Uralic-speaking Mordvins and with Altaic-speaking Kalmyks and Buryats, but not with Russians from Smolensk (Russians 2) or Byelorussians from Pinsk (Byelorussians 1).
According to archaeological data, the Arkhangelsk region (including Mezen' and Oshevensk) was populated by Uralic tribes in the Middle Ages ([48]; see Figure 4). Russian colonization of this region began relatively recently (after the 12th century AD) [48]. Thus, a very high level of Uralic admixture in Mezen' and Oshevensk is not surprising.
The Uralic genetic substratum is appreciable not only in the northeast of the East European Plain but also in its northwestern part, for example, in the Pskov [33] and Novgorod regions [38] and in Latvians and Lithuanians [46,49,50]. Baltic-speaking peoples, now represented by the Latvians and Lithuanians, came into contact with Uralic groups before the Slavs did (Figure 4; [18]). That Byelorussians 2 (Mjadel') fell into the European-2 subcluster may also reflect a general tendency in the northwestern region. Mjadel' is located in the northwestern part of Belarus near the contemporary Lithuanian border. The Russians 4 (Puchezh) population is distant from the northeastern and northwestern groups, but also belongs to the European-2 subcluster (Figure 3). Uralic admixture in this population may be explained by the presence of Uralic-speaking tribes in the region of Puchezh in historical times ([51]; see Figure 4).
Russians 1 (Andreapol') and Uralic-speaking Veps are close to Uralic-speaking Komi 2 (Obyachevo) on the MDS plot (Figure 3). All these populations are located within the region occupied by Uralic peoples in the Middle Ages (Figure 4), but belong to the European-1 cluster and do not have high B2-D2-A2 frequencies typical of the European-2 cluster. However, they are shifted from the core of the European cluster because of a relatively high proportion of the "East Asian" haplotype B1-D2-A1. In fact, the Veps population has significant genetic distance only from Druze, but not from the other populations of the European-2 cluster, and Komi 2 only from Jews 2, Druze, Russians 6, and Komi 3. The Andreapol' sample had the highest B1-D2-A1 frequency of all European populations (Additional file 1), and eight of 13 genetic distances between this sample and the other populations of the European-1 subcluster are significant.
Komi populations demonstrate remarkable heterogeneity according to various marker systems. For example, Komi-Permyaks and Komi-Zyrians have rather different mtDNA haplogroup frequencies but both have a relatively high U4 frequency [40]. In our study, one of the Komi-Zyrian populations (Komi 1, Izhma) belonged to the core of the European-1 cluster. It is interesting that the craniological results of Moiseyev [52] also place Komi-Zyrians at the core of the European cluster and distant from Uralic and Asian groups. According to three-site haplotype frequencies at the TP53 locus and VNTR frequencies at the D1S80 and 3' ApoB loci, the Komi 1 and 2 populations are distant from Uralic-speaking Finns, Mordvins and Khants, East Asian groups, and Slavic groups [53]. Moreover, the Komi 2 (Obyachevo) population is distant from Komi 1 and closer to Slavic groups than Komi 1 [53]. Thus, the position of Komi in genetic gradients remains uncertain because of substantial divergence of population samples and contradictory results, which may reflect a complex history of this group or natural selection.

Figure 4 The dashed line indicates the region that was presumably occupied by Uralic-speaking peoples in the Early Neolithic era, the 5th millennium BC (according to [17]). Solid lines indicate three regions occupied by different ethnic groups in the 1st-6th centuries AD: Slavic- and Germanic-speaking tribes, densely dotted area; Baltic-speaking tribes, hatched area; Uralic-speaking tribes, sparsely dotted area [48,61].
East Asian-1 (Figure 3), although they are clearly of East Asian origin. As suggested by archaeological and linguistic evidence, the Yakuts probably migrated north from their original area of settlement near Lake Baykal because of the Mongol expansion from the 13th to 15th century AD [54]. Y-chromosome results reveal a very strong bottleneck in the Yakut population, which probably preceded their recent expansion [46,54]. This bottleneck effect may be responsible for the aberrant haplotype frequencies for Yakuts observed in our study.

Conclusion
Populations in the northwestern (Byelorussians 2 from Mjadel'), northern (Russians 5 from Mezen' and 6 from Oshevensk; Komi 3), and eastern parts (Russians 4 from Puchezh and Chuvash) of the East European Plain have relatively high frequencies of haplotype B2-D2-A2, which may reflect admixture with Uralic-speaking populations. Uralic genetic substratum in these regions, which were inhabited by Uralic-speaking tribes as late as the Early Middle Ages, was also shown by studies in which other genetic markers were used (mtDNA, Y-chromosome, and autosomal). Thus, the analysis of DRD2 haplotypes supports results on Slavic-Uralic admixture obtained using other markers, mainly neutral and sex-specific markers.

Populations
The linguistic affiliations and geographical locations of the studied populations are shown in Table 1 and Figure 5. These populations have been described previously (see footnotes to Table 1).

Figure 5 Geographical location of the study populations on the map of Russia and the neighboring countries.

DNA isolation and typing
Blood samples (8 ml) were obtained by venipuncture and collected into EDTA-coated containers. Informed consent was obtained from each individual. The research protocols and informed consent forms were approved by the Ethic Commission of the Medico-Genetic Scientific Centre of the Russian Academy of Medical Sciences (the approval was signed by the Head of the Ethic Commission, Professor L.F. Kurilo, PhD). All individuals belonged to the native ethnic group of the region, i.e., their lineage in the region extended for at least two previous generations, and were unrelated to each other. DNA was isolated from leucocytes by proteinase K treatment and extraction with phenol-chloroform [55]. Each DNA sample was subjected to three PCR analyses: amplification of a 459 bp fragment for TaqI B and BclI RFLP analysis, amplification of a 300 bp fragment for MboI and TaqI D RFLP analysis, and amplification of a 237 bp fragment for TaqI A RFLP analysis (primer sequences and original PCR protocols were obtained from the website of K. Kidd: http://info.med.yale.edu/genetics/kkidd/Protocol_TOC_new2002.html). The locations of these polymorphisms are shown on the gene map (Figure 1). All endonuclease restriction reactions were carried out overnight. Samples containing unrestricted fragments were tested at least twice. Restriction products were separated by electrophoresis using a 2.5% agarose gel.

Statistical analyses
The general strategy of statistical analysis was similar to that used in the work of Poloni et al. [56]. Allele frequencies, correspondence to Hardy-Weinberg equilibrium, and significance of linkage disequilibrium (P-values) were evaluated using Arlequin version 2.0 software http://cmpg.unibe.ch/software/arlequin. Linkage disequilibrium values (D') between polymorphic loci were calculated as suggested by Lewontin [57]. Frequencies of haplotypes were estimated from RFLP genotype data using the expectation-maximization algorithm of Excoffier and Slatkin [58] implemented in Arlequin 2.0. Genetic affinities among populations were evaluated using coancestry coefficients, or linearized pairwise FST values [59] calculated on the basis of allele or haplotype frequencies. Significance of genetic distances was tested using permutations [60].
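The expectation-maximization estimation of haplotype frequencies mentioned above can be sketched for unphased biallelic genotypes. This is a generic illustration, not the Arlequin implementation: the genotype coding (0, 1, or 2 copies of "allele 2" per locus), the uniform starting frequencies, the stopping rule, and all function names are assumptions made for the example.

```python
from collections import Counter
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple with the number of copies (0, 1, 2) of allele 2 at
    each locus. Haplotypes are tuples of 0/1, where 1 means allele 2.
    """
    het_loci = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_loci)):
        h1, h2 = base[:], base[:]
        for locus, bit in zip(het_loci, bits):
            h1[locus], h2[locus] = bit, 1 - bit
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_freqs(genotypes, max_iter=500, tol=1e-12):
    """Maximum-likelihood haplotype frequencies under random mating."""
    counts = Counter(genotypes)
    expansions = {g: compatible_pairs(g) for g in counts}
    haplotypes = sorted({h for ps in expansions.values()
                         for pair in ps for h in pair})
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}
    n_chromosomes = 2 * sum(counts.values())

    for _ in range(max_iter):
        expected = dict.fromkeys(haplotypes, 0.0)
        for g, n in counts.items():
            # E-step: posterior weight of each phase resolution.
            weights = [(1 if h1 == h2 else 2) * freqs[h1] * freqs[h2]
                       for h1, h2 in expansions[g]]
            norm = sum(weights)
            for (h1, h2), w in zip(expansions[g], weights):
                expected[h1] += n * w / norm
                expected[h2] += n * w / norm
        # M-step: expected haplotype counts become new frequencies.
        new_freqs = {h: c / n_chromosomes for h, c in expected.items()}
        change = max(abs(new_freqs[h] - freqs[h]) for h in haplotypes)
        freqs = new_freqs
        if change < tol:
            break
    return freqs


# Hypothetical two-locus sample: the two double heterozygotes are the
# only genotypes whose phase is ambiguous.
sample = [(0, 0)] * 5 + [(2, 2)] * 5 + [(1, 1)] * 2
for haplotype, freq in em_haplotype_freqs(sample).items():
    print(haplotype, round(freq, 4))
```

Only individuals heterozygous at two or more loci contribute phase uncertainty, so the algorithm in effect distributes those individuals among the haplotype pairs that the unambiguous genotypes make most plausible.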
Correlation of geographical and genetic distances was assessed using the Mantel test and XLSTAT version 2008.6.04 software (Addinsoft). Great circle geographical distances were calculated using the haversine formula; latitudes and longitudes were determined using Google Earth software. Multidimensional scaling (MDS) and cluster analysis with the unweighted pair-group average (UPGA) method were performed using STATISTICA version 6.0 http://www.statsoft.com. Statistical comparison of haplotype frequencies in population groups was performed using the Kolmogorov-Smirnov test implemented in XLSTAT. Differentiation between population groups defined by MDS and cluster analysis was estimated using the analysis-of-molecular-variance approach, AMOVA [60], implemented in Arlequin 2.0. Conventional FST distances between haplotypes were used, i.e., all haplotypes were considered equidistant (a conservative scenario of pure drift). Significance of the genetic-structure indexes obtained with the AMOVA method was tested using a permutational procedure (1 × 10^6 permutations).
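The great circle distances mentioned above follow from the haversine formula, which can be sketched as below. The mean Earth radius of 6371 km is an assumption (the radius used in the study is not stated), and the function name and the coordinates in the usage line are illustrative only.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    # Clamp to 1 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


# Approximate, illustrative coordinates of two points in degrees.
print(haversine_km(55.75, 37.62, 62.03, 129.73))
```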