Analysis of DNA variations in GSTA and GSTM gene clusters based on the results of genome-wide data from three Russian populations taken as an example

Background: Extensive genome-wide analyses of many human populations, using microarrays containing hundreds of thousands of single-nucleotide polymorphisms (SNPs), have provided abundant information about global genomic diversity. However, these data can also be used to analyze local variability in individual genomic regions. In this study, we analyzed the variability in two genomic regions carrying the genes of the GSTA and GSTM subfamilies, located on different chromosomes.
Results: Analysis of the polymorphisms in the GSTA and GSTM gene clusters showed similarities in their allelic and haplotype diversities. These patterns were similar in three Russian populations and the CEU population of European origin. There were statistically significant differences in all the haploblocks of both the GSTM and GSTA regions when the Russian populations were compared with populations from China and Japan. Most haploblocks also differed between the Russians and the Yoruba from Nigeria, although some of them had similar allelic frequencies. Special attention was paid to SNP rs4986947 from the intron of the GSTA4 gene, which is represented in apes by an A nucleotide. In the Asian and African samples, it was represented only by a G allele, whereas both allelic variants (G/A) occurred in the Russian and European populations.
Conclusions: The results obtained suggest the presence of common features in the evolutionary histories of the GSTA and GSTM gene regions, and that African subpopulations were involved differently in the formation of the European and Asian human lineages.


Background
The results of genome-wide analyses of different populations can be used to study the patterns of DNA diversity in particular genomic regions containing specific genes or gene clusters. One interesting and functionally significant genetic system includes the glutathione S-transferase (GST) genes that encode the different GSTs.
The GSTs are one of the key groups of detoxification enzymes. The chemistry of the reactions catalyzed by these enzymes is based predominantly on the conjugation of glutathione to the electrophilic centers of various substances, which leads to a loss of toxicity and the formation of more hydrophilic products. The important noncatalytic functions of the GSTs include their capacity to sequester carcinogens, their involvement in the intracellular transport of a wide spectrum of hydrophobic ligands, and their modulation of signaling pathways [1,2]. Like most other human genes, the genes encoding the GSTs are polymorphic. It has been suggested that these polymorphisms are functionally significant and that the frequencies of their allelic variants differ among human populations [3]. Until recently, only a limited number of GST polymorphisms had been studied (e.g., GSTM1 and GSTT1 gene deletions, a 3-bp deletion in GSTM3 intron 6, and SNPs in GSTP1 exons 5 and 6), and these were not sufficient to infer the genetic relationships of populations. It may be especially relevant that particular genes of some GST families are located close to each other, forming clusters in the genome [4]. However, with advances in the methods of genome analysis, including high-throughput genotyping technologies, it has become possible to obtain and use more detailed information about the polymorphisms in regions of interest. Recently, Polimanti and co-workers (2011) [5] compared polymorphisms of the soluble GST genes in some reference populations using the HapMap database. In the current study, we examined the polymorphisms in two genomic regions, comprising clusters of GSTA and GSTM genes, located on different chromosomes, in three groups of Russians from the western (Tver), eastern (Murom), and southern (Kursk) regions of the European part of Russia. The analyses were based on both the comparison of allelic variation in individual SNPs and the haplotype diversity across the GST clusters. 
The genotypes were obtained from a genome-wide analysis of SNPs [6,7], performed with Illumina microarrays. To compare the Russian populations with other populations throughout the world, we also included four populations from the HapMap Project in this study: Utah, USA, residents with ancestry in Northern and Western Europe (CEU), Han Chinese from Beijing (CHB), Japanese from Tokyo (JPT), and the Yoruba people of Ibadan, Nigeria (YRI). Their genotypes were downloaded from the HapMap Project site [8]. The data obtained showed high levels of similarity across the three Russian populations studied and between the Russian and CEU populations. However, the differences between them and the Asian and African populations were significant.

Methods
DNA samples were isolated from blood samples obtained with the informed consent of Russian donors from western (Andreapol district of the Tver region), eastern (Murom district of the Vladimir region), and southern locations (Kursk and Oktyabrsky districts of the Kursk region) in the European part of Russia. Their ethnicity was determined by interview. All individuals were unrelated and represented the native ethnic groups in the regions studied (i.e., they belonged to at least the third generation living in a particular geographic region). The DNA was isolated from peripheral leukocytes with standard techniques, using proteinase K treatment and phenol-chloroform extraction [9].
All the DNA samples were genotyped at the Estonian Biocentre (Tartu, Estonia), using the Illumina Human CNV370-Duo (Tver and Murom samples) and Human 660W-Quad chips (Kursk samples), according to the manufacturer's instructions. In total, 288 Russian samples were genotyped (96 samples per population).
Because the microarrays differed in the numbers of SNPs tested, the number of SNPs examined was standardized to obtain a set of loci that was consistent across all the populations analyzed. The set of loci was chosen by considering the chromosomal regions in which the GSTM and GSTA gene clusters are located. The sample sizes of the populations taken from the HapMap Project were: 165 individuals from CEU, 86 from CHB, 84 from JPT, and 166 from YRI.
The allele frequencies, deviations from Hardy-Weinberg equilibrium, and the SNP-based Wright's fixation index (F_ST) [10] were calculated using the PowerMarker software package (v.3.0) [11]. The pairwise linkage disequilibrium statistic (D') [12] was estimated and the haplotypes were inferred for adjacent markers using the accelerated expectation-maximization algorithm embedded in the Haploview software [13]. The haplotype block patterns were defined using the block definition based on the linkage disequilibrium measure D' and its confidence interval. Linearized pairwise F_ST [14] values were used to evaluate the genetic affinities between populations. The significance level was set at P < 0.05.
Figure 1 shows 15 polymorphisms of the GSTA cluster, which spans 250 kbp at p12.1 of chromosome 6. The polymorphisms are presented according to their locations in relation to the genes. Based on the threshold value for the pairwise linkage disequilibrium between the SNPs (D' > 0.7) [15], six blocks were inferred in the GSTA cluster. All the haploblocks were identical in all the populations studied. Figure 1 shows the haploblocks for the Russian population from Tver. The corresponding data for the other two Russian populations were identical to the Tver data. Table 1 shows the allelic frequencies for all the polymorphic loci of the GSTA cluster in the Russian populations and in the HapMap populations. A comparative analysis showed no differences in the distributions of the SNP variants in the Russian populations, and similar allelic frequencies were found in the CEU population. However, the allelic distributions in the three remaining HapMap populations differed considerably from those in the populations of European descent.
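As a rough illustration of the single-SNP and pairwise statistics named in the Methods, the following Python sketch computes an allele frequency from genotype counts, a chi-square statistic for Hardy-Weinberg equilibrium, and the normalized linkage disequilibrium measure D' for two biallelic loci. It does not reproduce PowerMarker or Haploview. The function names and the example numbers are hypothetical, and the two-locus haplotype frequency passed to `d_prime` is assumed to be known already (in the study, haplotypes were inferred with an expectation-maximization algorithm).

```python
def allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of allele A from counts of AA, AB and BB genotypes."""
    n = n_aa + n_ab + n_bb
    return (2 * n_aa + n_ab) / (2 * n)


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 degree of freedom) comparing observed genotype
    counts with those expected under Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = allele_frequency(n_aa, n_ab, n_bb)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic SNPs).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def d_prime(p_ab, p_a, p_b):
    """Absolute value of Lewontin's D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2 (assumed to be known or inferred)
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical genotype counts for one SNP in 96 individuals.
    counts = (30, 48, 18)
    print("allele frequency:", allele_frequency(*counts))
    print("HWE chi-square:", hwe_chi_square(*counts))
    # Hypothetical haplotype and allele frequencies for a pair of SNPs.
    value = d_prime(p_ab=0.40, p_a=0.50, p_b=0.50)
    print("D':", value, "exceeds 0.7 threshold:", value > 0.7)
```

A chi-square statistic above 3.84 would correspond to P < 0.05 at one degree of freedom. The actual software may use a different test, such as an exact test, so the sketch is only indicative.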

Results
We also calculated the fixation indices (F_ST) to quantitatively assess the levels of interpopulation frequency variation. Figure 2 presents the multidimensional scaling of the matrix of linearized pairwise F_ST values. The diagram shows that the Russian populations form a single cluster, with the CEU population close to them. However, the African YRI population and the Asian CHB and JPT populations are situated at a considerable distance from them. Table 2 shows the frequencies of the haplotypes in each haploblock in all the populations analyzed. It is evident that different haploblocks contain different numbers of haplotypes. For instance, haploblock #1 has only two haplotypes, whereas haploblock #4 has four haplotypes. Some haploblocks, namely blocks #5 and #6 in the CHB and JPT populations and block #6 in the YRI population, were not inferred because the SNPs tested were monomorphic in these populations.
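To make the distance measure concrete, here is a minimal Python sketch of how a matrix of linearized pairwise F_ST values could be assembled from allele frequencies. Several points are assumptions rather than the authors' procedure: F_ST is computed per biallelic SNP as (H_T - H_S) / H_T with equal population weights, the per-SNP values are simply averaged, and the linearization is taken to be F_ST / (1 - F_ST). The population labels and frequencies are hypothetical.

```python
from itertools import combinations


def fst_biallelic(p1, p2):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic SNP,
    given the frequency of the same allele in two populations."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                          # total heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2      # mean within-population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def linearized_fst(fst):
    """Linearized distance F_ST / (1 - F_ST) (assumed form)."""
    return float("inf") if fst >= 1 else fst / (1 - fst)


def pairwise_linearized_fst(freqs):
    """freqs maps a population label to a list of allele frequencies,
    one per SNP, in the same SNP order for every population.
    Returns a dict keyed by population pair."""
    result = {}
    for a, b in combinations(sorted(freqs), 2):
        per_snp = [fst_biallelic(x, y) for x, y in zip(freqs[a], freqs[b])]
        mean_fst = sum(per_snp) / len(per_snp)   # simple average (assumption)
        result[(a, b)] = linearized_fst(mean_fst)
    return result


if __name__ == "__main__":
    # Hypothetical allele frequencies at three SNPs.
    example = {
        "POP_A": [0.30, 0.55, 0.10],
        "POP_B": [0.32, 0.50, 0.12],
        "POP_C": [0.70, 0.20, 0.00],
    }
    for pair, dist in pairwise_linearized_fst(example).items():
        print(pair, round(dist, 4))
```

A distance matrix of this kind is the input a multidimensional scaling routine would take; that step is omitted here.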
Figure 1 SNPs studied in the GSTA gene cluster (e.g., the Tver population). The numbers inside the diamonds show the pairwise linkage disequilibrium (D') values.
Table footnote: The [...], where "n" is the number of haplotypes and "a" is the number of populations [16]. The absence of some haploblocks in the CHB, JPT, and YRI populations did not allow us to compare these populations; the respective columns are marked with dashes on the corresponding lines.
Figure 3 SNPs studied in the GSTM gene cluster (e.g., the Tver population).
The comparison of the populations was based on the haplotype frequencies calculated for each block. The calculated probabilities (P values) presented in Table 3 show the results of this comparison. The significance thresholds for P were set for each block using the Bonferroni correction for multiple testing. The data generated showed no marked differences in the haplotype frequencies across the Russian populations and the CEU population. However, a comparison of the haplotype frequencies in the Russian populations with those in the Chinese, Japanese, and Nigerian populations indicated significant differences between them. Most P values were considerably lower than the specified levels. The exceptions were block #5 and, to a certain degree, blocks #1 and #4, where the pairwise P values for the pairs of Russian and Nigerian populations were higher than the values specified for these blocks.
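The Bonferroni correction mentioned for the per-block thresholds can be sketched in a few lines of Python. The number of comparisons per block is not restated here, so the comparison labels and P values below are hypothetical and serve only to show the arithmetic: the nominal level (0.05 in the Methods) is divided by the number of tests.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def flag_significant(p_values, alpha=0.05):
    """Map each comparison label to True if its P value falls below the
    Bonferroni-corrected threshold for the whole set of comparisons."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {label: p < threshold for label, p in p_values.items()}


if __name__ == "__main__":
    # Hypothetical P values for pairwise comparisons within one haploblock.
    block_p_values = {
        ("POP_A", "POP_B"): 0.62,
        ("POP_A", "POP_C"): 0.0004,
        ("POP_B", "POP_C"): 0.0110,
        ("POP_A", "POP_D"): 0.0300,
        ("POP_B", "POP_D"): 0.0001,
    }
    print("threshold:", bonferroni_threshold(0.05, len(block_p_values)))
    for pair, is_sig in flag_significant(block_p_values).items():
        print(pair, "significant" if is_sig else "not significant")
```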
The GSTM gene cluster is located on chromosome 1 in the p13.3 region and spans 85 kbp. The 14 marker loci found within the cluster (Figure 3) are listed in Table 4. As in the GSTA cluster, similarities in the frequencies of the GSTM alleles were observed between the Russian populations and the CEU population. Different frequencies were observed for the samples from Asia (CHB and JPT) and Africa (YRI). The two-dimensional plot of F_ST-based distances was similar to the plot obtained for the GSTA cluster (data for the GSTM cluster are not shown). Table 5 shows the haplotype frequencies in the haploblocks of the GSTM cluster. As in the GSTA cluster, the numbers of haplotypes observed in the blocks differed. When we considered the P values for the pairwise comparisons (Table 6), there were no marked differences in the haplotype frequencies between the Russian populations from Tver, Murom, and Kursk, and the CEU population. However, statistically significant differences were observed in most comparisons of the Russian populations with the CHB, JPT, and YRI populations. The only exceptions were in block #1, where the P values for the pairwise comparisons of the haplotype frequencies of the Russian and Nigerian populations were much higher than the specified significance level.
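The text does not restate which test produced the P values for the pairwise haplotype comparisons beyond the citation given with the tables. Purely as an illustration of how haplotype frequencies in two or more populations can be compared, the Python sketch below computes a chi-square statistic of homogeneity on a table of haplotype counts; the choice of this test, the function name, and the counts are all assumptions. Converting the statistic to a P value needs the chi-square distribution, which is left out to keep the example free of external libraries.

```python
def chi_square_homogeneity(table):
    """Chi-square statistic and degrees of freedom for a contingency table.

    table: list of rows, one row per population, each row holding the
    counts of every haplotype in a block (same haplotype order per row).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:   # skip haplotypes absent from all populations
                stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


if __name__ == "__main__":
    # Hypothetical haplotype counts (three haplotypes) in two populations,
    # each with 192 chromosomes.
    counts = [
        [120, 50, 22],
        [60, 100, 32],
    ]
    stat, df = chi_square_homogeneity(counts)
    print("chi-square:", round(stat, 2), "df:", df)
```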

Discussion
Extensive genome-wide analyses of many human populations, using microarrays containing hundreds of thousands of SNPs, have provided us with considerable information about global genomic diversity [17]. These data can also be used to analyze the variability in local genomic regions, marking the evolutionary trajectories for both the main human groups and local populations.
In this study, we analyzed the variability of two genomic regions containing the genes of the GSTA and GSTM subfamilies. Our work was based on genotype data obtained from a whole-genome analysis of SNP genotypes performed with Illumina microarrays in three Russian populations. We compared these data with corresponding data from several HapMap populations.
Although the genes of the GSTA and GSTM subfamilies are located on different chromosomes, our analysis of the polymorphisms in these two gene clusters showed similarities between them in terms of their patterns of allelic and haplotype frequencies across the populations examined. The haplotype spectra of the three Russian populations studied (from Tver, Murom, and Kursk), which share a common ethnic origin, were similar. Likewise, no marked differences were found between the three Russian populations and the CEU population, which clearly reflects their common European ancestry. In this context, it was interesting to find some similarity between the Russian samples and the Yoruba population from Nigeria in the haplotype frequencies of some blocks (mainly block #5 of the GSTA cluster and block #1 of the GSTM cluster). Because the European populations differed significantly from the populations of China and Japan in the haplotype spectra of all blocks in both clusters, we propose that these similarities may be attributable to particular features of these haploblocks in the microevolutionary history of the populations. At the same time, the Russian and Nigerian populations differed significantly in the remaining haploblocks of both gene clusters.
Another interesting finding that warrants particular attention is SNP rs4986947 from block #6 of the GSTA cluster, located in the intron of the GSTA4 gene. In apes, this SNP site carries an A nucleotide [18]. In the populations analyzed from Asia and Africa, another nucleotide (G) occurred at this SNP site with a frequency of 100% (the same is also true for two other African HapMap samples, Luhya and Maasai) [19]. By contrast, in the European populations tested, including all populations from Russia, both alleles (G/A) are represented at this locus; i.e., the ancestral A allele is also present in these populations. Two possible explanations for this observation can be proposed. The first assumes substantial ancient gene flow (migrations) from Africa to the proto-West Eurasian (European) population after its divergence from the proto-East Eurasians [20]. These migrations could have included individuals with the ancestral A allele at rs4986947, which is virtually absent from the reference African populations. The second explanation is that the mutation could have been reversed in part of the European population, thus returning to its ancestral state. The persistence of the A allele in Europeans may be attributable to natural selection, which can shape the interethnic variation in the GST genes, as has been demonstrated by Polimanti et al. (2011) [5]. In addition to the Russian and CEU samples tested, the A allele at rs4986947 is also found at frequencies of around 6% in geographically distant European samples from Great Britain, Finland, and Italy [19]. These quite low frequencies may be the result of balancing selection.

Conclusion
In summary, we have reported the results of a study of SNPs in two genomic regions carrying the genes of the GSTA and GSTM subfamilies. By using a haplotype-based approach, we have demonstrated a similarity in the patterns of allelic diversity between the GSTA and GSTM gene clusters in all populations studied. This leads us to propose that the evolutionary histories of these clusters share many features and mark the same events in the evolutionary trajectories of the main human groups.