Genetic diversities of cytochrome B in Xinjiang Uyghur unveiled its origin and migration history

Background Uyghurs are one of the many populations of Central Eurasia that is considered to be genetically related to Eastern and Western Eurasian populations. However, there are some different opinions on the relative importance of the degree of Eastern and Western Eurasian genetic influence. In addition, the genetic diversity of the Uyghur in different geographic locations has not been clearly studied. Results In this study, we are the first to report on the DNA polymorphism of cytochrome B in the Uyghur population located in Xinjiang in northwest China. We observed a total of 102 mutant sites in the 240 samples that were studied. The average number of mutated nucleotides in the samples was 5.126. A total of 93 different haplotypes were observed. The gene diversity and discrimination power were 0.9480 and 0.9440, respectively. There were founder and bottleneck haplotypes observed in Xinjiang Uyghurs. Xinjiang Uyghurs are more genetically related to Chinese population in genetics than to Caucasians. Moreover, there was genetic diversity between Uyghurs from the southern and northern regions. There was significance in genetic distance between the southern Xinjiang Uyghurs and Chinese population, but not between the northern Xinjiang Uyghurs and Chinese. The European vs. East Asian contribution to the ten regional Uyghur groups varies among the groups and the European contribution to the Uyghur increases from north to south geographically. Conclusion This study is the first report on DNA polymorphisms of cytochrome B in the Uyghur population. The study also further confirms that there are significant genetic differences among the Uyghurs in different geographical locations.


Background
Uyghur is one of the populations living in central Asian where people have undergone unceasing migration and interacted with other populations in prehistoric and historic times. As a consequence, Uyghurs share anthropometric and genetic traits with Eastern and Western Eurasian. They are genetically considered to be an admixture of Eastern and Western Eurasian populations, demonstrated by archaeological, anthropologic and genetic studies [1][2][3][4][5][6]. The Uyghur population has been used as a good human population sample for studying the human migration and gene inflow/drafting [3,7]. However, the origin of the Uyghur is under discussion. Modern Uyghurs live primarily in the Xinjiang Uyghur Autonomous Region in China. The majority of the Xinjiang Uyghur live in the south of Xinjiang (south of Tian Shan), with minor populations spreading to the north. Due to environmental, war and political reasons, Uyghurs have undergone unceasing migration (from north to southwest) and interacted with other populations in prehistoric and historic times. This might result in genetically differential characteristics of modern Uyghurs living in different regions. However, most genetic studies of the Uyghur have focused on genetic difference between them and other ethnic groups, and utilized relatively small numbers of sample from very limited geographic locations. The differences in genetic characters between different regional Uyghur populations remain largely unclear. Furthermore, previous studies mainly investigated and compared the genetic diversity of mtDNA control region sequences, while DNA polymorphism of mitochondrial Cytochrome B (CYTB) in the Uyghur population in Xinjiang has not been reported.
CYTB is located in the coding region of mtDNA (positions 14576-16047), spanning the 1140 bp fragment. CYTB has undergone changes during evolution resulting in the occurrence of multiple single nucleotide polymorphic sites [8,9]. Compared with the control region of mtDNA, CYTB is relatively conservative and has fewer mutations [8,10]. CYTB is widely used in taxonomic research to determine phylogenetic relationships between organisms due to its sequence variability [11][12][13]. It is also considered to be most useful in determining relationships within families and genera. CYTB sequence information drawn from populations reflects their matrilineal ethnohistory and can deduce DNA sample's population based on populationspecific nucleotide mutations existing in the CYTB gene [9,14]. Therefore, we hypothesized that the genetic diversity of Uyghur living in different regions of Xinjiang can be identified by the analysis of CYTB sequence.
In this study, we for the first time report on the gene diversity of CYTB in the Uyghur of Xinjiang by completely sequencing their CYTB gene. We demonstrate that there was a genetic variance of CYTB occurring in the different regional Uyghurs and CYTB is a very useful genetic marker for the study of genetic differentiation of Uyghurs in Xinjiang. In particular, we showed the founder/ bottleneck event in the Xinjiang Uyghur.

Results
High gene diversity of cytochrome B in the Uyghur population 1190 bp PCR fragments were amplified in all studied samples and 1140 bp of DNA fragment encoding CYTB (positions 14747-15886) were sequenced. Compared with the standard reference sequence (rCRS), a total of 102 mutant sites were observed in the studied 240 samples (Additional file 1: Table S1). The number of mutation sites in the samples varied from 0 to 10 with an average of 5.126. There were no deletions and/or insertions observed. Table 1 lists the most common and group-unique mutation positions. Position 15326 A -G transition occurred in almost all samples except 2 samples (one from Sanji and another one from Bortala). The second highest mutation position was nt 14766 C-T with an average frequency of 82.92% observed in 240 studied Uyghur individuals. More than 90% of Uyghur from Korla, Kumul, Sanji and Turpan had a 14766 C-T mutation, while this mutation occurred less than 80% in the individuals from Aksu (77.14%), Gulja (73. 68%), Hotan (78.95%), and Kashgar (79.10%). The third dominant mutation positions were 14783 T-C (36.25%), 15043 G-A (36.25%) and 15301 G-A (35.42%). Position 14783 T/C, 15043G/A, and 15301 G/A mutations simultaneously took place with an average frequency of 34.17% in the overall Uyghurs. The frequency of the triple 14783 T/C-15043G/ A-15301G/A mutant varied from population to population, from region to region ( Figure 1). In general, the frequency of the triple 14783 T/C-15043G/A-15301G/A mutant among the ten regional subpopulations decreased in the order of Kumul, Sanji, Turpan, Bortala, Atush, Kashgar, Aksu and Hotan, in line with their geographic location from north to south. The lowest frequency of this triple mutant (15.79%) was observed in the group from Hotan which is located in the south end of Xinjiang, adjacent to Tibet, India and Pakistan. The north end regions, geographically closer to Mongolia, had similar frequencies of the triple 14783 T/C-15043G/ A-15301G/A mutant to the Mongolian people(our unpublished data). Other populations' data are cited from [14].
Studying mutations among the regions, we observed the presence of regional group specific mutations in the CYTB sequence ( Table 1). The mutations 14927 A/G and 15440 T/C were only observed in the Bortala with a frequency of 15.38%, 15746 A/G only in Kumul (30.00%), and 14857 T/C only in Turpan (11.11%). Only three samples of the total 240 Uyghurs showed a nucleotide change at position 14798 which occurs more frequently in Caucasians with a frequency of 0.2083, but not in other Asian groups [14].
The genetic diversities in the regional population were calculated ( Table 2). The gene diversities varied among groups from 0.8800 in Korla to 0.97449 in Bortala, while nucleotide diversities varied from 0.00269 (in Sanji) to 0.007863 (in Korla), respectively. The discrimination power (DP) was higher than 0.8200, even up to 0.9454 (in Kashgar). The overall genetic diversity and power of discrimination were 0.9480 and 0.9440, respectively, suggesting that MYTC B is one of the highly polymorphic markers useful for maternal lineage identification.

Founder / bottleneck haplotypes of CYTB in the Uyghur
There were 93 different haplotypes observed in the 240 Uyghur samples (Additional file 2: Table S2). Due to absence of a standard for assignment of haplotypes, we called each different sequence as an individual haplotype generated by DnaSP5.0. The frequencies of different haplotypes varied between different regions (Additional file 2: Figure 1 The frequency of triple 14783 T/C-15043G/A-15301G/A mutant. DNA was isolated from 240 samples collected from ten different locations of Xinjiang and sequenced. DNA sequences were compared with the standard reference sequence using MEGA5.0, as described in the methods. The frequency of nucleotide mutation was calculated.

Different regional Uyghur represent different genetic features
In order to investigate the genetic diversity of CYTB in ten regional Xinjiang Uyghurs, MDS analysis was first conducted within ten regional Xinjiang Uyghurs and groups were made based on their geography. As shown in Figure 3A, ten regions were separated from each other by dimension 1 and some subpopulations were gathered closer than others. Kumul, Turpan, Sanji, Atush and Korla were located closely at the left side in the MDS plot forming cluster 1, while Kashgar, Aksu, Hotan and Gulja sit on the right side and formed cluster 2. Bortala was almost in the middle in the MDS plot. However, in dimension 2, Bortala was closely distributed to Kumul (a northern Xinjiang place), but still be separated from cluster 2 consisting of most of southern Xinjiang regions.
Tianshan and the ancient Silk road are a geographic dividing line separating Xinjiang into two parts: Southern Xinjiang (Nanjiang) and Northern Xinjiang (Beijiang). Nanjiang consists of Kashgar, Aksu, Hotan, Atush and Korla, while Kumul, Turpan, Sanji, Gulja and Bortala belong to Beijiang. Accordingly we divided ten regional Xinjiang Uyghurs into two subgroups: Southern Uyghur (groups living in Nanjiang) and Northern Uyghur groups (living in Beijiang). We then performed MDS analysis with the Southern and Northern Uyghur and other populations [14][15][16][17][18] in order to reveal the genetic relationships among these populations. Figure 3B shows that both Northern and Southern Uyghurs and other 4 Asian populations (Thai, Vietnamese, Filipino and Chinese) were distributed closely to the left in MDS plot and formed a cluster, while Caucasians were isolated far away from them and the cluster, suggesting that both Southern and Northern Uyghur are closer to Asian in genetics than to Caucasian. In addition, the Northern Uyghur was separated from the Southern Uyghur. In comparison with the Southern Uyghur, the Northern Uyghur were more closely located to the Chinese.
Furthermore, we noticed that Gulja Uyghurs who geographically reside in northern Xinjiang appeared closely related to the southern subgroups (Hotan, Kashgar and Aksu) in the MDS plot, while geographic southern Uyghur subgroups Korla and Atush were close to the northern subgroups (Kumul, Sanji and Turpan) as shown in Figure 3A. Therefore we regrouped Uyghur samples by subtracting Gulja from northern group, Korla and Atush from the southern group, and conducted the third MDS analysis using new designed groups ( Figure 3C). This regrouped MDS plot displayed that the northern group (Turpan, Kumul, Sanji and Bortala) was separated further away from the southern group (Hotan, Aksu and Kashgar). Korla was distributed between Chinese and the new northern group with similarly close distance to both of them. The Gulja and the new southern group formed a cluster, clearly separated from the northern group and Caucasian. In contrast, Atush was distributed relatively more closely to northern Uyghur.

The patterns of haplotypes of CYTB reflected migration history and origin of Uyghur
To further explore the genetic linkage, the genetic distances between each of the regional groups were determined (Table 3). Genetic distances between regions shown in Table 3 nicely reflected their geographic distance. There were significant differences in genetic distance between south region Hotan and north regions Kumul and Turpan. Interestingly, there were significant differences between the south region Korla and Hotan and between Korla and Kashgar. An unrooted NJ tree was constructed based on the ten regional groups' F ST distances as seen in Table 3 ( Figure 4A). The tree showed that there were three branches: Hotan and Gulja formed one branch; Turpan, Sanji, Kumul and Korla formed the second branch, Aksu, Kashgar, Bortala and Atush formed the third Branch. The tree also showed that there were genetic diversities within branches with the exception of the first cluster.
With the same strategies in MDS analysis, we regrouped samples and calculated genetic distance (Tables 4 and 5) and constructed phylogenetic trees ( Figure 4B and 4C) using new assigned groups. Table 5 showed the results of genetic distance generated by regrouping the Uyghur into two major groups--Southern and Northern Uyghur. Figure 4B and 4C are NJ trees constructed with the genetic distances from Tables 4 and 5. We found that both the southern and northern groups showed shorter genetic distance from Chinese than from Caucasian. There were significance in genetic distance between the southern Uyghur and Caucasian, between the southern Uyghur and Chinese, and between the northern Uyghur and Caucasian, while there was no significance between the northern Uyghur and Chinese. The NJ tree ( Figure 4B) intuitively showed the genetic relationship between these populations, supporting Table 4 results. Table 5 further confirmed the genetic relationship between the southern, northern, Chinese and Caucasian. More importantly, Table 5 and Figure 4C demonstrated that geographic southern cities Atush and Korla showed gene similarity to the northern Uyghur, although northern city Gulja is genetically similar to the Southern Uyghur, but not to the northern Uyghur. The Korla Uyghur was completely different to the southern Uyghur (Hotan, Aksu and Kashgar).

Discussion
In this study we first reported the genetic polymorphisms of cytochrome B in Uyghur population of Xinjiang. We demonstrated that cytochrome B is a very useful and informative marker for matrilineal identification and population studies, second to the mtDNA control region. There are population specific and regional population specific nucleotide positions in the cytochrome B sequence, which can help identify DNA sample's region and population   Table 4 (B) and Table 5 (C) an unrooted Neighbor-joint trees were constructed using Phylips 3.6.9 software. Other populations were cited from [14]. information. The results of the present study conclude that: 1) there are founder/ bottle neck events in Xinjiang Uyghur; 2) the Uyghur is closer to Chinese, rather than to Caucasian in genetic distance; 3) There are genetic difference between the Southern and Northern Uyghur; 4) Gene influence from Asian (Chinese) is stronger on the Northern Uyghur, than the Southern Uyghur; and 5) CYTB is a good genetic marker for differentiation of subgroups. Xinjiang is located in Central Asia which is an intermediate region of the Eurasian continent. There were 36 nations existing in Xinjiang during prehistorical and historical times. Some nations were defeated by others and destroyed, overrun or forced to migrate to other places. Unceasing wars propelled endless migration and coalition, leading to genetic mixture and gene drafting. Additionally, there were two silk roads extending from east to west through Xinjiang, resulting in extensive exchange of culture and marriage between different populations. Our data confirm that Uyghur is an admixture population with contributions from both Eastern and Western Eurasian ancestries, which is consistent with the results from mtDNA control region sequences [3][4][5][6]. Based on the mutation at position 14798, the CYTB sequences of the Uyghur in Xinjiang contain more inflow of East Asian than European. The proportion of European sequences varies from different geographic regions of the Uyghur (data not shown).
The Uyghurs are a Turkic ethnic group living in Eastern and Central Asia. The ancestors of the Uyghur tribe were Turkic pastoralists called Tiele in Northern China, Mongolia, and the Altay Mountains. Due to wars and environmental stress, Uyghur continued migrating from north to south. From the historic perspective, Uyghur originated from Mongolians, rather than Caucasians and inherited Mongolia genetics. The Uyghur population was gradually diluted as they migrated from north to south. Our data on the frequency of triple 14783 T/C-15043G/A-15301G/A mutant and founder/bottle neck haplotypes demonstrated that the Uyghur originated from Mongolia, migrated from north to south, supporting historic reports.
In this study, we found that nucleotide positions with most frequent mutation were 14766, 14783, 15043, and 15302, which were most commonly seen in Asians including the Chinese Han population [14]. Mutation at position 14766 occurs in more than 93% of Asian samples, but less than 0.72% of Caucasians [8,14]. The frequencies of mutation at the position 14766 in the ten different geographic  and 15302 showed the same trend among the ten Uyghur groups as position 14766. Mutations at the above three positions were seldom observed in the Caucasian population. In contrast, mutation at position 14798 that is Caucasian specific was observed in only one Aksu sample and two Hotan samples. Taken together, we found that at the cytochrome B gene, Uyghurs generally have more imprint of East Asian genetic portion than that of Caucasian. These genetic results are in agreement with the history of Uyghur formation. The Uyghurs living in the Nanjiang are different to Beijiang, and the southern Uyghur have relatively less influence from East Asians than the northern. Our data showed that in the same ethnic group, people residing at different places have genetic differences.
In this study, we reported that Uyghur from different geographic locations in Xinjiang have differences in the percentage of European/East Asian ancestry component, distinctly by the difference between Aksu/Hotan and Kumul or Korla. There is a significant difference between the south (Hotan, Kashgar, and Aksu ) and the north of Xinjiang (Kumul, Bortala, Turpan and Sanji). East Asian ancestry dominates the Uyghur from the north, while European ancestry imprints more on the southern Uyghur. In addition, we observed that Uyghurs living in two geographically south cities Atush and Korla shared northern Uyghur genetics (Kumul, Turpan and Sanji), in contrast with the southern Uyghur, while Uyghurs from Gulja which is located in the north of Tianshan in geography presented the southern regional Uyghur genetic features. Demographic records show that the dominant ethnic group is Kyrgyz who defeated the Uyghurs in AD 840, despite the fact that Atush is located in the south of Tianshan [Wikipedia, the free encyclopedia, History of the Kyrgyzstan]. Moreover, there were intermarriages in the Uyghur living in the Atush region. This might explain why the gene pattern of cytochrome B in the Atush Uyghurs was between that of the southern and northern Uyghur group, different from the other three southern group samples. As for Gulja, there were two large migrations of the southern Uyghur from south to north Gulja in history. Over time, the migrated southern Uyghurs expanded in Gulja and became the largest minority in Gulja [Wikipedia, the free encyclopedia, History of the Uyghur people.]. As a result, the Gulja Uyghur was and still is a branch of southern Uyghur, having little relationship with the northern Uyghur, which is in line with our data that the Uyghur from Gulja shares similar genetic characters with the Southern Uyghur. Although Korla is located in the south of Tianshan belonging to Nanjiang ( southern of Xinjiang), it is also adjacent to Turpan and Sanji. Korla is located in Bayingolin Mongol Autonomous Prefecture where the current major population is Chinese Han. As early as 94 CE, the Chinese government and military started to administrate this area and forced different populations to exchange and mix. The results from CYTB genotyping are consistent with that from the control region of mtDNA (unpublished data). Our data showed that the Uyghurs from Korla in genetics are different to the south Xinjiang Uyghurs.

Conclusions
Cytochrome B is a very useful DNA marker with high discriminate power for matrilineal identification, as well as deduction of the region and population of people. The polymorphisms of CYTB were significantly different between different geographical Uyghur (between south and north). The influences of East Asian and European genetics in Uyghur varies between different geographic locations (particularly south and north) of Uyghur.

Sampling
A total 240 healthy unrelated samples were collected from ten different Uyghur concentrated locations/communities    (10), Sanji (10), and Turpan (18). We collected more samples from Kashgar due to more than 36% of Xinjiang Uyghurs live in that area. For the samples it was confirmed that there were no intermarriages between Uyghur and other ethnic groups and no migration history in the latest three generations. The study was approved by the Ethnic Study Committee of Xinjiang University. All blood samples were obtained with informed consent. DNA was extracted from blood using an EasyPure blood genomic DNA extraction kit (TransGen Biotech, Beijing, China).

PCR and sequencing
The primers used for PCR amplification of the CYTB gene listed in Table 6 were adopted from previous studies [14] and synthesized by Shenggong (Shenggong Biotech, Shanghai, China PCR products were visualized on a 1.5% agarose gel and purified using a Biomiga kit (Biomiga, Beijing, China). Sequencing reactions were conducted in a PE-9600 thermocycler (ABI Applied Biosystems, Fostor City, CA, USA), using a BigDye terminator v3.1 Cycle sequencing kit (Applied Biosystems) with the following conditions: 25 cycles of 95°C for 30 s; 50°C for 30 s; and 60°C for 4 min. The primers L14724, H15149, L15283 and H15363 were used for sequencing listed in Table 6. The DNA sequences were detected with an ABI3730 DNA sequencer (Applied Biosystems) in HuaDa Genome Centre, Beijing.