Genetic diversity and population structure of the Tibetan poplar (Populus szechuanica var. tibetica) along an altitude gradient

Background: The Tibetan poplar (Populus szechuanica var. tibetica Schneid), which is distributed at altitudes of 2,000-4,500 m above sea level, is an ecologically important species of the Qinghai-Tibet Plateau and adjacent areas. However, the genetic adaptations responsible for its ability to cope with the harsh environment remain unknown. Results: In this study, a total of 24 expressed sequence tag microsatellite (EST-SSR) markers were used to evaluate the genetic diversity and population structure of Tibetan poplars along an altitude gradient. A total of 172 individuals from low-, medium- and high-altitude populations were genotyped, and 126 alleles were identified. The expected heterozygosity (HE) ranged from 0.475 to 0.488, with the highest value found in low-altitude populations and the lowest in high-altitude populations. Genetic variation was low among populations, indicating a limited influence of altitude on microsatellite variation. Low genetic differentiation and high levels of gene flow were detected both between and within the populations along the altitude gradient. An analysis of molecular variance (AMOVA) showed that 6.38% of the total molecular variance was attributed to diversity between populations, while 93.62% was associated with differences within populations. There was no clear correlation between genetic variation and altitude, and a Mantel test between genetic distance and altitude resulted in a coefficient of association of r = 0.001, indicating virtually no correlation. Conclusion: The microsatellite genotyping results, which show genetic diversity but low differentiation, suggest that extensive gene flow may have counteracted local adaptation imposed by differences in altitude. The genetic analyses carried out in this study provide new insight for conservation and for the optimization of future arboriculture.


Introduction
Altitude gradients are among the most useful natural settings in which to investigate ecological and evolutionary responses of biota to geophysical influences [1]. For species whose habitats span different altitudes, differences in spatial population structure may result from restricted gene movement caused by non-random mating or geographic barriers [2,3]. Outlying populations at the boundaries of a species' distribution may be subject to limited gene flow, small population size and founder effects, all of which decrease genetic diversity and increase population differentiation [4]. For species living in mountainous areas, altitude changes represent a series of physical factors that can result in the establishment of different populations and species. These factors, which include rainfall [8] and temperature [9], form barriers that influence genetic diversity and population structure [5][6][7]. There is no general rule describing the relationship between genetic diversity and altitude; for trees on mountainsides, four patterns of genetic diversity along the altitude gradient have been reported. (1) Populations at intermediate altitudes have greater diversity than populations at lower and higher altitudes, owing to local adaptation and milder environmental conditions [10,11]. (2) Populations at higher altitudes have greater diversity than those at lower altitudes if the higher-altitude conditions are similar to those of their home sites, which confers higher fitness [12]. (3) Populations at lower altitudes have greater diversity than those at higher altitudes, because higher altitudes impede growth and range expansion, and the resulting bottlenecks reduce genetic diversity [13]. (4) Populations show no differences in diversity across altitudes [14]. This pattern may arise because the sampling area lies within the main distribution area, or because too few populations were sampled along the gradient to detect altitude-related trends. Alternatively, if the sampled populations are large enough, extensive gene flow and other factors could produce a similar pattern.
The Qinghai-Tibetan Plateau (QTP) is the highest and largest plateau in the world, with a mean altitude of 4,000 m above sea level and an area of 2.5 × 10⁶ km². In recent years, the QTP has become a hotspot for plant phylogeographical studies [15,16], which have focused mainly on the population dynamics that took place during the Quaternary (reviewed in Qiu et al. [17]). However, genetic variation patterns along altitudinal gradients of the QTP remain unclear.
The Tibetan poplar belongs to Populus sect. Tacamahaca in the genus Populus and is an ecologically important species, mainly distributed in Sichuan and Tibet at altitudes from 2,000 to 4,500 m [18]. Recent studies have focused mainly on the phylogenetic and physiological mechanisms responsible for its resistance to the harsh environment, where the lowest temperature is -30°C and the annual average temperature is between 4°C and 12°C [19]. However, there is a pressing need to understand its genetic diversity along altitude gradients. In this paper we investigated the genetic variation of the Tibetan poplar along an altitude gradient using microsatellite genotyping. The specific objectives were: (1) to understand the genetic variation and differentiation within and between populations, and (2) to detect any influence of altitude gradients on genetic diversity.
To this end, a total of 24 EST-SSR loci developed from the Populus euphratica transcriptome [20] were used to analyze the genetic diversity and population structure of Tibetan poplar populations at different altitudes in the Sejila mountain area, and to test for a relationship between genetic variation and differences in altitude.

Sampling strategy and DNA extraction
We collected leaves from 64, 34 and 74 individuals from high-, medium- and low-altitude populations, respectively (Figure 1, Table 1). Our sampling scheme divided the distribution area of the Tibetan poplar in the Sejila mountains (in southeastern Tibet) into three altitude-gradient groups (high, medium, and low), even though the trees are distributed continuously throughout the area. We selected individuals at least 30 m apart to avoid sampling clones. The leaves were rapidly dehydrated using silica gel beads. Total genomic DNA was extracted from approximately 0.5 g of silica-dried leaf tissue using a modified version of the cetyltrimethyl ammonium bromide method [21]. The quality and concentration of the extracted DNA were determined by 1% agarose gel electrophoresis and ultraviolet spectrophotometry. The DNA samples were diluted to 5-10 ng/μL for use as the template for polymerase chain reaction (PCR) amplification.

Primer selection
A total of 113 EST-SSR primer pairs based on the Populus euphratica transcriptome [20] were developed and tested for suitability in the Tibetan poplar. DNA extracted from four Tibetan poplar individuals was amplified, and the amplicons were sequenced to confirm the presence of repeat motifs and to count the repeats. DNA from eight individuals was then used to test the successfully amplifying primer pairs for polymorphism. SSRs were selected if they had at least three alleles and exhibited robust amplification.

SSR amplification
After screening, 24 primer pairs were selected for the PCR analysis. The forward primer of each pair was tagged with a section of the universal M13 sequence (5′-TGTAAAACGACGGCCAGT-3′) during synthesis. Each 10-μL PCR mixture contained 1× Taq buffer, 0.2 µM dNTPs, 10-20 ng template DNA, 1.6 pmol reverse primer, 1.6 pmol fluorescently labeled M13 primer, 0.4 pmol forward primer and 1 U Taq polymerase (BioMed). PCR amplification was performed using a Biometra thermocycler (Biometra, Goettingen, Germany) under the following conditions: 94°C for 5 min; 30 cycles of 94°C for 30 s, annealing at 56°C for 45 s and elongation at 72°C for 45 s; 8 cycles of 94°C for 30 s, annealing at 53°C for 45 s and elongation at 72°C for 45 s; and a final extension at 72°C for 10 min. After amplification was confirmed on a 1.5% agarose gel, approximately 0.5 μL of the PCR products obtained using each of the four fluorescently labeled primers was combined. The pooled products were separated by capillary electrophoresis using an ABI 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA, USA) with GeneScan-500 LIZ as an internal size standard (Applied Biosystems). The amplicon fragments were sized using GeneMarker version 1.75 (Soft Genetics LLC, State College, PA, USA).

Data analysis
The FLEXIBIN software was used for automated binning of the raw molecular data [22], and the Excel Microsatellite Toolkit [23] was used to convert the size data into a format suitable for further analysis. Genetic parameters were estimated under the hypothesis that all loci were neutral, so that they reflect the natural genetic structure shaped by neutral forces such as genetic drift and gene flow. Several methods are available for investigating whether a particular locus has been under selection pressure. We performed the FST outlier test using LOSITAN [24] to identify candidate SSR loci possibly under selection pressure [25]. After removal of outlier loci, the remaining data were used to estimate the genetic diversity of the population. The genetic diversity parameters included: number of alleles (Na); observed heterozygosity (HO); expected heterozygosity (HE) within a subpopulation; Wright's fixation indices within subpopulations (FIS) and in the total population (FIT); and pairwise differentiation among subpopulations (FST), according to Weir & Cockerham [26]. FIS measures the deviation of genotype frequencies from Hardy-Weinberg equilibrium (HWE) within subpopulations, whereas FIT measures the deviation from HWE in the total population. The values of FIT and FIS can be negative, whereas FST is non-negative by definition. Shannon's diversity index was calculated along with Nei's expected heterozygosity [27]. Gene flow (Nm) was calculated to assess gene exchange among populations, and was estimated as Nm = (1 - FST)/(4 FST) [28]. Summary statistics were calculated using POPGENE version 1.32 [29]. Inter- and intra-population differentiation was determined by AMOVA using GenAlEx version 6.41 [30].
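The two closed-form quantities named above, expected heterozygosity and the FST-based estimate of gene flow, can be illustrated with a minimal Python sketch. This is not the POPGENE implementation; the function names, the allele frequencies and the FST value of 0.05 are hypothetical and chosen only for illustration. The heterozygosity function uses the uncorrected form 1 - Σp², without any small-sample correction.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity at one locus: HE = 1 - sum(p_i^2).

    allele_freqs: allele frequencies at the locus (must sum to 1).
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def gene_flow_nm(fst):
    """Gene flow estimated from FST: Nm = (1 - FST) / (4 * FST)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 - fst) / (4.0 * fst)


if __name__ == "__main__":
    # Hypothetical locus with three alleles (illustrative frequencies).
    print(expected_heterozygosity([0.6, 0.3, 0.1]))
    # Hypothetical FST value; lower FST gives a larger Nm estimate.
    print(gene_flow_nm(0.05))
```

Because Nm is inversely related to FST under this formula, a low FST necessarily translates into a high estimate of gene flow; the estimate is only as reliable as the assumptions behind the formula.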
The genetic structure of the Tibetan poplar populations was evaluated using the Bayesian model-based clustering method implemented in STRUCTURE version 2.3.3 [31,32]. The model assumes that all individuals derive from K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Using a Markov chain Monte Carlo (MCMC) algorithm, the method assigns individuals to populations on the basis of their genotypes while simultaneously estimating population allele frequencies. The optimum number of subpopulations (K) was determined from 20 independent runs for each value of K between 1 and 10. The length of the burn-in period and the number of MCMC replications after burn-in were set to 25,000 and 100,000, respectively. Each of the K subpopulations identified is a cluster characterized by a set of allele frequencies at each locus; individuals were assigned to one subpopulation, or to two or more subpopulations if their genotypes indicated admixture [33]. In this study, K was identified using the method developed by Evanno et al. [34]. Rather than selecting K directly from the estimated log probability of the data, we used ΔK, which is based on the rate of change in the log probability of data between successive values of K. In the scenarios tested by Evanno et al. [34], this statistic accurately detected the uppermost hierarchical level of structure. A Mantel test, performed with GenAlEx version 6.41 [30], was used to calculate the coefficient of association between genetic distance and altitude.
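The ΔK statistic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to produce the results: it computes ΔK(K) = |L(K+1) - 2L(K) + L(K-1)| / sd(L(K)) from the mean and standard deviation of the log probability over replicate runs, the function name `delta_k` is hypothetical, and the toy log-probability values are invented for demonstration.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to a list of log probabilities of the data,
              one value per replicate run at that K.
    Returns a dict mapping K to delta K. The smallest and largest K are
    absent from the result because the second difference is undefined there.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if (k - 1) not in means or (k + 1) not in means:
            continue  # no second difference at the ends of the K range
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # delta K is undefined when replicate runs are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


if __name__ == "__main__":
    # Invented toy values: two replicate runs for each K from 1 to 4.
    toy = {1: [-100.0, -102.0], 2: [-50.0, -52.0],
           3: [-48.0, -50.0], 4: [-47.0, -49.0]}
    scores = delta_k(toy)
    best_k = max(scores, key=scores.get)
    print(scores, best_k)
```

The sketch makes one property of the method explicit: when runs start at K = 1, no ΔK value exists for K = 1, so the statistic alone cannot indicate the absence of population structure.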

SSR genotyping
SSRs are generally used in genetic diversity studies as evolutionarily neutral markers. In this study, 24 SSR primer pairs were developed from the Populus euphratica transcriptome and were transferable to the Tibetan poplar. The sequences were deposited in GenBank (Table 2). In total, 114 alleles were amplified at the 24 loci (mean = 4.75, SD = 2.71); locus 7, with 12 alleles, exhibited the most variation. Locus 18 was identified by LOSITAN [24] as an FST outlier (FST = 0.16, p = 0.01), suggesting that it may be under positive selection (Figure 2). A subsequent NCBI BLAST search [35] with the locus sequence showed high homology with a protein (ID: XM_002311699.1) present in Populus trichocarpa, supporting the possibility that this SSR locus is under selection pressure.

Genetic diversity
Genetic diversity among the three populations of Tibetan poplar was assessed from the allelic variation observed at the 23 neutral microsatellite loci. A mean of 3.71 alleles per locus was found across all 23 loci in the 172 Tibetan poplar individuals. HO and HE are important parameters for assessing the genetic diversity of populations; they ranged from 0.40 to 0.42 and from 0.48 to 0.49, respectively. These values were consistent with Na across the three populations, indicating that the differences in genetic variation were not significant and that the populations were similar in all parameters. FIS presented a

Genetic structure
The AMOVA partitioned the genetic variance into components among populations and among individuals within populations. Of the total genetic variance, 6.67% was ascribed to divergence among populations; the remainder was ascribed to differences between individuals. Although small, the among-population component was statistically significant (p < 0.001). In the populations sampled from high, medium, and low altitudes, all genetic diversity parameters were similar, indicating no local adaptation or population differentiation in the study area.
The SSR data were sorted in order of altitude. Population structure analysis was performed according to the known order of individuals, yielding an optimal K of 2 [34]. The estimated populations of the 172 individuals are shown in Figure 3. The sample plot showed that low- and high-altitude individuals were each assigned to a single group, whereas the medium-altitude group was an admixture of the high- and low-altitude groups, and there was no clear separation between the groups (Figure 4). The FST value showed little differentiation between the populations. Because the ΔK method cannot evaluate K = 1 (that is, the absence of population structure), and given the low FST and the similar genetic parameters among the three populations, we did not accept the result of K = 2. To examine the structure within the new clusters, further STRUCTURE analyses were performed separately on cluster 1, which contained individuals from high altitude, and cluster 2, which contained individuals from low altitude. These analyses showed no clear structure despite peaks at K = 3 for cluster 1 and K = 2 for cluster 2 (Figure 5). The ancestry values revealed that every individual had an equal probability of being grouped in cluster 1, 2 or 3 for the high- and low-altitude clusters. A Mantel test (Figure 6) showed that genetic distance was not significantly correlated with altitude (r² = 0.001, p ≤ 0.07), suggesting that altitude was not the principal factor influencing genetic differentiation in the Tibetan poplar.
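The logic of a Mantel test between genetic distance and altitude can be sketched as a simple permutation procedure. The following Python sketch is an illustration only and is not the GenAlEx implementation: the function names, the one-sided permutation p-value, the default of 999 permutations and the toy altitudes are all assumptions made for demonstration.

```python
import random
from itertools import combinations
from statistics import mean


def _pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a distance matrix has no variation")
    return sxy / (sxx * syy) ** 0.5


def altitude_distance_matrix(altitudes):
    """Pairwise absolute altitude differences between individuals."""
    return [[abs(a - b) for b in altitudes] for a in altitudes]


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """One-sided Mantel test between two square distance matrices.

    Returns (r, p): r is the correlation between the off-diagonal entries,
    and p is the fraction of permutations (plus the observed arrangement)
    with a correlation at least as large as the observed one.
    """
    n = len(dist_a)
    pairs = list(combinations(range(n), 2))
    x = [dist_a[i][j] for i, j in pairs]
    r_obs = _pearson(x, [dist_b[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel individuals in the second matrix
        y = [dist_b[order[i]][order[j]] for i, j in pairs]
        if _pearson(x, y) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    # Toy altitudes (m) for five hypothetical individuals.
    alt = altitude_distance_matrix([2000, 2500, 3000, 3500, 4000])
    # Toy "genetic" distances that scale exactly with altitude difference.
    gen = [[0.001 * d for d in row] for row in alt]
    print(mantel_test(gen, alt))
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations, which is why an ordinary correlation test on the pairwise values would be inappropriate.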

SSR markers and neutrality
In this study, we used GeneMarker version 1.75 to size the fluorescently labeled PCR products. We selected 24 SSR primer pairs developed from the Populus euphratica transcriptome to analyze genetic diversity within three populations of Tibetan poplar growing at different altitudes in Linzhi, Tibet. The transferability of the loci between the two species suggests that they may be located in conserved regions. However, the LOSITAN FST outlier test identified one locus as an outlier. Mutations at SSR loci occasionally occur in response to the stress of adapting to a changed environment [36] or to an external stimulus [37]. Furthermore, studies have shown that some SSR loci are non-neutral [38,39]; it is therefore essential to perform a neutrality test before SSR data are used in any further analysis. A BLAST search [35] with the outlier locus sequence indicated high homology with a protein in Populus trichocarpa. We conclude that microsatellites may be linked to expressed genes; neutrality should therefore not be assumed but tested for all markers before analyses of genetic diversity and structure. This type of marker could nevertheless be useful for phylogenetic studies of closely related species [40,41].

Genetic diversity
As expected for a perennial woody species ranging across most areas of the Qinghai-Tibet plateau, the study population contained a high level of genetic diversity, but we did not identify any significant differences among the three populations from different altitudes. The number of alleles per locus in our study was lower than in other related Populus species [42]: a mean of 6.1 alleles per locus has been reported in the existing literature on Populus genetic diversity [42], and the Na of 3.73 in our study is lower than that described previously for P. tremuloides (4.9) [43]. The difference is most likely due to the limited sampling area. We collected samples from only one mountain area, whereas the Tibetan poplar is distributed throughout southwestern China; our samples covered a limited proportion of this range because we aimed to study adaptation and genetic diversity along an altitude gradient. The samples from high, medium, and low altitudes appeared to be similar in genetic diversity and showed no evidence of local adaptation in the study area. STRUCTURE analysis showed that the population could be divided into two groups (clusters), with individuals from the lower altitude clustered into group 1 and those from the higher altitude clustered into group 2. Altitude appeared to have a direct relationship with the distribution of the groups, but the FST value showed little differentiation between the populations. Because the ΔK method could not provide a value for K = 1, we rejected the result showing that the population was divided into two groups. There was no peak in the estimate of the log-likelihood of the cluster number (L(K)), since the lowest likelihood was for K = 1 and L(K) either consistently increased or showed an erratic pattern with increasing variance, with all individuals admixed and the proportion of any individual assigned to each subpopulation remaining roughly similar. The Evanno criterion, ΔK [34], was not relevant because it can only be computed for K ≥ 2 and does not allow comparison with K = 1. For K > 2, the value of ΔK remained close to 0 in this study.
Figure 3 Identification of K. The ΔK method was used to identify sub-clusters in the population. ΔK peaks at K = 2, suggesting that the population may be composed of two sub-clusters.
The population structure and Mantel test results suggest that the relationship between genetic diversity and altitude is not significant, and hence it is possible to hypothesize that the species has not had sufficient time for evolutionary differentiation to occur along an altitude gradient.

Low FST and strong gene flow
FST was low for all loci except SSR 18, and there was no noticeable differentiation among the populations at the three different altitudes. This may be attributable to the local geographic structure and to strong gene flow among individuals. The STRUCTURE results showed that the medium-altitude group was an admixture of the low- and high-altitude groups, suggesting that the mountain harbored two groups (clusters) of poplars that separated at an altitude of 2,700 m. Because the altitude of the study area ranged from 2,000 to 4,000 m, and the Tibetan poplar is distributed from 2,000 to 3,096 m, the tree line represented a limiting factor for tree distribution, but it appeared to have had limited impact on gene exchange between individuals and did not hinder pollen or seed dispersal. In this study, gene flow occurred among the populations. Gene flow is a vital element in local adaptation studies, because it can shape the local genetic structure directly or influence it indirectly. Gene flow among populations can also combine gene pools, reducing genetic variation among groups [44]. By recombining the gene pools of the groups, gene flow therefore acts strongly against speciation in evolutionary processes [45]. Gene flow plays a part in evolution through pollen dispersal, seed dispersal, and the establishment of the individual adult. A geographic barrier increases the probability of extinction or local adaptation of a population, as it may push the population to evolve into a different population with a unique genetic structure, or even into a new species [46,47]. However, gene flow can also constrain evolution by homogenizing populations in a heterogeneous environment and by balancing the distribution and spread of genes [48]. Conversely, gene flow can be considered a creative force in evolution when it spreads superior genes or combinations of genes [49,50].
For local adaptation, gene flow and selection are usually considered the main forces affecting the processes of establishment. This is especially true for highly outcrossing trees and perennial species, in which gene flow is extensive [51]. In summary, the factors contributing to the low level of differentiation among populations at different altitudes include: (1) Pollen dispersal and an overlapping flowering period among all three populations (high, medium and low altitude). The flowering phase of Populus is generally long; for example, flowering in P. × canadensis and P. nigra [52] lasts for 15 and 31 days, respectively. (2) Seed dispersal mechanisms. Most Populus trees grow adjacent to rivers and roads, and some grow in the river channel itself; rivers therefore cannot be ignored as an important factor in seed dispersal. The poplar populations are thus genetically homogeneous, and in future work the germplasm and genetic diversity of the Tibetan poplar could be conserved by random sampling, which should capture the genetic diversity detected to date. An unpublished experiment comparing poplars at two sites showed some differences in growth rate, leaf characteristics and branch numbers among individual clones sampled at different altitudes, indicating that natural selection has conserved some fitness types. Genes linked to adaptation mechanisms could contribute to phenotypic variation without differentiation in genetic structure, as observed in this study. Consequently, this population is well suited for identifying functional genes and mechanisms of adaptation to high altitudes.

Conclusion
To our knowledge, this is the first genetic analysis of the Tibetan poplar. The results indicate that the Tibetan poplar populations living at different altitudes on the Sejila mountain have a low level of differentiation. They have an excellent ability to adapt to different altitudes; however, local adaptation was not observed, probably because of the lack of a geographic barrier. The high levels of gene flow are consistent with the low FST observed. We consider the Sejila mountain population to be appropriate for investigating the mechanisms of adaptation to high altitudes, despite the low level of genetic structure differentiation among populations at different altitudes.