Microsatellites reveal a strong subdivision of genetic structure in Chinese populations of the mite Tetranychus urticae Koch (Acari: Tetranychidae)

Background Two colour forms of the two-spotted spider mite (Tetranychus urticae Koch) coexist in China: a red (carmine) form, which is considered to be native and a green form which is considered to be invasive. The population genetic diversity and population genetic structure of this organism were unclear in China, and there is a controversy over whether they constitute distinct species. To address these issues, we genotyped a total of 1,055 individuals from 18 red populations and 7 green populations in China using eight microsatellite loci. Results We identified 109 alleles. We found a highly significant genetic differentiation among the 25 populations (global FST = 0.506, global FST {ENA} = 0.473) and a low genetic diversity in each population. In addition, genetic diversity of the red form mites was found to be higher than the green form. Pearson correlations between statistics of variation (AR and HE) and geographic coordinates (latitude and longitude) showed that the genetic diversity of the red form was correlated with latitude. Using Bayesian clustering, we divided the Chinese mite populations into five clades which were well congruent with their geographic distributions. Conclusions Spider mites possess low levels of genetic diversity, limit gene flow between populations and significant and IBD (isolation by distance) effect. These factors in turn contribute to the strong subdivision of genetic structure. In addition, population genetic structure results don't support the separation of the two forms of spider mite into two species. The morphological differences between the two forms of mites may be a result of epigenetic effects.


Background
The phytophagous two-spotted spider mite Tetranychus urticae Koch is a serious pest of various agricultural plants including fruit trees, vegetables, ornamentals and agricultural crops. Owing to its rapid development and high reproductive capacities, and ability to feed on more than 900 plants [1], T. urticae has spread worldwide. Despite the importance of this pest, the population genetic structure of this mite has been unclear in China. A better understanding of population genetic structure could help to manage mite populations by providing more reliable estimates of population dynamics and the risk of spreading acaricide resistance genes. The population genetic structure of an organism is determined by various factors, such as geographical barriers, ecological difference, and historical processes, as well as the dispersal ability of the species. T. urticae has multiple dispersal mechanisms. Spider mites are wingless and usually rely on crawling for their dispersal [2]. But they can also be carried for long distance by the wind and by human activities [3,4]. Due to the complex dispersal mechanisms of T. urticae, the population structure and diversity would be complex. Two studies have used molecular markers to investigate the genetic structure of T. urticae in China [5,6]. However, due to the limited number of markers used to analyse geographic populations and individuals, the information that they provided was insufficient to understand the genetic diversity and population structure clearly. To obtain further insights into the population genetic structure of T. urticae, a more systematic and wide scope study based on more microsatellite markers using more powerful Statistical analysis methods are needed.
In addition to questions about the population structure and genetic diversity of T. urticae in China, there is a big question about its taxonomy. T. urticae has two forms: a green form that occurs in temperate and cold areas, and a red (carmine) non-diapausing form (also called T. cinnabarinus (Boisduval)) that occurs in warmer areas [7]. Although distinguishable by body colour and the shape of the male aedeagus, the two forms are polymorphic and have significant intra-specific variation among populations by virtue of their different host plants and different habitats. Furthermore, there is a partial to complete reproductive incompatibility between them [8,9]. Therefore, whether they constitute one or two species has been debated for over 20 years. In China, the two forms coexist. However, the red form is considered to be native, while the green form, which was first reported in 1983 in Beijing, is considered to be invasive [10]. The carmine form is distributed throughout China. The green form has recently expanded its distribution from its putative source area in Beijing to many parts of the country, including Hebei, Shanghai, Liaoning, Jilin, Gansu, Ningxia, Henan, Anhui and Jiangsu provinces and elsewhere [10,11]. Therefore, information with respect to the genetic relationship of the two forms of T. urticae would help to clarify the longstanding issue of their species status.
In this study, two new microsatellite markers developed by ourselves together with six previously defined loci [12,13] were used to gain further insights into the genetic structure of T. urticae in China. Our main goals were to unveil the genetic structure across the known range of T. urticae in China, and clarify the taxonomic status of red and green spider mites from a population genetic structure perspective.

Microsatellite development
Of the 180 clones sequenced, 127 were discarded because the repetition pattern was too short. Of the 53 remaining microsatellites, 44 appeared to be either monomorphic or to have an unclear banding pattern in gels, and seven would not amplify in each of the 40 adult females. Only two microsatellites isolated here (clones TECI104 and TECI08, GenBank accession numbers GU068508 and GU068509, respectively) revealed polymorphism and were used.

Genetic variation
The allelic richness based on a minimum population size of 34 diploid individuals ranged from 2.14 to 6.23 with an average value of 3.89 for every population (Table 1). In general, the observed and expected heterozygosity values were very low, ranging from 0.111 to 0.454 and 0.181-0.754 respectively (Table 1). Tests for linkage disequilibrium between all pairs of loci across populations using FSTAT found no significant genotypic disequilibrium in the pooled data (P > 0.05 for all after Bonferroni corrections). MICRO-CHECKER software identified the presence of null alleles. The frequencies of null alleles per locus per population ranged from 0 to 0.434 (only four cases > 0.4). In 179 of 200 locus-population combinations, the frequencies of null alleles were lower than 0.200. There was a strong overall heterozygote deficit, with f = 0.272 (95% confidence interval, 0.115-0.391). Nineteen of 25 populations showed a significant deficit of heterozygotes (Table 1). Most populations and most loci did not meet the criteria for Hardy-Weinberg equilibrium. Only two green-form spider mite populations, 5 (G) and 4 (G), were in Hardy-Weinberg equilibrium. After correcting the data set for null alleles using the EM algorithm, both the observed and expected heterozygosity values were higher than the raw data ranging from 0.239 to 0.815 and 0.223-0.789 respectively (Table 1). No significant heterozygote deficiency was detected (f = -0.049, 95% confidence interval, -0.058-0.006). Of 200 locus-population combinations, 192 were in Hardy-Weinberg equilibrium. Only three populations, 20 (R), 1 (G) and 4 (G), were not in Hardy-Weinberg due to heterozygote excess at 2-3 loci.
A Pearson correlation analysis showed that R H E and C H E of the red form mite were both significantly and negatively correlated with latitude (for R H E : R = -0.476, P < 0.05, Figure 1A; for C H E : R = -0.469, P < 0.05, Additional file 1). AR tended to be higher at lower latitudes, but the correlation was not significant (R = -0.368, P = 0.067; Figure 1B). Further stepwise regression analysis of AR and H E showed that H E significantly contributed to the stepwise regression equation (for R H E : R 2 = 0.227, F = 4.697, P < 0.001; for C H E : R 2 = 0.220, F = 4.503, P < 0.05) but that AR did not contribute significantly (R 2 = 0.135, F = 2.505, P = 0.133). There was no evidence that either AR or H E of the red form mite was related to longitude. However, for the green form mite, neither latitude nor longitude was correlated with AR or H E . This suggests that the intra-population genetic diversity of the red form mites was negatively correlated with latitude. However, the intra-population genetic diversity of the green form mites was clearly not correlated with latitude.

Population genetic structure
The overall F ST value (F ST = 0.506; 95% confidence interval 0.447-0.564) indicated a high level of population differentiation. Pairwise estimates of F ST calculated between pairs of populations showed that most tests for population differentiation were significant ( Table 2). The only two populations that were not differentiated      However, the IBD of the populations of the green mite was not significant (7 populations, R 2 = 0.001, P = 0.42800; Figure 2B). Similar results for F ST {ENA} confirmed that an IBD effect existed in the carmine mite populations (R 2 = 0.1143, P = 0.018; Additional file 3) and didn't exist in the green mite populations (R 2 = 0.0396, P = 0.416; Additional file 3).
An NJ tree based on DCE genetic distance ( Figure 3) for the raw data showed a very high genetic divergence among the 25 T. urticae populations, and thus no clear clusters. When the date was corrected by the INA method, the NJ tree topology showed subtle differentiation. However, the DCE genetic distance was smaller when calculated with the corrected data than when calculated with the raw data. Moreover, the topology of the tree based on the corrected data (Additional file 4) agrees better with the following Bayesian clustering result than the tree based on the raw data. (The tree branched are coloured according to the Bayesian inferred clusters' colour.) A further STRUCTURE analysis of the full data set (8 loci) revealed the same major patterns that were revealed by the analysis of the five loci with low frequencies of the null alleles. The optimal number of clusters chosen with Evanno's ΔK method was five ( Figure 4). The pattern of the five clusters corresponds well with the geographical distribution of the populations and the colour form of the mite ( Figure 5, Additional file 5). Among the red populations, populations 1 (R) and 2 (R), which are located in the most northern region of China, are grouped in the first cluster and populations 3 (R), 6 (R)-11 (R) and 13 (R) are grouped in the second cluster. All of the populations are north of the Yangtze River except for 13 (R). Populations 16 (R)-20 (R) are grouped in the third cluster. Four out of five populations are located south of the Yangtze River and west of the Gan River. The one exception is 19 (R), which is located north of the Yangtze River. The fourth cluster includes the remaining red mite populations, which are located south of the Yangtze River and east of the Gan River. The populations of green mites were grouped in the fifth cluster. Most populations have a clear allocation to one of the five clusters (i.e., more than 90% of their genome was drawn from one cluster). Figure 5A displays the proportion of each population that contributed to each of the five clusters. Individual assignments are presented in Figure 5B. AMOVA analysis revealed that the genetic divergence among the five clusters was highly significant (F CT = 0.31923, P < 0.0001; Table 3). This validates the result obtained by Bayesian clustering and means that geographic distance is one of the factors responsible for the genetic structure. Genetic divergence between the red (carmine) and green mites was also highly significant (F CT = 0.20837, P < 0.0001), indicating that the gene flow between the two colour forms in China was limited. The genetic divergence of the mites among the different host plants (F CT = 0.08941, P = 0.00587) was lower than the genetic divergence between the red and green mites, but still significant. Similar results were obtained for the corrected genotype data (Additional file 6).

Discussion
Twenty-three of the 25 populations used in this study deviated from HWE, and of these, 19 populations displayed significant heterozygote deficiency. The deficit in heterozygotes may be a universal phenomenon in T. urticae. Navajas et al. [12] found heterozygote deficiency within greenhouse populations of T. urticae populations based on microsatellite markers. Similar observations were made of T. turkestani Ugarov & Nikolski [14] and T. urticae along a latitudinal gradient in Europe [15]. However, contrary results were obtained for the corrected genotype data set. After correcting the data set for null alleles, no significant heterozygote deficiency could be detected. This phenomenon suggests that the presence of null alleles is one of the main causes of the deviation from HWE due to heterozygote deficiency. The frequency of null alleles in microsatellite loci seems to be greater in T. urticae than in insects [16], probably because of a high rate of mutation in the region flanking the microsatellites [17]. In previous studies of the genetic structure for T. urticae based on microsatellites, the presence of null alleles, inbreeding due to patchy distributions, arrhenotokous mode of reproduction and a Wahlund effect caused by an inaccurate sampling method were considered to be the reasons of departure from HWE [12,15,18,19]. In this study, according to the estimate of the null allele frequencies following the EM algorithm, heterozygote deficiency caused by null allele may be the main factor contributing to the departure from HWE. Additionally, inbreeding, the arrhenotokous mode of reproduction and a Wahlund effect should not be ruled out as contributors to the deviation from HWE. These factors may explain the deviation of HWE  (Table 1) suggests that the genetic diversity within each population of T. urticae in China is very low. Considering the demographic and life history traits of T. urticae which tend to live in small patches of inbred individuals [12,20], genetic drift probably contributed to the loss of diversity of the mite populations. However, the values of the genetic diversity indexes reported in this study are slightly higher than previously reported values for T. urticae [6]. This may be because the latter study used highly inbred laboratory stocks that have low genetic diversity.
AR and H E were significantly greater in the red form mite than in the green form mite ( Table 4), suggesting that the genetic diversity of the red form mites was higher. This may be because the carmine spider mite is a native species, while the green spider mite is an invasive species, first reported in Beijing in 1983 [10]. As an invasive pest, the green form is often established in a new area from only a handful of introduced individuals (founders), which carry only a portion of the genetic diversity that was present in the source population. This would explain why the diversity of the green spider mite is lower than that of the carmine spider mite.
The pairwise F ST values, Cavalli-Sforza & Edwards' chord distances, Bayesian clustering and AMOVA structure suggest that T. urticae has a high level of genetic structuring. Although there is a strong subdivision of genetic structure in Chinese populations of T. urticae, low gene flow still occurred among geographical populations ( Figure 5B). This may be due to the long distance dispersal facilitated by multiple dispersal mechanisms of T. urticae. Five clades likely exist in China. The red form mite populations were clustered into four clades which agree well with their geographical distribution. The limitation of gene flow between populations may play a role in the high genetic differentiation. Our results suggest that the Yangtze River doesn't act as a substantial barrier to gene flow in the red form mite populations, which was contrary to be the case of the rice stem borer, Chilo suppressalis [21]. However some studies also showed little or no evidence that the Yangtze River limits gene flow [22,23]. The different characteristics of different dispersal mechanisms of organisms may be responsible for this difference. For T. urticae, geographic distance may be the prevailing factor for the limited gene flow as evidenced by the IBD analysis. IBD analysis revealed that geographic distance has a strong effect on the population structure of the red form mite (R 2 = 0.1129, P = 0.0050) and no effect on the green form mite (R 2 = 0.001, P = 0.42800).
The limitation of gene flow associated with geographical distance is in agreement with the results of previous studies of spider mites [14,24,25].
The declining genetic diversity of the red form mite with increasing latitude may be because the northern populations have fewer generations per year. In southern China, which has a hot-humid climate, the spider mite has more than 20 generations per year whereas in northern China, which has a cold-arid climate, the spider mite has 12-15 generations per year. This would result in a higher mutation rate in the southern populations. Mutation is one of the key sources of genetic diversity. So the southern populations would be expected in turn to have a higher genetic diversity than the northern populations. However, only seven populations of green mites were considered in this study, and all of the seven populations were located in the north of China. Therefore, scarce populations of green mites scattered over a much lower latitudinal gradient than the populations of red mites might lead to the statistical bias.
Even though T. urticae and T. cinnabarinus have significantly different body colours and are reproductively isolated in China, our genetic data don't support the separation of the two forms of spider mite into two species. The level of genetic differentiation between the two colour forms is similar to that examined between geographically separated populations. Bayesian clustering cannot separate the red form mite from the green form mite when K = 2. AMOVA analysis also revealed that there is more variations among the 5 groups inferred by STRUC-TURE software than between the two colour forms of spider mite. An attempt to distinguish the two forms by phylogenetic analysis of their endosymbiont Wolbachia was unsuccessful, possibly because of a tendency of Wolbachia to co-evolve with their hosts [26]. Although Li et al. reported that they could differentiate the red and green mites using three microsatellite markers, three red form mite populations were still mixed with green form mite populations in their results [6]. Based on part of mitochondrial (cytochrome c oxidase subunit 1) and nuclear (internal transcribed spacer 1 and 2 of ribosomal RNA gene) sequences, the relationship between the two species also cannot be well resolved [27]. The completely incompatible between the two forms of mite may be because the two forms were isolated from each other for a long time before the green mite form was introduced to China. A long period of isolation leads to complete reproductive incompatibility. Although T. urticae and T. cinnabarinus are reproductively incompatible, there may be no intrinsic difference in their DNA compositions. The morphological difference between the two forms of mite may be due to epigenetics. In this case, different gene expressions between the two forms of spider mite caused by DNA methylation, histone deacylation

Conclusions
Spider mites possess low levels of genetic diversity, limit gene flow between populations and significant IBD (isolation by distance) effect. These factors in turn contribute to the strong subdivision of genetic structure. Due to the founder effect, the genetic diversity of the invasive green form of spider mite is lower than that of the red form. Fewer generations per year at higher latitude lead to the declining genetic diversity of the red form with increasing latitude. In addition, the population genetic structures of the two forms of mites do not support their separation into two species. The morphological difference between the two forms of mites may be the result of epigenetic effects.

Mite sampling and DNA extraction
During the summer of 2008, we collected a total of 1055 adult females of the two-spotted spider mite (the green form from 7 regions, and the red form from 18 regions) at 25 sites in China. Figure 6 provides information about the sampling localities. The host plants and the number of analysed mites are summarised in Table 1. At each locality, every population was sampled by randomly collecting 40-50 adults from 20 plants in a 5-× 5-m square. Each locality was > 300 km away from the others. They were brought to the laboratory for identification based on aedeagus morphology observed with a V8 microscope (Carl Zeiss, Jena, Germany). Only T. urticae mites were subjected to further analyses. Total DNA was extracted from adult female mites according to the method of Gomi et al. [28].

Microsatellite loci isolation and characterization
Two of the eight loci used in this study (TECI 04 and TECI 08) were isolated by us from a genomic DNA library using a suppression -PCR procedure described by Lian et al. [29]. Briefly, DNA extracted from a pool of mites was digested with a blunt -end restriction enzyme, EcoRV, and the restricted fragments were then ligated with a specific blunt adaptor (consisting of a 48-mer: 5'GTAATAC-GATTCACTATAGGGCACGCGTGGTCGACGGCCCG GGCTGGT3' and 8-mer with the 3'-end capped by an amino residue: 5 ACCAGCCC-NH 2 3') by use of a DNA ligation kit (Takara Shuzo, Japan).
Fragments were amplified from an EcoRV DNA library using compound SSR primer (AC) 6 (AG) 5 or (AG) 6 (AC) 5 and an adaptor primer AP2 (5'CTATAGGGCACGCG TGGT3'). The amplified fragments were integrated into the plasmids using a pT7 Blue Perfectly Blunt Cloning Kit (Novagen) and the plasmids were transferred into Escherichia coli according to the manufacturer's instructions. The inserted fragment lengths were checked by 1.5% agarose gel electrophoresis. Amplified fragments between 300 and 800 base pairs were selectively sequenced directly using a Thermo Sequence Pre -mixed Cycle Sequencing Kit (Amersham Biosciences, USA) with a Texas Red-labeled T7 primer (Sigma-Aldrich, Japan) in an SQ-5500E sequencer (Hitachi). For each fragment flanking (AC) 6 (AG)n or (AG) 6 (AC)n compound SSR sequences at one end, a  specific primer (IP1) was designed from the sequence using Primer 5.0 (http://www.premierbiosoft.com). The primer pairs of IP1 and compound SSR primer were used as a compound SSR marker.
To examine the effectiveness of primer pairs designed as compound SSR markers, 40T. urticae adult females sampled from 21 provinces of China were used for the template DNA extraction according to protocols described by Gomi et al. [28]. PCR amplification was conducted with a Gene Amp PCR System 9700 (Applied Biosystems). Five microliters of the reaction mixture contained 0.5 μL template DNA, 0.2 mmol -1 of each dNTP, 1 × PCR buffer (Mg 2+ free, Applied Biosystems, USA), 2.5 mmol L -1 Mg 2+ , 0.125 U of Ampli Taq Gold (Applied Biosystems), and 0.5 μmol L -1 of each IP1 and a Texas Red-labeled compound SSR primer ((AC) 7 (AG) 3 or (AG) 7 (AC) 3 ). Reactions were preceded by a 9-min denaturation step at 94°C and were cycled 40 times with 30s at 94°C, 30s at 55°C, and 1 min at 72°C, followed by a final 5 min extension step at 72°C. The reaction products were electrophoresed on 6% Long Ranger sequencing gel (FMC BioProducts, ME, USA) using an SQ -5500E sequencer. Electrophoretic patterns were analysed by Fraglys v.3 (Hitachi).
To minimize the bias of the genetic diversity statistics induced by null alleles, genotypes were corrected according to the null frequencies estimated by the EM algorithm of Dempster et al. (1977) [37] implemented in the program FREENA (http://www.montpellier.inra.fr/URLB/ ) [38]. The estimated false homozygous genotypes XX caused by null alleles were systematically changed to X999 (INA method) [38][39][40]. The corrected data set was used to rectify H O , H E , F IS statistics and HWE test using the above methods. In order to understand whether genetic variation within populations is correlated with geographical gradients, Pearson correlations between the statistics of variation (AR and H E ) and geographic co-ordinates (latitude and longitude) for each population were analysed.
Stepwise regression analysis of AR and H E in relation to the two independent variables (latitude and longitude) were further assessed separately. Both analyses were conducted using SPSS 13.0 for Windows [41].
Pairwise F ST values for each population comparison were calculated with FSTAT. The ENA method was also used to obtain the unbiased pairwise F ST values (F ST {ENA} ) using the FREENA program [38]. To detect isolation-by-distance effects, we compared F ST /(1-F ST ) matrix and F ST {ENA} /(1-F ST {ENA} ) matrix with a geographic distance matrix (ln Km) using the Mantel test, with significance tests performed over 1000 permutations [42]. The test was implemented in GENEPOP [35].
Phylogenetic relationships among populations were estimated by constructing a neighbour-joining (NJ) tree [43] based on DCE distance [44] using the PHYLIP 3.6c package [45]. One thousand distance matrices from resampled data sets bootstrapped over allele frequencies were created using the SEQBOOT subroutine in PHY-LIP3.6c. Then the GENDIST subroutine was used to calculate the correlated genetic distances. The distances matrices were used to construct NJ trees using the NJ tree NEIGHBOUR subroutine in PHYLIP 3.6c. The input order was randomised to ensure the final tree topology was not dependent on the sample entry order. The CONSENSE subroutine within PHYLIP produced a consensus NJ tree that provided estimates of robustness at each node based on the bootstrapping of the gene frequencies. Consequently, we constructed a visual tree with TREEVIEW v.1.6.6 [46]. Because the presence of null alleles may confound the calculation of genetic distance, the corrected data using the INA method implemented in FREENA was also used to constructing a NJ tree using the above methods.
The Bayesian clustering method was also used to elucidate the genetic structure among populations using STRUCTURE v. 2.2 [47]. The model applied in the analysis assumes the existence of K clusters. We took advantage of an admixture ancestry model under the correlated allele frequency model. The Markov chain Monte Carlo simulation was run 20 times for each value of K (1-11) for 10 6 iterations after a burn-in period of 10 5 . All other parameters were set at their default values. We used the Δ K method of Evanno et al. [48] to choose the most likely value of K. The proportional membership of each cluster was estimated for each individual and each population. Owing to the difficulty of correcting multilocus genotypes precisely, three loci, TUCA12, TUCT17, TUCA72 abounded with null alleles over the 18 population were deleted. The genotype data with the remaining five loci were also used for Bayesian clustering to exclude the effect of null alleles. After the three loci were deleted, 79 of 125 locus-population combinations were found to have no null alleles. The average frequency of null alleles for the remaining five loci was estimated at 0.083 by MICRO-CHECKER software.
Population genetic variance was further analysed by an Analysis of Molecular Variance (AMOVA) [49] performed by the method of Excoffier et al. [49] using ARLEQUIN v. 3.11 [50]. Both the raw genotype data set and the genotype data set which corrected by using the INA method were analysed. Genetic variance was partitioned into three levels: (1) among different groups defined on the basis of phylogenetic clusters, host plants, and mite colour patterns, (2) among populations within groups, and (3) within populations. Significance of fixation indices was tested using a nonparametric permutation approach with 1000 permutations [49], as performed by ARLEQUIN.