The primary methodological advance provided by the approach presented here lies in the ability to develop nuclear markers with specific characteristics tailored to the research question at hand, to identify genotyping assays that balance efficiency and resolving power prior to large-scale screening, and to obtain three levels of genetic information simultaneously from the same marker. A secondary advance comes from combining the strengths of diverse PCR-based marker development protocols (Background) in a simple five-step procedure that does not require cloning.
Using this approach, we obtained a suite of polymorphic nuclear markers for two undescribed Collembola relatively quickly and cheaply. These markers were informative over fine spatial scales, appeared to be non-coding (with the exception of UcWnt – see Results), and were free of appreciable null allele frequencies; as far as we can determine without pedigreed material, alleles were segregating in a Mendelian fashion. Together with targeted DNA sequencing, SSCP was an effective technique for obtaining genotypic, genic and genealogical information from six nuclear genes for moderate to large population-genetic sample sizes. These loci showed no consistent significant departure from neutrality over all populations, were mostly free of detectable recombination, and were therefore suitable for constructing nuclear gene trees using standard phylogenetic procedures. Indeed, the datasets obtained here are well-suited to a variety of emerging coalescent-based statistical phylogeographic analyses [34–36], and to NCA.
The role of SSCP in development and application of three-tiered nuclear markers
Single-stranded conformation polymorphism was effective for physically isolating nuclear allele haplotypes from diploid tissues. While cloning of PCR products also provides a means of separating size-invariable alleles, this procedure under many jurisdictions requires a dedicated Physical Containment level 1 laboratory and formal permission from gene technology regulators, is labor-intensive, and is expensive (e.g. it requires cloning vectors and competent bacteria). Further, to achieve a moderate level of certainty regarding genotype assignment, at least four clones per PCR product are usually sequenced, but the number required to detect weakly amplifying alleles may be considerably greater, and thus prohibitively expensive. An additional drawback is that cloned sequences can have a greater propensity to reveal artifacts such as Taq DNA polymerase error and PCR recombinants.
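The dependence of clone number on amplification bias can be illustrated with a simple binomial sketch. It assumes that each sequenced clone is an independent draw and that a fixed proportion of clones carries the weaker allele; the function names, the parameter weak_fraction and the 95% target are ours and serve only as illustration, not as part of the protocol described here.

```python
def prob_both_alleles_seen(n_clones, weak_fraction=0.5):
    """Probability that n_clones random clones from a heterozygote
    include at least one copy of each allele.

    weak_fraction is the ASSUMED proportion of clones carrying the
    weaker-amplifying allele (0.5 means unbiased amplification).
    """
    if not 0.0 < weak_fraction < 1.0:
        raise ValueError("weak_fraction must lie strictly between 0 and 1")
    p = weak_fraction
    q = 1.0 - p
    # Subtract the two ways of failing: all clones are one allele or the other.
    return 1.0 - p ** n_clones - q ** n_clones


def clones_needed(weak_fraction=0.5, target=0.95):
    """Smallest number of clones giving at least `target` probability
    of recovering both alleles (target is an illustrative threshold)."""
    n = 2
    while prob_both_alleles_seen(n, weak_fraction) < target:
        n += 1
    return n


if __name__ == "__main__":
    print(prob_both_alleles_seen(4))         # four clones, unbiased amplification
    print(clones_needed(weak_fraction=0.5))  # unbiased amplification
    print(clones_needed(weak_fraction=0.1))  # one allele in 10% of clones
```

Under these assumptions, four clones recover both alleles of a heterozygote with probability 0.875 when amplification is unbiased, and the number of clones required rises steeply as the weaker allele becomes rarer among clones.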
The SSCP procedure provides researchers with the advantage of being able to identify putative sequence-variable alleles directly from autoradiograph phenotypes. Thus, only strategically-selected bands need to be sequenced. Further, given the high sensitivity of radioisotope-based methods for viewing low-concentration DNA, weakly amplifying alleles can be detected with greater efficiency than with cloning (for laboratories without access to radioisotopes, silver-staining should produce comparable SSCP results, and also permits bands to be excised and reamplified in PCR). Similarly, SSCP pre-screening provides valuable information on levels of polymorphism at a newly-developed locus, enabling informed decisions about its suitability for addressing the research question, and for identifying cost- and time-efficient genotyping assays.
While we found SSCP to be a valuable technique for genotyping population samples on the basis of DNA sequence variation at multiple nuclear loci, we must introduce a note of caution. When there is a large number of individuals (c. >300) to be assayed, and/or a large number of alleles (c. >15) at a locus, a substantial time commitment may be required. The cost of sequencing can increase considerably when there are many unique gel phenotypes (putative genotypes) present on each autoradiograph, or when single-stranded DNA (ssDNA) adopts multiple conformations. Running individuals that are likely to share alleles on the same gel alleviates the former problem. The latter situation seems to arise mostly when primers contain degenerate nucleotide positions, and where possible, we recommend avoiding such primers for large-scale screening. Finally, there may be rare occasions where closely-related alleles produce indistinguishable SSCP gel phenotypes. In these cases we have used an RFLP assay to distinguish known alleles (Table 3). In the absence of a diagnostic restriction site, it may be necessary to design primers that amplify each allele individually.
Identifying contemporary spatial-genetic patterns and estimating population relationships
Joint analysis of genotypic data revealed population substructuring over fine spatial scales in both Collembola species, consistent with expectations for ecologically-specialized, low-mobility animals. Although genotypic data from multiple loci proved useful for describing contemporary spatial-genetic patterns, they offered little information on the degree of divergence among populations. Having first objectively defined populations a posteriori in a spatially explicit manner, we were then able to quantify levels of within-population genetic diversity and estimate population relationships using genic data.
Kalinowski proposed that the total number of independent alleles (i.e. number of alleles at a locus minus one, summed across loci) is a good indicator of precision in genetic distance estimates. Accordingly, loci with many alleles are more efficient in producing good estimates. Nuclear markers developed using the approaches presented here and genotyped using SSCP coupled with targeted DNA sequencing were highly polymorphic (mean 15.5 alleles per locus, Table 3), and so are well-suited to genetic distance-based analyses. In both species, geographically proximate populations tended to be genetically more similar to one another (cf. distant populations), and population divergences seem to have occurred on different timescales throughout the evolutionary history of these animals at Tallaganda. However, when genetic distances between populations are small, either recent divergence with zero gene flow (isolation model), or ancient divergence with low ongoing gene flow (migration model) can represent equally plausible scenarios. While genic data are useful for quantifying levels of genetic diversity and estimating population relationships, they are unable to provide a full picture that integrates both separation time and gene flow. This can be addressed by analysis of DNA sequence data.
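The independent-allele count described above is simple to compute, as the following minimal sketch shows. The locus names and allele counts in the example are hypothetical placeholders and are not the values reported in Table 3.

```python
def independent_alleles(alleles_per_locus):
    """Total number of independent alleles: (alleles at a locus - 1),
    summed across loci.

    alleles_per_locus maps a locus name to its observed allele count.
    """
    for locus, k in alleles_per_locus.items():
        if k < 1:
            raise ValueError(f"locus {locus!r} must have at least one allele")
    return sum(k - 1 for k in alleles_per_locus.values())


if __name__ == "__main__":
    # Hypothetical allele counts, for illustration only.
    example = {"locus_A": 12, "locus_B": 20, "locus_C": 3}
    print(independent_alleles(example))
```

Because each locus contributes its allele count minus one, a monomorphic locus adds nothing to the total, whereas a single highly polymorphic locus can contribute as much as several weakly polymorphic ones.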
Nuclear gene phylogeography
Because models of isolation versus migration can lead to similar gene tree topologies, the present datasets will be analyzed using likelihood methods, and results presented elsewhere. Nonetheless, phylogenetic analyses demonstrated that all of the new Collembola nuclear loci assayed using SSCP coupled with targeted DNA sequencing contain phylogeographic signal that can be exploited. Interestingly, in both species, the EF-1α locus – the only one known to be an intron of a functional gene – had the lowest range of sequence divergences yet the strongest phylogeographic signal (Figure 3, Table 2), possibly indicating purifying selection not detected by Tajima's D neutrality tests (Table 6). Indeed, intron polymorphisms are increasingly being recognized as important in gene regulation, and it has recently been postulated that the majority of intronic DNA in the genome is likely to be evolving under considerable selective constraint.
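For readers unfamiliar with the neutrality statistic mentioned above, the following is a minimal sketch of Tajima's D as defined in the standard 1989 formulation, computed from aligned allele sequences of equal length. It is not the software used to produce Table 6; it assumes no gaps or missing data and counts every variable alignment column as one segregating site. The function name and the toy alignments are ours.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences.

    Returns None when there are no segregating sites (D is undefined).
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are needed")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    # S: number of alignment columns with more than one state.
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s_sites == 0:
        return None

    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s_sites / a1) / sqrt(e1 * s_sites + e2 * s_sites * (s_sites - 1))


if __name__ == "__main__":
    # Toy alignments (hypothetical): singleton-rich versus two deep lineages.
    print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]))
    print(tajimas_d(["AA", "AA", "TT", "TT"]))
```

An excess of rare variants, as in the first toy alignment, gives a negative D, whereas variants at intermediate frequency, as in the second, give a positive D. A non-significant D therefore does not rule out selective constraint of the kind suggested for EF-1α.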
Although introns offered the greatest phylogeographic resolution for both Collembolans, the value of anonymous nuclear sequence-markers should not be understated. Because different loci can vary widely in their histories [23, 24], sampling additional individuals (after a certain point) is far less informative than sampling additional unlinked loci [34, 36]. In the present study, comparative analysis of genes within species confirmed that stochastic variance among loci can be considerable. Although there were some notable differences in spatial structuring, Acanthanura n. sp. loci Uc180 and Uc3 both produced star-shaped phylogenies, consistent with expectations for an exponentially growing population. In contrast, UcEF-1α showed marked phylogeographic structuring, with multiple putative 'ancestral' allele haplotypes occupying central positions in the cladogram, each associated with a series of closely-related descendants. This case study illustrates that multiple nuclear sequence markers have the potential to radically alter population inferences.