- Research article
- Open Access
SNP identification, verification, and utility for population genetics in a non-model genus
© Williams et al; licensee BioMed Central Ltd. 2010
- Received: 5 February 2010
- Accepted: 30 April 2010
- Published: 30 April 2010
By targeting SNPs contained in both coding and non-coding areas of the genome, we are able to identify genetic differences and characterize genome-wide patterns of variation among individuals, populations and species. We investigated the utility of 454 sequencing and MassARRAY genotyping for population genetics in natural populations of the teleost Fundulus heteroclitus, as well as in closely related Fundulus species (F. grandis, F. majalis and F. similis).
We used 454 pyrosequencing and MassARRAY genotyping technology to identify and type 458 genome-wide SNPs and determine genetic differentiation within and between populations and species of Fundulus. Specifically, pyrosequencing identified 96 putative SNPs across coding and non-coding regions of the F. heteroclitus genome: 88.8% were verified as true SNPs with MassARRAY. Additionally, putative SNPs identified in F. heteroclitus EST sequences were verified in most (86.5%) F. heteroclitus individuals; fewer were genotyped in F. grandis (74.4%), F. majalis (72.9%), and F. similis (60.7%) individuals. SNPs were polymorphic and showed latitudinal clinal variation separating northern and southern populations and established isolation by distance in F. heteroclitus populations. In F. grandis, SNPs were less polymorphic but still established isolation by distance. Markers differentiated species and populations.
In total, these approaches were used to quickly determine differences within the Fundulus genome and provide markers for population genetic studies.
- Principal Component Analysis
- Allelic Richness
- Southern Population
- Northern Population
- Putative SNPs
High-throughput sequencing and genotyping have become faster, less expensive and more accurate. In recent years this has led to the establishment of myriad data sets, ranging from increased coverage of variation in the human genome at the individual level [1–5] to the sequencing of non-model prokaryotic and eukaryotic genomes and transcriptomes [6–11]. For many organisms, sequencing of the entire genome is still unattained, but smaller, more targeted portions of the genome can be easily sequenced and genotyped. Such data provide genome-wide sequence information that can be used to characterize population and selection pressure parameters and to provide evolutionary insights that are broadly applicable.
One non-model genus, Fundulus, includes closely related species that differ in physiology, environmental and habitat preference, and geographic locale; Fundulus heteroclitus and Fundulus majalis inhabit the Atlantic coast, and Fundulus grandis and Fundulus similis inhabit the Gulf Coast. Many Fundulus species and/or populations have extensive euryhaline capabilities, respond well to varying ranges of hypoxia [13–15], live along a steep thermocline, and have adapted to extremely polluted areas. A variety of studies have investigated the underlying genetic basis of this teleost's phenotypic plasticity. While some of the transcriptome is known for F. heteroclitus [17–27], much of the genome-wide variation within and between populations and species of this genus is relatively unknown.
Establishing a set of genetic markers that can be used to assess regions of the genome involved in local adaptation and in speciation is important for understanding fundamental similarities and differences between populations and species of Fundulus. Once markers are established, they can be examined for signatures of selection in response to any number of evolutionary forces (e.g., pollution, hypoxia, salinity, temperature). A few studies have established genetic differences between populations of F. heteroclitus, mainly with respect to phylogeographic constraints [28, 29] or selection [30–38]. These studies used microsatellite, mitochondrial DNA, and AFLP analyses as well as targeted gene approaches. Single nucleotide polymorphisms (SNPs) are a useful starting point for scanning large and disparate regions of the genome because of their abundance in both coding and non-coding regions, their co-dominant nature, and their lack of ambiguity.
SNPs have been used to establish differences between individuals, populations [40–42] and species [43, 44]. They also are useful markers for propensity to disease [45–47], disease states, and evidence of the genetic basis of adaptation [49–52]. In vertebrates, a SNP occurs on average every 100 to 1000 base pairs and often is in linkage disequilibrium with many other SNPs along the chromosome, forming strong haplotypes that can be easily identified. Unfortunately, SNP resources are not readily available for the majority of non-model species, which lack genomic resources. With this in mind, we set out to establish a set of SNP markers to identify differences between Fundulus populations and species.
Sample Collection and Extraction
Genomic DNA from spleen and testes was extracted by phenol and chloroform as described in Wirgin et al., and DNA was resuspended in 50 μL 0.1× TE buffer. Genomic DNA from fin clips was extracted using a modified version of the method of Aljanabi and Martinez, and DNA was resuspended in 50 μL 0.1× TE buffer. This experiment was performed according to a protocol approved by the Institutional Animal Care and Use Committee at North Carolina State University.
Table 1. Adapters and primers (BspEI and EcoRI, 5' to 3') used in the amplification of genomic DNA.
Preselective PCR reactions with primers specific to the adaptors (Table 1) were performed in a total volume of 25 μL containing 2 μL of diluted (1:10 in 0.1× Tris-EDTA buffer) ligation product with EcoRI primer (Integrated DNA Technologies; 10 pmol), BspEI primer (Integrated DNA Technologies; 10 pmol) and 1 U Taq. PCR conditions were 20 cycles of 94°C for 10 sec, 49°C for 30 sec, and 72°C for one minute. Following the preselective amplification, a selective amplification was carried out to decrease the number of fragments amplified in each individual to approximately 200 by extending the primer on the 3' end. Preselective PCR products were diluted (1:10), and 2 μL of diluted product was amplified with the EcoRI+AAG primer (Integrated DNA Technologies; 10 pmol) and the BspEI+C primer (Integrated DNA Technologies; 10 pmol) (Table 1) with 1 U Taq in a 25 μL total volume. PCR conditions in the first cycle were 94°C for 10 sec, 65°C for 30 sec, and 72°C for one minute, with the annealing temperature reduced by 0.5°C per cycle for 20 cycles, followed by 25 cycles of 94°C for 10 sec, 55°C for 30 sec, and 72°C for one minute.
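The selective amplification is a touchdown-style protocol, and its annealing schedule can be written out explicitly. The sketch below is only illustrative: it assumes the 0.5°C reduction is applied once per cycle starting from 65°C, and the function name and parameters are hypothetical rather than part of the published protocol.

```python
def touchdown_annealing_temps(start=65.0, step=0.5, touchdown_cycles=20,
                              final=55.0, final_cycles=25):
    """Return the annealing temperature (deg C) for every cycle.

    Assumption: the first touchdown cycle anneals at `start` and each later
    touchdown cycle is `step` degrees cooler than the one before it.
    """
    touchdown = [start - step * i for i in range(touchdown_cycles)]
    plateau = [final] * final_cycles
    return touchdown + plateau


temps = touchdown_annealing_temps()
print(len(temps), temps[0], temps[19], temps[20])
```

Under this reading the last touchdown cycle anneals at 55.5°C, one step above the 55°C used for the remaining 25 cycles.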
Emulsion PCR was carried out on PCR products as described. Amplification of the PCR product on the bead was controlled for by quantifying and calculating the size of the amplicon pool using a Bioanalyzer 2100 so that there was a minimum of $2 \times 10^{6}$ copies of DNA that ranged in size from 100 to 700 base pairs. Subsequent products were sequenced on a Roche/454 Life Sciences GS FLX Sequencer at the University of South Carolina's Environmental Genomics Core Facility. The PicoTiter plate was subdivided into eight regions with an expectation of 30,000 reads per region.
Assembly of pyrosequencing sequences and SNP Detection
Sequences were trimmed of their barcodes. All 626 sequences with at least one ambiguous base were removed, since the presence of even a single ambiguous base is an effective indicator of low-quality sequence. Because shorter than expected read lengths also correlate strongly with incorrect reads, another three percent of the sequences (those shorter than 100 bp) were removed. The remaining reads were aligned using CAP3. Quality scores were rescaled to be comparable to the usual Phred score using ARACHNE.
SNPs were called at both the individual level and the population level. At the individual level, SNPs were called using both a Bayesian method and a likelihood ratio test (LRT) method. For the Bayesian method, $10^{-4}$ was used as the prior for the mutation rate. At the population level, for each locus on the contig, we simulated the error model and marked a locus as a potential SNP if it had a larger number of second alleles than the critical value from the error model. Furthermore, a potential SNP site had to have at least three individuals sequenced to 2× at that locus unless another potential SNP site was within five base pairs or over 90% of the individuals had been classified as heterozygous at the individual level. This was done to minimize the rate of false positives caused by homologs.
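The two read-level filters described above (no ambiguous bases, minimum length of 100 bp) are simple to express in code. This is a minimal sketch with hypothetical function names; it assumes reads are plain strings that have already had their barcodes removed, and it is not the authors' pipeline.

```python
def passes_filters(seq, min_length=100):
    """Return True if a trimmed read has no ambiguous base and is long enough."""
    seq = seq.upper()
    only_acgt = all(base in "ACGT" for base in seq)
    return only_acgt and len(seq) >= min_length


def filter_reads(reads, min_length=100):
    """Keep only the reads that pass both filters."""
    return [r for r in reads if passes_filters(r, min_length)]


reads = [
    "ACGT" * 30,   # 120 bp, unambiguous: kept
    "ACGN" * 30,   # contains N: removed
    "ACGT" * 10,   # 40 bp: removed as too short
]
kept = filter_reads(reads)
print(len(kept))
```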
Bayesian and LRT model for SNP calling at individual level
For the Bayesian model, for each contig, the prior of $1 \times 10^{-4}$ represents the mutation rate; $N$ represents the total number of unique mapping loci with multiple allelic types; $A_i$ and $a_i$ represent, respectively, the major and minor alleles at locus $i$; $N_i$ represents the total number of alleles observed for locus $i$, and $Y_j$ is the type of the $j$th allele copy among these $N_i$ alleles, where $j = 0, \dots, N_i$; finally, $e_j$ is the probability of error of the $j$th allele, computed under the usual Phred relationship as $e_j = 10^{-Q/10}$, where $Q$ is the corresponding quality score after rescaling.
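To make the role of the prior and the per-base error probabilities concrete, the following sketch computes a posterior probability of heterozygosity for one locus. The likelihood used here is an assumption made for illustration, since the exact posterior expression is not reproduced in this text: a homozygous locus carries only the major allele, a heterozygous locus yields either allele with probability 0.5, and an error produces each of the three other bases with probability $e_j / 3$. Function names are hypothetical.

```python
import math


def phred_to_error(q):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def posterior_heterozygous(alleles, quals, major, minor, prior_het=1e-4):
    """Posterior probability that a locus is heterozygous (major/minor).

    Assumed model (not taken from the paper): see the lead-in text.
    """
    log_hom = math.log(1 - prior_het)
    log_het = math.log(prior_het)
    for base, q in zip(alleles, quals):
        e = min(phred_to_error(q), 0.75)  # keep log() arguments positive
        # Homozygous major/major: correct read or one specific miscall.
        log_hom += math.log(1 - e if base == major else e / 3)
        # Heterozygous: read drawn from either allele, then possibly miscalled.
        if base in (major, minor):
            log_het += math.log(0.5 * (1 - e) + 0.5 * e / 3)
        else:
            log_het += math.log(e / 3)
    diff = log_hom - log_het
    if diff > 700:  # exp() would overflow; the posterior is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


balanced = posterior_heterozygous("A" * 10 + "G" * 10, [30] * 20, "A", "G")
uniform = posterior_heterozygous("A" * 20, [30] * 20, "A", "G")
print(balanced > 0.5, uniform > 0.5)
```

With ten high-quality copies of each allele the small prior is overwhelmed by the data, whereas twenty copies of a single allele leave the locus classified as homozygous.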
Based on the posterior probabilities from above, we classified each of these N loci as homozygous or heterozygous exclusively. If a locus was classified as heterozygous, it was further tested using a likelihood ratio test (LRT) as follows:
where $X_j$ stands for the true allele that we should have observed. For each $Y_j$, we have an error probability of $e_j$ associated with it.
Here $I_j = 1$ if $Y_j = A_i$, and $I_j = 0$ if $Y_j = a_i$.
The LRT was performed with the hypotheses $H_0: p = 0.5$ versus $H_a: p > 0.5$, with $-2 \times \mathrm{LRT} \sim \chi^2(1)$.
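A likelihood ratio test of this kind can be sketched as follows. The likelihood is an assumption for illustration only, because the original expression is not reproduced in this text: $p$ is read as the probability that a copy derives from the major allele, each copy is miscalled as the other allele with probability $e_j / 3$, and the alternative is maximized by a simple grid search over $0.5 \le p \le 1$. The p-value uses the $\chi^2(1)$ reference distribution named above, whose upper tail equals `erfc(sqrt(x / 2))`.

```python
import math


def log_likelihood(p, indicators, errors):
    """Log-likelihood of the observed copies given major-allele proportion p."""
    total = 0.0
    for is_major, e in zip(indicators, errors):
        if is_major:
            prob = p * (1 - e) + (1 - p) * e / 3
        else:
            prob = (1 - p) * (1 - e) + p * e / 3
        total += math.log(prob)
    return total


def lrt_heterozygote(indicators, errors, grid_size=501):
    """Return (statistic, p_value) for H0: p = 0.5 versus Ha: p > 0.5."""
    grid = [0.5 + 0.5 * k / (grid_size - 1) for k in range(grid_size)]
    ll_null = log_likelihood(0.5, indicators, errors)
    ll_alt = max(log_likelihood(p, indicators, errors) for p in grid)
    stat = max(0.0, 2 * (ll_alt - ll_null))
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square(1) upper tail
    return stat, p_value


# I_j = 1 for the major allele, 0 for the minor allele; e_j = 0.001 throughout.
balanced_stat, balanced_p = lrt_heterozygote([1] * 10 + [0] * 10, [0.001] * 20)
skewed_stat, skewed_p = lrt_heterozygote([1] * 18 + [0] * 2, [0.001] * 20)
print(balanced_p, skewed_p)
```

A balanced locus gives no evidence against $p = 0.5$, while a strongly skewed locus does, which is the situation in which a heterozygous call would be questioned.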
Error model simulation
To call SNPs at the population level, we simulated the error model for each locus with multiple allelic types. We assumed that a particular locus was homozygous for the major allele $A_i$ and randomly simulated $N_i$ allele copies, each of which was either $A_i$, with probability $(1 - e_j)$, or one of the other three allele types drawn from a uniform distribution, with probability $e_j$. We repeated this process 10,000 times and recorded the number of second alleles found in each simulation. The critical value was chosen as the number of second alleles with a right-side p-value of 0.001.
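The simulation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the "second allele" count is taken to be the count of the most frequent non-major allele in a replicate, the critical value is the smallest count whose right-tail frequency is at most 0.001, and all function names are hypothetical.

```python
import random
from collections import Counter


def simulate_second_allele_counts(errors, n_sim=10_000, seed=0):
    """Simulate a homozygous locus and record the second-allele count.

    `errors` holds one error probability e_j per observed allele copy.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sim):
        miscalls = Counter()
        for e in errors:
            if rng.random() < e:
                # An error becomes one of the three other bases uniformly.
                miscalls[rng.randrange(3)] += 1
        counts.append(max(miscalls.values(), default=0))
    return counts


def critical_value(counts, alpha=0.001):
    """Smallest count c such that the fraction of replicates >= c is <= alpha."""
    n = len(counts)
    c = 0
    while sum(1 for x in counts if x >= c) / n > alpha:
        c += 1
    return c


errors = [0.01] * 30          # 30 allele copies, each with e_j = 0.01
counts = simulate_second_allele_counts(errors)
threshold = critical_value(counts)
print(threshold)
```

A locus whose observed second-allele count reaches this threshold would be flagged as a potential SNP under the simulated error model.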
Validation of SNPs
Multiplex assays targeting 458 SNPs in 250 F. heteroclitus individuals, 90 F. grandis individuals, 23 F. majalis individuals, and 21 F. similis individuals were attempted using the Sequenom MassARRAY technology. These consisted of 81 putative SNPs identified by the F. heteroclitus pyrosequencing, 350 putative SNPs previously identified in F. heteroclitus ESTs, and 27 putative SNPs from 22 genes, including SNPs in the aryl hydrocarbon receptor, lactate dehydrogenase B, and the proximal promoter of cytochrome P4501A (unpublished). Assays were designed using the MassARRAY Assay Design Software (Sequenom, San Diego, CA, USA) with the goal of maximizing multiplexing at 36 SNPs per well. Only SNPs for which 70 base pairs were annotated on either side of the polymorphism were included in the study. Fourteen SNPs previously identified with 454 pyrosequencing did not meet this criterion. If multiple SNPs were proximal (< 70 base pairs) to one another, one SNP was chosen and the other(s) was translated into a degenerate nucleotide (e.g., K = G or T). Reactions were performed with iPLEX chemistry as recommended by Sequenom across 13 plates at the University of Minnesota's BioMedical Genomics Center. SNP genotypes were called using the Sequenom System Typer 4.0 Analysis package. This software uses a three-parameter model to calculate the significance of each putative genotype. Based on the relative significance, a final genotype is called and assigned a particular name (e.g., conservative, moderate, aggressive, user call). Non-calls also were noted (e.g., low probability, bad spectrum).
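The two design rules above (at least 70 annotated bases on each side, and neighbouring SNPs rewritten as degenerate nucleotides) can be illustrated with a short sketch. The IUPAC two-base codes are standard; the function names, the 0-based coordinates and the bracketed `[G/A]` notation for the target SNP are assumptions chosen for illustration and do not describe the input format of the MassARRAY Assay Design Software.

```python
# Standard IUPAC codes for two-base ambiguities.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def has_flanks(sequence, pos, flank=70):
    """True if at least `flank` bases lie on each side of 0-based `pos`."""
    return pos >= flank and len(sequence) - pos - 1 >= flank


def mask_neighbor_snps(sequence, target_pos, snps):
    """Bracket the target SNP and replace other SNPs with IUPAC codes.

    `snps` maps a 0-based position to a pair of alleles, e.g. {5: ("C", "T")}.
    """
    out = list(sequence)
    for pos, alleles in snps.items():
        if pos == target_pos:
            out[pos] = "[" + "/".join(alleles) + "]"
        else:
            out[pos] = IUPAC[frozenset(alleles)]
    return "".join(out)


design = mask_neighbor_snps("ACGTACGT", 2, {2: ("G", "A"), 5: ("C", "T")})
print(design)
```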
Analysis of Genotype Data
Arlequin v.3.11 was used to calculate genetic diversity among populations (of F. heteroclitus and F. grandis) by calculating the percentage of polymorphic SNPs ($P_O$), observed ($H_O$) and expected ($H_E$) heterozygosity, and the within-population fixation index ($F$). Fixation index deviations from zero were tested by 10,000 permutations of alleles between individuals. Hardy-Weinberg equilibrium also was tested in each population. An analysis of molecular variance (AMOVA) was performed to calculate the distribution of variance within populations, between North and South regions, and between F. heteroclitus populations within North and South regions. For F. grandis, the AMOVA was performed to calculate the distribution of variance within populations as well as between populations longitudinally along the Gulf of Mexico. Since SNPs were initially identified from F. heteroclitus sequence data, a maximum of 5% missing data was used as a parameter for calculations involving F. heteroclitus and 10% for all others.
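For a single biallelic SNP, the per-locus quantities named above can be computed directly from genotype counts. The sketch below is not Arlequin's implementation; it assumes genotypes are coded as 0, 1 or 2 copies of one allele (None for a missing call), uses the simple expectation $H_E = 2pq$ without a small-sample correction, and takes $F = 1 - H_O / H_E$.

```python
def locus_diversity(genotypes):
    """Return (H_O, H_E, F) for one biallelic SNP in one population.

    `genotypes` holds 0, 1 or 2 copies of an arbitrarily chosen allele per
    individual, with None for a missing call.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return None, None, None
    h_obs = sum(1 for g in called if g == 1) / n
    p = sum(called) / (2 * n)              # allele frequency
    h_exp = 2 * p * (1 - p)                # Hardy-Weinberg expectation
    f = None if h_exp == 0 else 1 - h_obs / h_exp
    return h_obs, h_exp, f


print(locus_diversity([0, 0, 1, 1, 2, 2, 1, 1]))
print(locus_diversity([0, 0, 0, 2, None]))
```

The second call shows the pattern discussed later for loci with large positive fixation indices: both homozygotes are present but no heterozygote, so $H_O = 0$ and $F = 1$.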
A Mantel test was performed to assess the assumption of isolation by distance using XLSTAT 2009 for F. heteroclitus and F. grandis.
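A Mantel test correlates two distance matrices (here geographic and genetic) and assesses significance by permuting the sites of one matrix. The following is a generic permutation sketch, not the XLSTAT procedure; the function names, the number of permutations and the toy distances are assumptions for illustration.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(geo, gen, n_perm=999, seed=0):
    """Return (r, one-sided p-value) for two square distance matrices."""
    n = len(geo)
    pairs = list(combinations(range(n), 2))
    x = [geo[i][j] for i, j in pairs]

    def corr(order):
        y = [gen[order[i]][order[j]] for i, j in pairs]
        return pearson(x, y)

    observed = corr(list(range(n)))
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1                              # the observed arrangement counts once
    for _ in range(n_perm):
        rng.shuffle(order)                # relabel the sites of one matrix
        if corr(order) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)


# Toy example: six sites along a coast, genetic distance proportional to km.
positions = [0, 1, 3, 6, 10, 15]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.02 * d for d in row] for row in geo]
r, p_value = mantel(geo, gen)
print(round(r, 3), p_value)
```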
STRUCTURE v.2.2 [67, 68] was used to estimate the number of populations (K) in F. heteroclitus, F. grandis, F. majalis and F. similis along both the Western Atlantic and the Gulf of Mexico and to assign individuals to these populations. The Markov chain Monte Carlo was run for $10^{5}$ iterations following a burn-in period of $10^{5}$ iterations for K = 1 to 14 using the correlated allele frequencies model with admixture assumed. Distruct v. 1.1 was used to generate bar plots depicting the classifications with the highest probability under the model. Principal component analysis was conducted on all samples with JMP Genomics 3.2 for SAS 9.1.3 to establish population structure.
GS FLX Sequencing and Assembly
A total of 111,001 reads were obtained in one run of the GS FLX instrument producing 5,346,445 total bases of sequence (average read length of 218 bases) with 99.98% of bases having a quality score of 20 or greater. Across the eight regions of the plate, there were on average 1,982 reads per individual. The third barcode produced far fewer reads per region (<1,000) in all regions. All other barcodes performed very similarly with respect to the number of reads per individual across regions. Only 46% of the expected number of reads (111,001 instead of 240,000) were obtained from sequencing. Prior to sequencing, the amplification success of loci on the beads was checked for quality using a Bioanalyzer 2100, and all samples passed. However, three of the eight regions produced half the expected number of reads, and a fourth region produced only 15% of the expected number of reads. This indicated local problems in sequencing with respect to particular regions and the samples in those regions rather than the plate as a whole. All control beads passed the filter control with an average percentage of 90% across all regions, whereas the percentage of samples passing the filter control varied between regions and averaged 36%: regions with fewer than expected reads had fewer samples passing the filter control. Two regions had very high failure rates due to mixed samples, indicating more than one amplicon per bead.
Genotyping success of SNP markers using the MassARRAY multiplex assay.
Number of SNPs
Percentage of SNPs
SNPs called in >95% of F. heteroclitus individuals
SNPs called in <80% of all individuals
SNPs called in >90% but <95% of all individuals
Monomorphic SNPs called in >95% of all individuals
Polymorphic SNPs called in >95% of all individuals
SNPs called in <90% of all individuals identified in 454
SNPs called in >90% of all individuals identified in 454
SNPs which were identified by Sequenom software as low probability in greater than 50% of all individuals were removed (17 SNPs in total). An additional 20 SNPs were excluded from analyses due to their excessive heterozygosity across individuals and populations of F. heteroclitus. These SNPs may represent segmental duplication where the two duplicate regions are identical, except that a SNP has been driven to high frequency or become fixed in one of the duplicates.
Genetic parameters of sampled populations in two species of Fundulus (including the percentage of loci departing from Hardy-Weinberg equilibrium, % Departure from HWE).
SNPs identified via 454 sequencing did not have genetic parameters that differed from SNPs identified in ESTs with the exception of Hardy-Weinberg equilibrium. 454-derived SNPs had a higher percentage of SNPs not in Hardy-Weinberg equilibrium due to a lack of heterozygosity (22% versus 9%).
Many SNP loci (60%) in F. heteroclitus had a minor allele frequency greater than 0.10 and were considered common SNPs (Additional file 1). In contrast, 90% of SNPs in F. grandis had minor allele frequencies below 0.10.
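The distinction between common and rare SNPs rests on the minor allele frequency. A minimal sketch of that calculation follows, assuming genotypes coded as 0, 1 or 2 copies of one allele (None for a missing call) and the strict threshold of greater than 0.10 used above; the function names are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """Return the minor allele frequency, or None if nothing was called."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def is_common(maf, threshold=0.10):
    """A SNP is treated as common when its MAF exceeds the threshold."""
    return maf is not None and maf > threshold


rare = minor_allele_frequency([0, 0, 0, 0, 1])        # 1 of 10 allele copies
common = minor_allele_frequency([2, 2, 2, 1, None])   # 1 of 8 allele copies
print(rare, is_common(rare), common, is_common(common))
```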
In F. heteroclitus, AMOVA showed that most of the variation was distributed within populations (59.05%), but another large proportion of variation (31.1%) was distributed among northern and southern regions. The remaining 9.85% of variation was explained by differences among populations within regions. In F. grandis, most of the variation was distributed within populations (82.4%), and a smaller proportion (17.6%) of variation was distributed longitudinally between populations across the Gulf of Mexico.
A Mantel test showed significant isolation by distance among F. heteroclitus populations (p < 0.001) and F. grandis populations (p = 0.032).
We used high throughput sequencing and genotyping technology to identify and verify SNP markers in four non-model species within the Fundulus genus. Genotype data sharply differentiated northern and southern populations of F. heteroclitus as well as other species in this genus (F. grandis, F. majalis, and F. similis). Within the species where SNPs were originally annotated, most can be successfully verified and used to study population structure as well as the role and outcome of selection forces on a genome-wide scale.
Using the 454 FLX pyrosequencing system, we observed 111,001 reads yielding an average of 22× coverage across 1,464 contigs. Read lengths and quality scores were similar to many other studies using the 454 FLX system to sequence uncharacterized genomes [8, 70], but we identified fewer SNPs. Two hundred and sixty-one SNPs were identified in 96 of these contigs (81 were further verified with the Sequenom MassARRAY platform). The percentage of contigs containing SNPs differed between experiments: 96 of our 1,464 contigs (about 6.6%) contained SNPs, while pyrosequencing of Eucalyptus ESTs identified 0.05% of contigs containing SNPs and pyrosequencing of size-selected genomic DNA from swine identified SNPs in 11.4% of contigs.
Our 454 pyrosequencing of genomic DNA was originally designed to both discover and genotype SNPs within and among populations of F. heteroclitus. Thus, we attempted to perform genome reduction with selective PCR reactions to approximately 200 loci that could be sequenced in 10 populations of 8 individuals. With 30,000 reads per one-eighth of a 454 sequencing plate, each region would have 15× coverage per individual or 980× coverage across all populations, enabling accurate genotype calls for most individuals. However, preselective amplification was not perfect, and many more than 200 loci were sequenced; most amplified only a single time in a single individual (these singlets therefore were not useful for variant detection). Furthermore, we obtained only 46% of the expected number of reads. In the end, these problems made it impossible to call individual genotypes directly. We had hoped to both identify SNPs and genotype individuals in a single step, but a more successful approach (as evidenced by the swine group) is to make reduced representation libraries from many pooled individuals for SNP discovery, followed by individual genotyping. Because a pool of individuals is used, this approach identifies few singlets and thus enhances the number of reads per contig. Furthermore, improvements in both the number and length of reads using the Titanium series FLX 454 system, compared to the original FLX system we used, will increase the number of identified SNPs.
To increase our ability to measure population genetic parameters within and among populations, we verified SNPs identified through 454 sequencing and additional SNPs annotated from F. heteroclitus cDNAs using the MassARRAY system. Similar percentages of 454 pyrosequencing derived SNPs and SNPs identified from ESTs were verified (80% and 83%, respectively). Of the 458 putative SNPs, 379 (82.75%) were polymorphic, but only 264 had a greater than 90% successful call rate among all individuals. Among F. heteroclitus, most SNPs amplified (61.3% were called in >95% of individuals), indicating that differences in amplification rate between species led to the lower overall call rate. In white spruce, 91% of SNPs verified with the Illumina SNP bead array platform [71, 72] were true. Comparable to F. heteroclitus, 70% of SNPs in spruce were called in greater than 95% of individuals. Overall, verification of SNPs was powerful in providing information over many markers and individuals and provided data to determine differences within populations, between populations and between species.
Species differentiation was demonstrated using principal component analysis (PCA) as well as STRUCTURE analysis. Both analyses showed separation between F. heteroclitus, F. grandis, and F. majalis and F. similis, as well as population structure within F. heteroclitus (Figure 4). These analyses provided the most resolution (even in distinguishing populations) in F. heteroclitus because the SNPs were originally identified in this species (i.e., due to an ascertainment bias). PCA and STRUCTURE did not differentiate the sister species, F. similis and F. majalis, from each other or establish population structure within these species. Small sample sizes (1 to 10 individuals per population), high levels of monomorphism (average of 28% of all SNPs), and the fact that only 10% of SNP alleles differed between these two species decreased the power to detect such differences when analyzed in conjunction with F. heteroclitus and F. grandis. Population structure also was masked in F. grandis when data were analyzed with other species. However, when F. grandis individuals were analyzed separately, they also showed distinct population structure (data not shown). One other study has reported multiple fixed differences in mitochondrial sequences between F. heteroclitus and F. grandis, but no other study to date has evaluated differences at many loci between all four species used in this study.
Within F. heteroclitus and F. grandis, within-population fixation indices ($F_{IS}$, averaged across all loci) ranged from 0.09 to 0.32. Among F. heteroclitus, all populations had an overall significant deficiency of heterozygotes, indicated by positive $F_{IS}$ values. In these populations, approximately 10% of loci had similarly very large $F_{IS}$ values (>0.5) across populations, causing the skew in the average $F_{IS}$ value for each population. Within a population, these loci were predominantly homozygous for one allele, with a complete absence of the heterozygote and one or a few individuals homozygous for the alternative allele. The loci that presented this pattern were called conservatively at both alleles by the Sequenom software across all individuals, indicating that genotyping error was not the main reason for this pattern.
Furthermore, all northern populations were predominantly homozygous for one allele and all southern populations were predominantly homozygous for the alternative allele, indicating strong demographic patterns in the data. The same demographic pattern was not found in F. grandis. Among F. grandis populations, most (70%) SNPs with high $F_{IS}$ values were different between populations. This is in contrast to F. heteroclitus populations, where loci with high $F_{IS}$ values were shared across populations. Within any one F. grandis population, one allele was predominant as a homozygote, with one or a few individuals carrying the alternative homozygote. The most parsimonious explanation is that there is undetected substructure.
SNPs in Hardy-Weinberg equilibrium were shown to be moderately polymorphic (average of 60%) in F. heteroclitus. In F. grandis, SNPs were shown to lack polymorphism (7.18%). The higher percentage of monomorphic loci in F. grandis likely is due to ascertainment bias in SNP discovery caused by only using F. heteroclitus populations. Many of the monomorphic loci (24%) represent fixed differences between F. heteroclitus and F. grandis. Thus, while SNP markers developed in F. heteroclitus are not necessarily polymorphic in other Fundulus species, they still can be used to differentiate F. heteroclitus from other species.
Among F. heteroclitus populations, genotype data revealed strong latitudinal clines between the Northern and Southern F. heteroclitus populations. PCA, STRUCTURE, $F_{ST}$ values, and the isolation by distance test identified that individuals from Northern populations (above 40-41°N) were distinct from Southern populations. This split is centered around the southern-most extent of the Atlantic coastal advancement during the late Pleistocene. Specifically, observed heterozygosity and allelic richness across all loci are significantly lower (p = 0.043 and p = 0.042, respectively) in the north than in the south. These differences have been shown previously in morphological features, numerous allozyme loci [34–36, 75] and microsatellites. The larger historical population size of F. heteroclitus in the south would maintain greater heterozygosity and allelic richness at shared loci; in the north, where population sizes are smaller, loci have a higher probability of becoming fixed.
Four STRUCTURE clusters encompass the six northern populations while only two clusters encompass the five southern populations (Figure 5A). Separate northern clusters may be driven by smaller population sizes in which drift is greater. When genetic drift has a larger effect, it becomes easier to distinguish populations because the average difference in allele frequencies of a marker in different populations will be greater. This is illustrated by a larger average $F_{ST}$ of 0.20 among northern populations in comparison to an average $F_{ST}$ of 0.10 among southern populations. This statistic is also evident for the north and south split, where populations from the respective regions had an extremely high $F_{ST}$ value of 0.44 when compared against one another. Similar genetic divergence has been reported for F. heteroclitus using microsatellites (0.196 among northern populations, 0.117 among southern populations and 0.330 for the two most divergent populations, Nova Scotia and Georgia). Similar demographic patterns have been described in freshwater fish and in marine species such as goby and blue crab, and, as in Fundulus, these patterns are attributed to Pleistocene events.
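The link between allele frequency differences and the fixation index can be seen in a one-locus calculation. The sketch below uses Wright's definition $F_{ST} = (H_T - H_S) / H_T$ for two equally weighted populations at a biallelic locus; this is a simplification chosen for illustration and is not the AMOVA-based estimate produced by Arlequin. The function name and example frequencies are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Wright's F_ST for one biallelic locus from two allele frequencies.

    Assumes both populations are weighted equally.
    """
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-pop H_E
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                       # total H_E
    if h_t == 0:
        return None                                     # locus is monomorphic
    return (h_t - h_s) / h_t


print(fst_two_populations(0.3, 0.3))   # identical frequencies
print(fst_two_populations(0.2, 0.8))   # strongly diverged frequencies
print(fst_two_populations(1.0, 0.0))   # fixed difference
```

Identical frequencies give zero, and a fixed difference such as those seen between northern and southern populations at some loci gives the maximum value of one.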
A similar latitudinal cline occurs between populations of F. grandis, and a Mantel test shows significant isolation by distance. However, there were no significant differences in either levels of polymorphism or observed heterozygosity along latitude or longitude. Williams et al. (2008) reported significant isolation by distance as well as decreased allelic richness with increasing latitude. In that 2008 study, microsatellites were used, and two additional sites south of those used in our study were included. Since microsatellites have many more alleles than SNPs, and the two additional sites were found to have relatively higher allelic richness in comparison to all other sampling sites along the Gulf, this may account for the differences found in levels of polymorphism.
By targeting SNPs contained in both coding and non-coding areas of the genome, we are able to better understand how evolutionary forces are shaping the Fundulus genome. Similar studies using high-throughput methods to develop SNP markers have been carried out in Atlantic cod, white spruce, Eucalyptus, and swine. Like our study, these studies expanded the knowledge base for their own species with respect to potential markers for studying evolutionary adaptation (in the case of cod and spruce), for genome-wide assessment of diversity (Eucalyptus) or for use in breeding programs (swine).
The authors thank G. Bozinovic and M. Everett for assistance in the collection of samples and D. Crawford for valuable input into methodology. Part of this work was carried out by using the resources of the Computational Biology Service Unit from Cornell University, which is partially funded by Microsoft Corporation. Funding was partially provided by NIEHS Training Grant ES525163 award from the Department of Environmental and Molecular Toxicology at North Carolina State University to LMW, NIH R01 ES011588 to MFO, NSF DEB0948510 to ARB, and NIH R01 HG003229 to CDB.
- Bordoni R, Bonnal R, Rizzi E, Carrera P, Benedetti S, Cremonesi L, Stenirri S, Colombo A, Montrasio C, Bonalumi S, et al: Evaluation of human gene variant detection in amplicon pools by the GS-FLX parallel Pyrosequencer. Bmc Genomics. 2008, 9: 464-10.1186/1471-2164-9-464.PubMed CentralView ArticlePubMedGoogle Scholar
- Garber M, Zody MC, Arachchi HM, Berlin A, Gnerre S, Green LM, Lennon N, Nusbaum C: Closing gaps in the human genome using sequencing by synthesis. Genome Biology. 2009, 10: R60-10.1186/gb-2009-10-6-r60.PubMed CentralView ArticlePubMedGoogle Scholar
- Ingman M, Gyllensten U: SNP frequency estimation using massively parallel sequencing of pooled DNA. European Journal of Human Genetics. 2009, 17: 383-386. 10.1038/ejhg.2008.182.PubMed CentralView ArticlePubMedGoogle Scholar
- Turner EH, Lee CL, Ng SB, Nickerson DA, Shendure J: Massively parallel exon capture and library-free resequencing across 16 genomes. Nature Methods. 2009, 6: 315-316. 10.1038/nmeth.f.248.PubMed CentralView ArticlePubMedGoogle Scholar
- Zheng JB, Moorhead M, Weng L, Siddiqui F, Carlton VEH, Ireland JS, Lee L, Peterson J, Wilkins J, Lin S, et al: High-throughput, high-accuracy array-based resequencing. Proceedings of the National Academy of Sciences of the United States of America. 2009, 106: 6712-6717. 10.1073/pnas.0901902106.PubMed CentralView ArticlePubMedGoogle Scholar
- De Schutter K, Lin YC, Tiels P, Van Hecke A, Glinka S, Weber-Lehmann J, Rouze P, de Peer YV, Callewaert N: Genome sequence of the recombinant protein production host Pichia pastoris. Nature Biotechnology. 2009, 27: 561-U104. 10.1038/nbt.1544.View ArticlePubMedGoogle Scholar
- Iacono M, Villa L, Fortini D, Bordoni R, Imperi F, Bonnal RJP, Sicheritz-Ponten T, De Bellis G, Visca P, Cassone A, Carattoli A: Whole-genome pyrosequencing of an epidemic multidrug-resistant Acinetobacter baumannii strain belonging to the European clone II group. Antimicrobial Agents and Chemotherapy. 2008, 52: 2616-2625. 10.1128/AAC.01643-07.PubMed CentralView ArticlePubMedGoogle Scholar
- Novaes E, Drost DR, Farmerie WG, Pappas GJ, Grattapaglia D, Sederoff RR, Kirst M: High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome. Bmc Genomics. 2008, 9: 10.1186/1471-2164-9-312.Google Scholar
- Vera JC, Wheat CW, Fescemyer HW, Frilander MJ, Crawford DL, Hanski I, Marden JH: Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Molecular Ecology. 2008, 17: 1636-1647. 10.1111/j.1365-294X.2008.03666.x.View ArticlePubMedGoogle Scholar
- Bontell IL, Hall N, Ashelford KE, Dubey JP, Boyle JP, Lindh J, Smith JE: Whole genome sequencing of a natural recombinant Toxoplasma gondii strain reveals chromosome sorting and local allelic variants. Genome Biology. 2009, 10: R53-10.1186/gb-2009-10-5-r53.PubMed CentralView ArticlePubMedGoogle Scholar
- Worden AZ, Panaud, Piegu: Green evolution and dynamic adaptations revealed by genomes of the marine picoeukaryotes Micromonas (vol 324, pg 268, 2009). Science. 2009, 325: 147-147.
- Luikart G, England PR, Tallmon D, Jordan S, Taberlet P: The power and promise of population genomics: From genotyping to genome typing. Nature Reviews Genetics. 2003, 4: 981-994. 10.1038/nrg1226.
- Diaz RJ: Overview of hypoxia around the world. Journal of Environmental Quality. 2001, 30: 275-281.
- Diaz RJ, Rosenberg R: Marine benthic hypoxia: A review of its ecological effects and the behavioural responses of benthic macrofauna. Oceanography and Marine Biology - an Annual Review. 1995, 33: 245-303.
- Smith KJ, Able KW: Dissolved oxygen dynamics in salt marsh pools and its potential impacts on fish assemblages. Marine Ecology-Progress Series. 2003, 258: 223-232. 10.3354/meps258223.
- Wirgin I, Waldman JR: Resistance to contaminants in North American fish populations. Mutation Research-Fundamental and Molecular Mechanisms of Mutagenesis. 2004, 552: 73-100. 10.1016/j.mrfmmm.2004.06.005.
- Oleksiak MF, Churchill GA, Crawford DL: Variation in gene expression within and among natural populations. Nature Genetics. 2002, 32: 261-266. 10.1038/ng983.
- Oleksiak MF, Roach JL, Crawford DL: Natural variation in cardiac metabolism and gene expression in Fundulus heteroclitus. Nature Genetics. 2005, 37: 67-72.
- Peterson JSK, Bain LJ: Differential gene expression in anthracene-exposed mummichogs (Fundulus heteroclitus). Aquatic Toxicology. 2004, 66: 345-355. 10.1016/j.aquatox.2003.10.005.
- Meyer JN, Volz DC, Freedman JH, Di Giulio RT: Differential display of hepatic mRNA from killifish (Fundulus heteroclitus) inhabiting a Superfund estuary. Aquatic Toxicology. 2005, 73: 327-341. 10.1016/j.aquatox.2005.03.022.
- Whitehead A, Crawford DL: Variation in tissue-specific gene expression among natural populations. Genome Biology. 2005, 6: R13. 10.1186/gb-2005-6-2-r13.
- Gonzalez HO, Roling JA, Baldwin WS, Bain LJ: Physiological changes and differential gene expression in mummichogs (Fundulus heteroclitus) exposed to arsenic. Aquatic Toxicology. 2006, 77: 43-52. 10.1016/j.aquatox.2005.10.014.
- Roling JA, Bain LJ, Gardea-Torresdey J, Bader J, Baldwin WS: Hexavalent chromium reduces larval growth and alters gene expression in mummichog (Fundulus heteroclitus). Environmental Toxicology and Chemistry. 2006, 25: 2725-2733. 10.1897/05-659R.1.
- Whitehead A, Crawford DL: Neutral and adaptive variation in gene expression. Proceedings of the National Academy of Sciences of the United States of America. 2006, 103: 5425-5430. 10.1073/pnas.0507648103.
- Fisher MA, Oleksiak MF: Convergence and divergence in gene expression among natural populations exposed to pollution. BMC Genomics. 2007, 8: 108. 10.1186/1471-2164-8-108.
- Oleksiak MF: Changes in gene expression due to chronic exposure to environmental pollutants. Aquatic Toxicology. 2008, 90: 161-171. 10.1016/j.aquatox.2008.08.010.
- Paschall JE, Oleksiak MF, VanWye JD, Roach JL, Whitehead JA, Wyckoff GJ, Kolell KJ, Crawford DL: FunnyBase: a systems level functional annotation of Fundulus ESTs for the analysis of gene expression. BMC Genomics. 2004, 5: 96. 10.1186/1471-2164-5-96.
- Adams SM, Lindmeier JB, Duvernell DD: Microsatellite analysis of the phylogeography, Pleistocene history and secondary contact hypotheses for the killifish, Fundulus heteroclitus. Molecular Ecology. 2006, 15: 1109-1123. 10.1111/j.1365-294X.2006.02859.x.
- Bernardi G, Sordino P, Powers DA: Concordant mitochondrial and nuclear-DNA phylogenies for populations of the teleost fish Fundulus heteroclitus. Proceedings of the National Academy of Sciences of the United States of America. 1993, 90: 9271-9274. 10.1073/pnas.90.20.9271.
- Crawford DL, Powers DA: Evolutionary Adaptation to Different Thermal Environments Via Transcriptional Regulation. Molecular Biology and Evolution. 1992, 9: 806-813.
- McMillan AM, Bagley MJ, Jackson SA, Nacci DE: Genetic diversity and structure of an estuarine fish (Fundulus heteroclitus) indigenous to sites associated with a highly contaminated urban harbor. Ecotoxicology. 2006, 15: 539-548. 10.1007/s10646-006-0090-4.
- Williams LM, Oleksiak MF: Signatures of selection in natural populations adapted to chronic pollution. BMC Evolutionary Biology. 2008, 8: 282. 10.1186/1471-2148-8-282.
- Whitehead A: Comparative mitochondrial genomics within and among species of killifish. BMC Evolutionary Biology. 2009, 9: 11. 10.1186/1471-2148-9-11.
- Powers DA, Place AR: Biochemical genetics of Fundulus heteroclitus (L). Temporal and spatial variation in gene-frequencies of LDH-B, MDH-A, GPI-B, and PGM-A. Biochemical Genetics. 1978, 16: 593-607. 10.1007/BF00484222.
- Powers DA, Ropson I, Brown DC, Vanbeneden R, Cashon R, Gonzalezvillasenor LI, Dimichele JA: Genetic variation in Fundulus heteroclitus - geographic distribution. American Zoologist. 1986, 26: 131-144.
- Cashon RE, Vanbeneden RJ, Powers DA: Biochemical genetics of Fundulus heteroclitus (L). Spatial variation in gene-frequencies of IDH-A, IDH-B, 6-PGDH-A, and EST-S. Biochemical Genetics. 1981, 19: 715-728. 10.1007/BF00484004.
- Crawford DL, Powers DA: Molecular Basis of Evolutionary Adaptation at the Lactate Dehydrogenase-B Locus in the Fish Fundulus heteroclitus. Proceedings of the National Academy of Sciences of the United States of America. 1989, 86: 9365-9369. 10.1073/pnas.86.23.9365.
- Crawford DL, Constantino HR, Powers DA: Lactate Dehydrogenase-B cDNA from the Teleost Fundulus heteroclitus - Evolutionary Implications. Molecular Biology and Evolution. 1989, 6: 369-383.
- Gill P: An assessment of the utility of single nucleotide polymorphisms (SNPs) for forensic purposes. International Journal of Legal Medicine. 2001, 114: 204-210. 10.1007/s004149900117.
- Weir BS, Cardon LR, Anderson AD, Nielsen DM, Hill WG: Measures of human population structure show heterogeneity among genomic regions. Genome Research. 2005, 15: 1468-1476. 10.1101/gr.4398405.
- Paschou P, Ziv E, Burchard EG, Choudhry S, Rodriguez-Cintron W, Mahoney MW, Drineas P: PCA-correlated SNPs for structure identification in worldwide human populations. PLoS Genetics. 2007, 3: 1672-1686. 10.1371/journal.pgen.0030160.
- Yamaguchi-Kabata Y, Nakazono K, Takahashi A, Saito S, Hosono N, Kubo M, Nakamura Y, Kamatani N: Japanese Population Structure, Based on SNP Genotypes from 7003 Individuals Compared to Other Ethnic Groups: Effects on Population-Based Association Studies. American Journal of Human Genetics. 2008, 83: 445-456. 10.1016/j.ajhg.2008.08.019.
- Primmer CR, Borge T, Lindell J, Saetre GP: Single-nucleotide polymorphism characterization in species with limited available sequence information: high nucleotide diversity revealed in the avian genome. Molecular Ecology. 2002, 11: 603-612. 10.1046/j.0962-1083.2001.01452.x.
- Kong FR, Tong ZS, Chen XY, Sorrell T, Wang B, Wu QX, Ellis D, Chen S: Rapid identification and differentiation of Trichophyton species, based on sequence polymorphisms of the ribosomal internal transcribed spacer regions, by rolling-circle amplification. Journal of Clinical Microbiology. 2008, 46: 1192-1199. 10.1128/JCM.02235-07.
- Johnson N, Fletcher O, Palles C, Rudd M, Webb E, Sellick G, Silva IDS, McCormack V, Gibson L, Fraser A, et al: Counting potentially functional variants in BRCA1, BRCA2 and ATM predicts breast cancer susceptibility. Human Molecular Genetics. 2007, 16: 1051-1057. 10.1093/hmg/ddm050.
- Tomlinson I, Webb E, Carvajal-Carmona L, Broderick P, Kemp Z, Spain S, Penegar S, Chandler I, Gorman M, Wood W, et al: A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nature Genetics. 2007, 39: 984-988. 10.1038/ng2085.
- Amos CI, Wu XF, Broderick P, Gorlov IP, Gu J, Eisen T, Dong Q, Zhang Q, Gu XJ, Vijayakrishnan J, et al: Genome-wide association scan of tag SNPs identifies a susceptibility locus for lung cancer at 15q25.1. Nature Genetics. 2008, 40: 616-622. 10.1038/ng.109.
- Poehlmann A, Kuester D, Meyer F, Lippert H, Roessner A, Schneider-Stock R: K-ras mutation detection in colorectal cancer using the Pyrosequencing technique. Pathology Research and Practice. 2007, 203: 489-497. 10.1016/j.prp.2007.06.001.
- Mauricio R, Stahl EA, Korves T, Tian DC, Kreitman M, Bergelson J: Natural selection for polymorphism in the disease resistance gene Rps2 of Arabidopsis thaliana. Genetics. 2003, 163: 735-746.
- Hoekstra HE, Hirschmann RJ, Bundey RA, Insel PA, Crossland JP: A single amino acid mutation contributes to adaptive beach mouse color pattern. Science. 2006, 313: 101-104. 10.1126/science.1126121.
- Moen T, Hayes B, Nilsen F, Delghandi M, Fjalestad KT, Fevolden SE, Berg PR, Lien S: Identification and characterisation of novel SNP markers in Atlantic cod: Evidence for directional selection. BMC Genetics. 2008, 9: 18. 10.1186/1471-2156-9-18.
- Namroud MC, Beaulieu J, Juge N, Laroche J, Bousquet J: Scanning the genome for gene single nucleotide polymorphisms involved in adaptive population differentiation in white spruce. Molecular Ecology. 2008, 17: 3599-3613. 10.1111/j.1365-294X.2008.03840.x.
- Vignal A, Milan D, SanCristobal M, Eggen A: A review on SNP and other types of molecular markers and their use in animal genetics. Genetics Selection Evolution. 2002, 34: 275-305. 10.1186/1297-9686-34-3-275.
- Wirgin II, Damore M, Grunwald C, Goldman A, Garte SJ: Genetic Diversity at an Oncogene Locus and in Mitochondrial-DNA between Populations of Cancer-Prone Atlantic Tomcod. Biochemical Genetics. 1990, 28: 459-475. 10.1007/BF00554374.
- Aljanabi SM, Martinez I: Universal and rapid salt-extraction of high quality genomic DNA for PCR-based techniques. Nucleic Acids Research. 1997, 25: 4692-4693. 10.1093/nar/25.22.4692.
- Parameswaran P, Jalili R, Tao L, Shokralla S, Gharizadeh B, Ronaghi M, Fire AZ: A pyrosequencing-tailored nucleotide barcode design unveils opportunities for large-scale sample multiplexing. Nucleic Acids Research. 2007, 35: e130. 10.1093/nar/gkm760.
- Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen ZT, et al: Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005, 437: 376-380.
- Meyer M, Stenzel U, Hofreiter M: Parallel tagged sequencing on the 454 platform. Nature Protocols. 2008, 3: 267-278. 10.1038/nprot.2007.520.
- Huse SM, Huber JA, Morrison HG, Sogin ML, Welch DM: Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biology. 2007, 8: R143. 10.1186/gb-2007-8-7-r143.
- Brockman W, Alvarez P, Young S, Garber M, Giannoukos G, Lee WL, Russ C, Lander ES, Nusbaum C, Jaffe DB: Quality scores and SNP detection in sequencing-by-synthesis systems. Genome Research. 2008, 18: 763-770. 10.1101/gr.070227.107.
- Huang XQ, Madan A: CAP3: A DNA sequence assembly program. Genome Research. 1999, 9: 868-877. 10.1101/gr.9.9.868.
- Batzoglou S, Jaffe DB, Stanley K, Butler J, Gnerre S, Mauceli E, Berger B, Mesirov JP, Lander ES: ARACHNE: A whole-genome shotgun assembler. Genome Research. 2002, 12: 177-189. 10.1101/gr.208902.
- Duvernell DD, Lindmeier JB, Faust KE, Whitehead A: Relative influences of historical and contemporary forces shaping the distribution of genetic variation in the Atlantic killifish, Fundulus heteroclitus. Molecular Ecology. 2008, 17: 1344-1360. 10.1111/j.1365-294X.2007.03648.x.
- Quackenbush J, Liang F, Holt I, Pertea G, Upton J: The TIGR Gene Indices: reconstruction and representation of expressed gene sequences. Nucleic Acids Research. 2000, 28: 141-145. 10.1093/nar/28.1.141.
- Hahn ME, Karchner SI, Franks DG, Merson RR: Aryl hydrocarbon receptor polymorphisms and dioxin resistance in Atlantic killifish (Fundulus heteroclitus). Pharmacogenetics. 2004, 14: 131-143. 10.1097/00008571-200402000-00007.
- Excoffier L, Laval G, Schneider S: Arlequin (version 3.0): An integrated software package for population genetics data analysis. Evolutionary Bioinformatics. 2005, 47-50.
- Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data. Genetics. 2000, 155: 945-959.
- Falush D, Stephens M, Pritchard JK: Inference of population structure using multilocus genotype data: Linked loci and correlated allele frequencies. Genetics. 2003, 164: 1567-1587.
- Rosenberg NA: DISTRUCT: a program for the graphical display of population structure. Molecular Ecology Notes. 2004, 4: 137-138. 10.1046/j.1471-8286.2003.00566.x.
- Wiedmann RT, Smith TPL, Nonneman DJ: SNP discovery in swine by reduced representation and high throughput pyrosequencing. BMC Genetics. 2008, 9: 81. 10.1186/1471-2156-9-81.
- Shen R, Fan JB, Campbell D, Chang WH, Chen J, Doucet D, Yeakley J, Bibikova M, Garcia EW, McBride C, et al: High-throughput SNP genotyping on universal bead arrays. Mutation Research-Fundamental and Molecular Mechanisms of Mutagenesis. 2005, 573: 70-82. 10.1016/j.mrfmmm.2004.07.022.
- Fan J-B, Oliphant A, Shen R, Kermani BG, Garcia F, Gunderson KL, Hansen M, Steemers F, Butler SL, Deloukas P, et al: Highly Parallel SNP Genotyping. Cold Spring Harbor Symposia on Quantitative Biology. 2003, 68: 69-78. 10.1101/sqb.2003.68.69.
- Mickelson DMCL, Fullerton DS, Borns HW, (Ed): Late-Quaternary Environments of the United States. 1983, Minneapolis: University of Minnesota Press.
- Able KW, Felley JD: Geographical variation in Fundulus heteroclitus - Tests for concordance between egg and adult morphologies. American Zoologist. 1986, 26: 145-157.
- Ropson IJ, Brown DC, Powers DA: Biochemical genetics of Fundulus heteroclitus (L.) 6. Geographical variation in the gene frequencies of 15 loci. National Geographic Research. 1990, 44: 16-26.
- Bernatchez L, Wilson CC: Comparative phylogeography of nearctic and palearctic fishes. Molecular Ecology. 1998, 7: 431-452. 10.1046/j.1365-294x.1998.00319.x.
- Gysels ES, Hellemans B, Pampoulie C, Volckaert FAM: Phylogeography of the common goby, Pomatoschistus microps, with particular emphasis on the colonization of the Mediterranean and the North Sea. Molecular Ecology. 2004, 13: 403-417. 10.1046/j.1365-294X.2003.02087.x.
- McMillen-Jackson AL, Bert TM: Mitochondrial DNA variation and population genetic structure of the blue crab Callinectes sapidus in the eastern United States. Marine Biology. 2004, 145: 769-777.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.