Quantitative high resolution melting: two methods to determine SNP allele frequencies from pooled samples

Capper, Roxana L.; Jin, Young K.; Lundgren, Petra B.; Peplow, Lesa M.; Matz, Mikhail V.; van Oppen, Madeleine J. H.

doi:10.1186/s12863-015-0222-z

Methodology article
Open access
Published: 13 June 2015

Quantitative high resolution melting: two methods to determine SNP allele frequencies from pooled samples

Roxana L. Capper¹,
Young K. Jin^2,3,
Petra B. Lundgren⁴,
Lesa M. Peplow³,
Mikhail V. Matz¹ &
…
Madeleine J. H. van Oppen³

BMC Genetics volume 16, Article number: 62 (2015) Cite this article

5609 Accesses
11 Citations
11 Altmetric
Metrics details

Abstract

Background

The advent of next-generation sequencing has brought about an explosion of single nucleotide polymorphism (SNP) data in non-model organisms; however, profiling these SNPs across multiple natural populations still requires substantial time and resources.

Results

Here, we introduce two cost-efficient quantitative High Resolution Melting (qHRM) methods for measuring allele frequencies at known SNP loci in pooled DNA samples: the “peaks” method, which can be applied to large numbers of SNPs, and the “curves” method, which is more labor intensive but also slightly more accurate. Using the reef-building coral Acropora millepora, we show that both qHRM methods can recover the allele proportions from mixtures prepared using two or more individuals of known genotype. We further demonstrate advantages of each method over previously published methods; specifically, the “peaks” method can be rapidly scaled to screen several hundred SNPs at once, whereas the “curves” method is better suited for smaller numbers of SNPs.

Conclusions

Compared to genotyping individual samples, these methods can save considerable effort and genotyping costs when relatively few candidate SNPs must be profiled across a large number of populations. One of the main applications of this method could be validation of SNPs of interest identified in population genomic studies.

Background

Population genomics seeks to link genome-wide genetic variation to evolutionary processes. By querying a large number of individuals for multiple single nucleotide polymorphisms (SNPs) distributed across the genome, polymorphisms with unexpected patterns of genetic differentiation can be identified. These SNPs may be linked to loci under divergent selection between disparate environments [1]. Recent advances in high-throughput sequencing have made it easy to identify large numbers of SNPs in a limited number of individuals [2]. However, profiling a smaller number of SNPs across many populations, for example to study allele frequencies at specific candidate SNP loci, remains time consuming and expensive if based on individual genotyping.

There are many conceptual approaches to determine allele frequencies from pooled samples, including (i) direct sequencing [3, 4], (ii) primer extension using pyrosequencing [5, 6], SNaPShot [7] or MALDI-TOF mass spectrometry [8], and (iii) preferential hybridization or amplification using allele-specific primers, such as TaqMan [9], qPCR [10], the Illumina GoldenGate platform [11], or hybridizing microarrays [12]. For a more extensive review of quantitative genotyping technologies, see Sham et al. and Garvin et al. [13, 14]. Many of these technologies show high correlations between the estimated genotype frequency from the pooled sample and the true allele frequency assessed from genotyping of individual samples, but most require considerable investment of funds and effort for each new SNP assay. While direct sequencing with next-generation sequencing methods is an increasingly popular option, the sequencing of whole genomes, reduced representation libraries (RRL), or even specific amplicons incurs unnecessary costs for research that seeks to investigate the frequencies of a subset of preselected candidate or otherwise potentially informative (“tag” or “proxy”) SNPs across populations.

Conventional HRM has been established as a very sensitive tool for detecting even low levels of a target allele [15]. Quantitative HRM methods have been developed in a range of applications [16], including detection of the adulteration of food or drug products [17–20], methylation status [21–23] and species composition in samples over time [24]. One commonality of these approaches is that they make use of relatively large differences among variants, such as multiple methylated sites within an amplicon, multiple SNP differences among species in the locus of interest or even entirely different genes representing different species. These methods are not sufficiently sensitive to detect frequencies of single SNPs among populations of the same species.

The conventional, unlabeled probe version of high resolution melting (HRM) method of SNP genotyping uses a small oligonucleotide probe to increase the differences in melting temperature between single SNP alleles. It makes use of a dye that fluoresces only when intercalated into duplex DNA structures. First, the SNP-bearing locus of interest is amplified with asymmetric PCR such that an excess of one strand is produced. Next, the short probe that overlays the SNP site itself is added to the reaction. The reaction is subsequently heated, cooled and heated again while the loss of fluorescent signal (i.e. duplex DNA disassociation) is monitored. The melting temperature of the probe-SNP duplex region is dependent on the SNP allele state; a perfect match will have a higher melting temperature than will a mismatch [25]. A conventional HRM assay typically seeks to determine whether the probe melting profile contains only the higher-temperature component, the lower-temperature component, or both.

In this study, we explore whether the unlabeled probe HRM melting profile can provide quantitative information about frequencies of alternative alleles in a pooled DNA sample. We developed two versions of qHRM that use different dye chemistries, brands of HRM machines and analytical techniques to quantify the relative proportions of each allele. We performed a total of six experiments involving adult Acropora millepora corals, an emerging ecological genomics model, to demonstrate the new method’s accuracy.

Methods

Coral collection

Adult corals were used in all experiments. For the allele titration and SNP panel validation experiments (peaks method), corals were collected from Trunk Reef (−18.368 S, 146.818 E) and Little Pioneer Bay, Orpheus Island (−18.604 S, 146.486 E), Great Barrier Reef, Queensland, Australia in 2009. Samples used in the Orpheus/Magnetic population comparison experiment (peaks method) were collected from Orpheus Island and Nelly Bay, Magnetic Island (−19.165 S, 146.851 E). Coral samples used to validate the curves method experiments were collected from 28 coral populations, summarized in Additional file 1. All corals were collected under the appropriate Great Barrier Reef Marine Park Authority permits.

DNA extraction

For the coral individuals used in the allele titration and SNP panel validation experiments (peaks method), genomic DNA was extracted using the Qiagen DNeasy Blood and Tissue kit (Qiagen) and was eluted into Buffer EB. Individuals sampled from Orpheus and Magnetic Islands (peaks method) and from the 28 populations across the Great Barrier Reef (curves method) were extracted according to the method described in Wilson et al. [26]. DNA concentrations were initially determined using the Nanodrop ND-1000 Spectrophotometer (Thermo Scientific).

Peaks method

The peaks method was developed as a rapid means to quantify SNP allele frequencies across several hundred SNP loci. Briefly, we first established proof of concept using titrated allele frequencies and two SNP markers, then validated a panel of 384 SNPs using six corals, genotyped individually and as a pool. This panel of SNPs was finally applied to samples of two natural coral populations to detect loci with different frequencies between the reefs. The workflow of the peaks method is illustrated in Fig. 1.

Pooling

In order to convert conventional HRM into a quantitative assay for pooled samples, equal amounts of each individual’s DNA must be added to the initial asymmetric PCR reaction. This DNA normalization step is particularly critical for pooling organisms that may have known or unknown assemblages, infections or symbioses as the relative amounts of target and contaminating DNA can vary among samples. To circumvent these problems, we used quantitative PCR (qPCR) to accurately measure the quantity of coral DNA in individual samples. Primers to amplify the SNP locus C23209S177 [27] were used in a conventional qPCR reaction performed under the following conditions using a LightCycler 480 machine (Roche): 5 ng holobiont DNA, 0.1 μM each forward and reverse primers, 2 mM MgCl₂ and 1x High Resolution Master Mix (Roche) in a 15 μl volume. Reactions were heated to 95 °C for 10 minutes, then cycled 55 times as follows: 40 s at 95 °C, 40 s at 60 °C, 40 s at 72 °C, then cooled to 40 °C and held for 20s. Concentrations of target DNA were normalized based on differences in quantification cycle (Cq) by assuming that the fold-difference per each Cq unit difference was equal to 1/E, where E is the amplification factor per PCR cycle for the particular primer set [28]. The accuracy of this approach was verified by re-amplifying the adjusted DNA concentrations again with the same primer pair.

Internal melting calibrators

Though within-plate variation among replicates was low for the peaks method, we noticed that between-plate variation could shift the melting profile of all duplex DNA species (i.e. for an analyzed heterozygote, such species include two perfect-match amplicons, two mismatched amplicons, one perfect-match probe-SNP duplex, and one mismatch probe-SNP duplex; Fig. 2a,b) by as much as 2 °C. To accommodate this phenomenon, which occurred only in the peaks method, we included standard internal melting calibrator primers with each reaction in order to provide landmarks in the melting profile. Four oligonucleotide calibrators were designed to form two complementary DNA duplexes with known melting temperatures. They were based on sequences from Gundry et al. [29] but were adjusted to melt at the slightly lower and higher temperatures of 47.5 °C and 90 °C than the published sequences in order to melt well outside of the range of the probe and amplicon duplexes. The 3′ terminal ends of each of the four oligos were modified with an inverted T to block any potential primer extension from occurring. The forward sequence of the low-melting calibrator used is 5′- ATT TTA TAT TTA TAT ATT TAT ATA TTT TT/3InvdT/ -3′, while the forward sequence of the high-melting calibrator is 5′- GCG CGG CCG GCA CTG ACC CGA GAC TCT GAG CGG CTG CTG GAG GTG CGG AAG CGG AGG GGC GGG/3InvdT/. The calibrator oligos were included in the initial asymmetric PCR reaction at a concentration of 0.05 μM for each of the high-melting oligos and 0.5 μM for the low-melting oligos; addition of the calibrators did not interfere with amplification of the target products.

Asymmetric template amplification and melt stage

In HRM, the asymmetric PCR stage serves to generate an excess of the amplicon strand that is complementary to the probe. This excess strand can then form a duplex with the unlabeled probe, whereas if it was not in excess it would preferentially bind to the longer, complementary amplicon strand instead. Though some versions of HRM use symmetric PCR, their applications typically involve whole-amplicon melting analysis. In contrast, unlabeled probe HRM, which yields the increased sensitivity necessary for SNP discrimination, requires asymmetric PCR in order to outcompete the amplicon. The asymmetric PCR reaction prior to each assay’s melt stage was performed as follows on the LightCycler 480 (Roche): 5 ng total pooled DNA, 0.3 μM reverse primer, 0.067 μM forward primer, 0.05 μM each high-melting calibrator forward and reverse oligos, 0.5 μM each low-melting calibrator forward and reverse oligos, 2 mM MgCl₂, and 1x High Resolution Master Mix (Roche). Reactions were heated to 95 °C and held for 10 m, then cycled as follows: 40 s at 95 °C, 40 s at 60 °C, 40 s at 72 °C for 55 cycles, which, for most SNPs, was at least 10 cycles past the beginning of the reaction plateau. The reaction was then finally cooled to 40 °C and held for 20 s.

Following the asymmetric target amplification, the reaction plates were unsealed and 0.5 μM of the oligo probe complementary to the excess reverse strand of the amplicon was added. The plates were resealed and then heated to 95 °C for one minute to melt all double-stranded DNA species apart. Subsequently, the plates were rapidly cooled to 45 °C and held for one minute to allow reannealing of all possible probe and amplicon duplexes. Next, the reactions were heated to 95 °C at the maximum rate allowed by the Roche LightCycler 480 (>0.02 ° C/s). During this heating period, the LightCycler 480 monitored the amount of fluorescence over time and collected over 1300 data points over a 50 °C range, producing a high-resolution graph of the melting profile of each species of duplex DNA in the reaction.

Analysis of denaturation profiles

We calculated the negative first derivative of the melt stage’s decreasing fluorescence curve to produce a profile with discrete peaks centered at the melting temperature of each dissociating duplex (Fig. 2c,d). Next, we drew a baseline connecting the linear regions bounding the melting peaks to determine the background rate of dye disassociation. Then, the height of the probe melting peaks above the baseline was measured using a custom [R] script (see Additional file 2). For pooled reactions, three technical replicates were performed for each locus and sample pair and their relative peak heights averaged. The genotype of a sample was determined by the presence and position of the probe peaks, relative to other samples for the same locus and to the calibrator melting peaks. The frequency of the low-melting allele in a pooled sample was estimated by the contribution of its corresponding melting peak to the total summed height of probe-related melting peaks. Across all SNP assays, a minor allele was called only if it exceeded a frequency of 2 %.

DNA titrations

To demonstrate a linear relationship between known frequencies of a given allele and the qHRM-estimated frequency estimated from a pooled sample, we first identified two individuals homozygous for different alleles of the same SNP. We performed this experiment for two SNP loci mined from the A. millepora transcriptome, C22162S248 and C45133S676 [27]. We normalized the concentrations of DNA with qPCR as described above, then mixed the two homozygotes in different proportions to obtain mixtures that incremented the frequency of the low-melting allele from 0–100 % in 5 % or 10 % steps (Additional file 3). qHRM was then performed on each mixture with five replicates for each mixture for two independent normalization and mixing trials, for both SNPs assayed. The estimated allele frequencies for all replicates of each mixture were then averaged and compared to the known allele frequencies.

Analysis of 384 loci in six individuals

DNA samples from three adult coral individuals sampled from Trunk Reef and three from Orpheus Island were genotyped individually by conventional HRM for 384 loci selected from SNPs previously developed for a previously published coral linkage mapping project [27] (Additional file 4). Following this, the samples were normalized for coral DNA concentration as explained above and pooled to generate a DNA sample with known allele frequencies for each SNP locus. qHRM was performed in triplicate on the pooled sample for the same 384 SNPs. To fully validate qHRM of each SNP in this panel, we only analyzed data from SNPs for which all six individuals were both successfully amplified and had easily distinguishable peaks when analyzed using conventional HRM, and which had clear qHRM peaks greater than 2 % of the total fluorescence signal in the reaction (usually relative to the amplicon peak’s height). At least two successful replicates were required for each SNP in the qHRM reactions for analysis to proceed.

Analysis of 384 loci in 98 individuals collected from two locations

We used the peaks method to compare allele frequencies in two populations of corals collected from Magnetic Island (n = 48) and Orpheus Island (n = 50). DNA was normalized, pooled according to reef of origin, and qHRM-genotyped in triplicate using a panel of 384 SNPs. Nineteen of these SNPs were further validated by individual HRM genotyping of each individual contributing to the pools. The set of SNPs to validate was selected to represent loci where the low-melting allele was major (8 SNPs) or minor (11 SNPs), loci with small or large minor allele frequencies (11 SNPs with MAF < 0.25; 8 SNPs with MAF > 0.25), SNPs with more than two alleles (7 SNPs with third alleles in one or both populations), and loci selected to span the range of estimated F_ST values (7 SNPs with F_ST less than 0.01; 4 SNPs with F_ST between 0.01 and 0.02; 8 SNPs with F_ST greater than 0.02, including four of the top seven SNPs with the greatest F_ST).

Analysis of genetic subdivision between coral populations

SNP allele frequencies were compared between Orpheus and Magnetic populations by plotting the averaged qHRM-estimated frequencies against each reef.

F_ST for each SNP was calculated according to Equ. 1, where p is the frequency of the high-melting allele, q is equal to (1-p), $ \overline{p} $ is the average frequency of p allele between the two populations, and $ \overline{q} $ is equal to $ \left(1-\overline{p}\right) $.

$$ {\mathrm{F}}_{\mathrm{ST}}={\frac{\left({p}_1-{p}_2\right)}{4\overline{p}\overline{q}}}^2 $$

(1)

Global F_ST was calculated by averaging all SNP F_ST values.

Curves method

The curves method was developed to extend qHRM and demonstrate functionality across another dye chemistry and analytical approach. This method is best suited for highly accurate determination of allele proportions across small numbers of SNPs. It requires the usage of three reference sequences (two homozygous samples and a heterozygous sample) for comparison with an unknown, pooled sample. We demonstrate that reactions are extremely reproducible and robust among different markers and pooled samples. The workflow of the curves method is illustrated in Fig. 2. Briefly, the curves method was validated through an initial proof of concept experiment using different known proportions of alleles for a single SNP marker. This marker was additionally validated on three DNA pools made from 43–56 individuals representing distinct coral populations by comparing the true allelic frequency of the pool (revealed by summing the genotypes of each individual) to the qHRM-estimated frequency. Thirdly, reproducibility of this method was investigated by using five SNP markers for pooled samples representing 25 coral populations.

Pooling

Each individual sample was amplified individually in an asymmetric qPCR reaction using primers for the locus C70S236 [27] under the following conditions using a Rotor-Gene Q machine (Qiagen): 0.1 μM forward primer, 1 μM reverse primer, 10 ng holobiont DNA, 1× 7.5 μl of the Accumelt HRM supermix (Quanta Biosciences) in a 15 μl volume. The reaction mixture was heated at 95 °C for 10 min, then cycled 40 times with the following thermoprofile: 95 °C for 40 s, 58 °C for 40 s, and 72 °C for 40 s. DNA concentrations were normalized according to Pfaffl [28] as summarized in the Peaks - pooling section.

Asymmetric template amplification and melt stage

After normalization of individual samples, each individual was combined into a single pooled sample. The pooled sample was then was amplified asymmetrically with the Rotor-Gene Q machine (Qiagen) with the locus C70S236. Next, 1 μM of probe complementary to the excess strand was added to each reaction. Then, the reactions were heated to 95 °C and held for 60 s, rapidly cooled to 45 °C and held for 150 s, heated again to 95 °C at a rate of 0.1 °C/s with a 2 s hold each step, collecting 500 data points in total.

Analysis of denaturation profiles

Because many factors such as pipetting errors causing variations in quantity of fluorescent dye and DNA can affect the relative fluorescence levels among normalized DNA samples, it is important to standardize the fluorescence levels to exclude the noise for accurate allele frequency measurements. Fluorescence values of the SNP-specific melt curves were standardized by selecting regions before and after the probe melting phase where nucleotide differences do not lead to variations in fluorescence level, then averaging fluorescence levels of the selected regions among all samples using the Rotor-Gene Q Series Software 2.0.2.4 (Corbett) in order to mitigate the effects of sample- and SNP-specific variation (Fig. 2e). The exclusion of noise through this normalization process allows the direct comparison of melting curves from different reactions.

The inclusion of three standard reference samples of known genotype (two different homozygotes and a heterozygote) allows for even small differences in allele frequency to be resolved. These standards set a reference fluorescence level to which unknown samples can be compared for absolute allele frequency quantification (Fig. 2f) and also serve to reduce variations between runs.

The y-axis point at which the greatest difference in fluorescence between the two homozygotes occurs is used to determine the relative proportion of alleles in an unknown sample as it gives the best highest accuracy in estimating allele frequencies (Additional file 5). This point should also be near where the inflection point of the heterozygote’s curve falls. But, because the heterozygote can suffer from amplification biases, the position of the inflection point can vary. We see increased gains in accuracy by calculating the allele frequency of the unknown sample with respect to the heterozygote rather than simply using the midpoint of the two homozygotes. For example, if an unknown sample has a higher frequency of the high-melting allele (i.e. the unknown’s curve falls above of the heterozygote’s curve and/or the point of the greatest distance between homozygotes), then Equ. 2 can be used to estimate of the proportion of the high-melting allele. If the unknown has a higher frequency of the low-melting allele (i.e. the unknown’s curve falls below the heterozygote’s curve and/or the point of the greatest distance between homozygotes), then Equ. 3 should be used to calculate the frequency of the high-melting allele. In both equations, we calculate the proportion of fluorescence at the inflection point of the unknown sample (x) relative to the heterozygote (f _het ). After adjusting for empirical differences between the known heterozygote and one of the known homozygotes (f _high or f _low), we add or subtract the value from 0.5 to determine the allele frequency of the unknown sample relative to the low-melting homozygote.

$$ \mathrm{frequency}\kern0.5em \mathrm{of}\kern0.5em \mathrm{high}\hbox{-} \mathrm{melting}\kern0.5em \mathrm{allele}=0.5+0.5\left(\frac{x-{f}_{het}}{f_{high}-{f}_{het}}\right) $$

(2)

$$ \mathrm{frequency}\kern0.5em \mathrm{of}\kern0.5em \mathrm{high}\hbox{-} \mathrm{melting}\kern0.5em \mathrm{allele}=0.5-0.5\left(\frac{f_{het}-x}{f_{het}-{f}_{low}}\right) $$

(3)

DNA titrations and method validation

The use of three reference genotypes per SNP assay and the fluorescence standardization step should theoretically allow for great sensitivity to detect allele frequency differences among pooled samples. To assess this, we first identified multiple individuals of each of the three possible genotypes for the locus C70S236 and normalized DNA concentrations of each sample. We then combined them into one pool per genotype and mixed them in varying proportions to generate a gradient of alleles spanning 0–100 % of the low-melting allele by 5–10 % increments. The estimated allele frequencies for two technical replicates of each mixture were averaged and compared to the expected allele frequencies (Additional file 3).

Application to large pools of DNA

DNA sampled from individual corals sourced from Myrmidon (n = 43), Night (n = 56) and Wilkie (n = 49) Reefs was normalized and individually genotyped using SNP C70S236. Normalized samples were then pooled by population and quantitatively genotyped in duplicate in order to examine the accuracy of the curves method when used with many individuals per pool.

qHRM reproducibility

Next, we evaluated the reproducibility of qHRM on 25 populations sampled along the Great Barrier Reef. We pooled DNA from 35–56 individuals for each population, and then performed curves qHRM on each pool in duplicate for five SNPs (C29226S281, C70S236, C11461S560, C16774S791 and C20479S292), which were all individually normalized as per above.

Consistency between the two replicates was measured by estimating the half-confidence interval of the difference as follows:

$$ \frac{z\times SD}{mean} $$

Standard deviations (SD) and mean values were calculated using fluorescence levels in the temperature range where the largest difference in fluorescence levels between the two homozygotes were observed (Fig. 2f). A z value of 1.96 was used so the measure of precision represents half of a 95 % confidence interval.

We further examined the qHRM reproducibility for the 5 markers and 25 reefs by comparing mean errors. Errors in percentage were calculated as follows:

$$ \mathrm{Percentage}\kern0.5em \mathrm{of}\kern0.5em \mathrm{error}\kern0.5em \mathrm{between}\kern0.5em \mathrm{replicates}=\left(\frac{f_1-{f}_2}{\left({f}_1+{f}_2\right)/2}\right) $$

where f ₁ and f ₂ are two replicates of fluorescence data. Geometric means of errors were used as population- and marker-specific errors.