Detecting negative selection on recurrent mutations using gene genealogy
© Ezawa et al.; licensee BioMed Central Ltd. 2013
Received: 20 September 2012
Accepted: 13 April 2013
Published: 7 May 2013
Whether or not a mutant allele in a population is under selection is an important issue in population genetics, and various neutrality tests have been invented so far to detect selection. However, detection of negative selection has been notoriously difficult, partly because negatively selected alleles are usually rare in the population and have little impact on either population dynamics or the shape of the gene genealogy. Recently, through studies of genetic disorders and genome-wide analyses, many structural variations were shown to occur recurrently in the population. Such “recurrent mutations” might be revealed as deleterious by exploiting the signal of negative selection in the gene genealogy enhanced by their recurrence.
Motivated by the above idea, we devised two new test statistics. One is the total number of mutants at a recurrently mutating locus among sampled sequences, which is tested conditionally on the number of forward mutations mapped on the sequence genealogy. The other is the size of the most common class of identical-by-descent mutants in the sample, again tested conditionally on the number of forward mutations mapped on the sequence genealogy. To examine the performance of these two tests, we simulated recurrently mutated loci each flanked by sites with neutral single nucleotide polymorphisms (SNPs), with no recombination. Using neutral recurrent mutations as null models, we attempted to detect deleterious recurrent mutations. Our analyses demonstrated high powers of our new tests under constant population size, as well as their moderate power to detect selection in expanding populations. We also devised a new maximum parsimony algorithm that, given the states of the sampled sequences at a recurrently mutating locus and an incompletely resolved genealogy, enumerates mutation histories with a minimum number of mutations while partially resolving genealogical relationships when necessary.
With their considerably high powers to detect negative selection, our new neutrality tests may open new avenues for dealing with the population genetics of recurrent mutations, and may help identify some types of genetic disorders that may have escaped identification by currently existing methods.
Keywords: Population genetics, Recurrent mutation, Negative selection, Deleterious mutation, Neutrality test
Whether and how a mutant allele is selected is an important topic in population genetics, because it, along with the population size, demography, and the mode and tempo of mutation, crucially dictates the evolutionary fate of the mutant allele and/or the polymorphism pattern in the population (e.g., [1–4]). The type and intensity of selection also indicate the functional impact and the evolutionary history of a mutation and the locus that underwent it. A number of statistical tests have been developed to detect selection on mutant alleles (e.g., [5–12]). Most of them are based on the null-hypothesis that mutants are selectively neutral [13–17] and are called “neutrality tests.” These neutrality tests were successful to some degree in detecting balancing selection (e.g., [18–21]) and positive selection (e.g., [22–24]). Detection of negative selection, in contrast, has generally been unsuccessful, probably because of the weak signals displayed by deleterious mutants (e.g.,  and references therein).
So far, development of tools for population genetics analyses has centered around the infinite-site model , which suitably describes single-nucleotide polymorphisms (SNPs), one of the commonest and most actively studied types of polymorphisms (e.g., [27–30]). Recent technological innovations, however, enabled the detection of another type of polymorphism, namely structural variations (SVs), including copy number variations (CNVs) (e.g., [31–35]). These studies have revealed that SVs are very common in eukaryotic genomes (e.g., [36–40]) including the human genome (e.g., [41–43]).
Some of the structurally variant mutations (SV mutations) associated with genomic diseases have long been known to recur in the human population (e.g., ). A recent genome-wide analysis suggested that such “recurrent mutations” are quite common among CNVs . Recurrent mutations are also quite common among inversions, another well-known type of SV . Assessing the selective force on each of such recurrent mutations is essential for estimating its evolutionary and/or medical impacts on the genome undergoing them. Positively selected (e.g., ) and selectively neutral (e.g., ) recurrent SV mutations drive genome evolution. Negatively selected recurrent SV mutations (reviewed e.g., in ), in contrast, will not substantially contribute to genomic differences between species. New identification of such deleterious recurrent mutations, however, may reveal some disorders whose genetic causes have so far remained elusive.
In this study, we attempt to detect negative selection on recurrent mutations, such as those generating SVs, by exploiting the gene genealogy of sampled sequences. Broadly speaking, our rationale is the following. Although the signal of negative selection on a single mutation event may be too weak to be detected, the synergistic effect of the signals from multiple mutation events of a specific type might become strong enough to enable detection. Therefore, if the genealogy of sampled sequences reveals recurrent mutation events, we may be able to detect negative selection on the mutants.
To validate this idea, we conducted an extensive computer simulation analysis. In the analysis, we first simulated recurrent mutations under different conditions in a population with a constant size of 10,000 and in populations that expanded from a bottleneck, all without recombination, using a coalescent simulator, msms. Then we examined the ability of our new neutrality tests to correctly detect negative selection on recurrent mutations at each simulated locus. Our computer simulation analyses demonstrated that our new tests can correctly detect negative selection with high true-positive rates in a constant-size population, and with moderate true-positive rates in expanding populations. This gives us some hope that our new neutrality tests may provide a useful means for real data scans to detect deleterious recurrent mutations, and also opens the possibility of further developing methods to address some outstanding issues, such as recombination and population substructure, that could not be dealt with in this study.
Our new tests require an algorithm to map mutation events on a gene genealogy at the recurrently mutating locus. In this study, the genealogy is reconstructed from SNPs flanking (or residing within) the locus in question. For this purpose, we also developed a new maximum parsimony algorithm that overcomes a problem inherent in any traditional tree reconstruction algorithm coupled with any traditional parsimony-based mutation mapping algorithm, which is the tendency to overestimate the number of mutation events if the genealogy is inferred from SNPs (see Methods).
Subjects of our new neutrality tests
Before going into the details of our methods, we would like to clarify what our new neutrality tests are intended for. In principle, our new tests are aimed at detecting negative selection on any type of recurrent mutations that satisfy the following two conditions: (i) the subject mutations in a test share some features clearly distinguishable from other mutations, especially neutral SNPs; and (ii) sequences with subject mutations can be sub-classified at least approximately into classes of shared origins (i.e., classes of identical-by-descent mutants) by some means, such as the genealogy of sequences, identifying characteristics, and/or exact locations.
Our original purpose was to judge whether recurrent mutations at each structurally variant (SV) locus are deleterious or not, using the sequence genealogy reconstructed with SNPs to identify the recurrent mutation events. SV mutations often have rates θ_μ (≡ 4Nμ) ∼ 1 (e.g., [44, 45], where N is the (effective) population size and μ is the mutation rate per haploid locus per generation). Occasionally, θ_μ > 10. A second conceivable kind of subject is a set of mutations at a micro-satellite locus, which are known to occur at a very high rate, with θ_μ typically ranging from 1 to 100 (see e.g., ).
A third kind of subject would be a class of mutations that satisfy two conditions: (i) they occurred in a region, such as in a haplotype block, that consists of sites reasonably linked with one another; and (ii) they exhibit suspected signs of functional loss or impairment (e.g., insertions, frame-shifting indels, nonsense point mutations, and mutations on signals of splicing or gene expression) of a putative gene, such as the one predicted by a genome-wide annotation. The new tests on this class may be useful for inferring whether or not a putative gene is functional, especially when there are no other data to ascertain its purported functionality (see also Discussion).
Although the methods described in this paper are intended for applications to simple SV mutations, other potential subject mutations, such as the ones mentioned above, could also be examined by our new neutrality tests, as long as we can define appropriate null models.
Detecting negative selection on recurrent mutations using gene genealogy (I): Theoretical rationale and test statistics
Negative selection, on the other hand, skews the mutant AFS toward low frequencies (Figure 1E), which are highly populated even under selective neutrality (Figure 1A). For example, consider the proportion of singleton mutant sites out of all polymorphic sites when n sequences are sampled. Under the selectively neutral infinite-site model  in a constant-size population, it is approximately given by [7, 57]: , which is ~19.5% when n=100, and ~10.2% even when n=10,000. Therefore, even in the extreme case in which a deleterious mutant leaves only a single offspring among as many as 10,000 sampled sequences, the signal of negative selection cannot reach statistical significance at the 5% level. (Of course, an individual carrying a negatively selected mutation may not have any offspring at all. We will not discuss such a case because our methods only work with observed mutant alleles.) In terms of gene genealogy, we can say that a deleterious mutant modifies the shape of the genealogy only modestly, if at all (e.g., [51–55]), because such a mutant tends to occupy only the tip of the genealogy, with fewer offspring lasting for shorter times than neutral ones (Figure 1F). These characteristics have prevented individual events of deleterious mutations from being detected via population genetics methods (e.g.,  and references therein; but see ).
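The two percentages quoted above can be checked numerically. The sketch below assumes the proportion takes the form (n/(n−1))/a_n, with a_n = 1 + 1/2 + ... + 1/(n−1), that is, the neutral expected number of singleton sites divided by the neutral expected number of segregating sites. This form is an assumption made here because it is a standard neutral expectation and reproduces both quoted values; the function names are illustrative only.

```python
def harmonic_sum(k):
    """Return 1 + 1/2 + ... + 1/k."""
    return sum(1.0 / i for i in range(1, k + 1))


def singleton_proportion(n):
    """Expected share of singleton sites among all polymorphic sites.

    Assumed form (not taken from the text): expected singletons,
    theta * n / (n - 1), divided by expected segregating sites,
    theta * a_n, so that theta cancels.
    """
    a_n = harmonic_sum(n - 1)
    return (n / (n - 1)) / a_n


p_100 = singleton_proportion(100)
p_10000 = singleton_proportion(10_000)
print(f"n=100: {p_100:.1%}; n=10,000: {p_10000:.1%}")
```

Under this assumption the proportion falls only from about 19.5% to about 10.2% as n grows a hundredfold, which is why a single singleton mutant cannot by itself be significant at the 5% level.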
Even with n=100, for example, the probability is ~3.7% when M=2, and ~0.7% when M=3, enabling us to detect negative selection with sufficient statistical significance. In actual situations, however, the mutation events may interfere with one another, so that the actual probability deviates from the rough estimate above, and the probability function will depend on the “mutation kinetics,” i.e., the possible genetic states and the rates of mutations between the states. In addition, M will decrease as the negative selection becomes stronger and as the rate of mutation becomes smaller. Thus, it is not easily predictable how widely applicable our new tests will be. We, therefore, conducted an extensive simulation analysis to examine the actual detection powers of our new tests, as well as their applicable range in the parameter space of mutation rates and selection intensity.
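One reading of the rough estimate for n=100, offered as an assumption rather than as the authors' derivation: if each of the M forward mutation events independently leaves exactly one mutant in the sample with probability 1/a_n (the neutral expectation when only derived singletons are counted, with a_n = 1 + 1/2 + ... + 1/(n−1)), then the chance that all M events do so is (1/a_n)^M. The sketch below evaluates this product; the function name is hypothetical.

```python
def all_singleton_probability(n, m):
    """Probability that m independent events each leave a single mutant.

    Assumes a per-event probability of 1/a_n and independence between
    events; both are simplifying assumptions for illustration.
    """
    a_n = sum(1.0 / i for i in range(1, n))
    return (1.0 / a_n) ** m


for m in (1, 2, 3):
    print(m, round(all_singleton_probability(100, m), 4))
```

Under these assumptions the probability drops below 5% already at M=2, which is the sense in which recurrence amplifies an otherwise undetectable signal.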
Based on the above rationale, we devised two test statistics. One is the size of the most common class of identical-by-descent mutants in a sample (Max D ), which is tested conditionally on the number of forward mutations from the ancestral state to the mutant state, M, that were mapped on the genealogy. This statistic is denoted by Max D | M . The other statistic is the total number of mutants in the sample (Tot D ), again tested conditionally on M. This statistic is denoted by Tot D | M . The first statistic is somewhat reminiscent of the test statistic in Ewens’ test ; their similarities and crucial differences will be explained in Discussion. To calculate these test statistics for each subject locus, we need to know the numbers M and Max D . These are inferred by using a genealogy of the sampled sequences.
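To make the definitions concrete, the following minimal sketch derives M, Max D and Tot D from one mutation scenario. It assumes, purely for illustration, that a scenario has already been reduced to a list holding the number of sampled mutants descending from each forward mutation event; that representation and all names are hypothetical.

```python
from typing import NamedTuple, Sequence


class ScenarioStats(NamedTuple):
    m: int      # number of forward mutation events mapped on the genealogy
    max_d: int  # size of the most common identical-by-descent mutant class
    tot_d: int  # total number of mutants in the sample


def scenario_statistics(class_sizes: Sequence[int]) -> ScenarioStats:
    """Summarise one scenario given the mutant count per forward mutation."""
    return ScenarioStats(
        m=len(class_sizes),
        max_d=max(class_sizes, default=0),
        tot_d=sum(class_sizes),
    )


# Three forward mutations leaving 3, 1 and 1 sampled mutants.
stats = scenario_statistics([3, 1, 1])
print(stats)
```

Both statistics are then compared only against neutral scenarios with the same M, which is what "tested conditionally on M" means here.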
Detecting negative selection on recurrent mutations using gene genealogy (II): Overall procedure
After the input data are obtained, we first infer the genealogy of the sampled sequences using the SNPs [step (a) in Figure 3]. Second, based on the inferred genealogy, we enumerate the most parsimonious mutation scenarios that will realize the allelic states of the sampled sequences with a minimum number of mutations [step (b)]. Third, for each mutation scenario, we will calculate the two test statistics, Max D | M and Tot D | M [step (c)]. Fourth, the statistics calculated for the mutation scenarios based on selectively neutral loci will be gathered to constitute the “empirical null-distributions” of the statistics [step (d)], which will in turn be used to assign the P-values to each locus that was simulated under negative selection [step (e)]. Finally, the results of such statistical tests will be summarized to evaluate the performance of our new tests [step (f)].
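Steps (d) and (e) can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it treats every neutral scenario as one equally weighted observation, groups the statistic by M, and computes a lower-tail P-value (small values of the statistic are taken as evidence of negative selection). Any weighting of alternative scenarios is ignored, and all names are hypothetical.

```python
from collections import defaultdict


def build_null(neutral_scenarios):
    """Group neutral statistic values by M: {M: [statistic, ...]}."""
    null = defaultdict(list)
    for m, statistic in neutral_scenarios:
        null[m].append(statistic)
    return dict(null)


def lower_tail_p(null, m, observed):
    """Fraction of neutral values with the same M that are <= observed.

    Returns None when no neutral scenario with this M is available.
    """
    values = null.get(m)
    if not values:
        return None
    return sum(v <= observed for v in values) / len(values)


# Toy neutral data: (M, Tot D) pairs.
null = build_null([(2, 2), (2, 3), (2, 5), (2, 8), (3, 4)])
print(lower_tail_p(null, 2, 2))
```

Conditioning on M by keeping one list per value of M is the simplest design; it also makes visible when a given M is too sparsely represented among the neutral simulations to support a test.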
In the following sections, we describe the components of the procedure in more detail.
Simulations to generate sequence sets under a constant-size population
Forward mutation rate: θ_μ = 10^{-1}, 10^{-1/2}, 1 (= 10^{0}), 10^{+1/2}, 10^{+1};
Backward/forward ratio: .
For each of the 4×5×1=20 combinations of n, v/μ, and σ=0 for selectively neutral models, we simulated 10,000 samples with θ_μ = 10^{-1}, 5,000 samples with θ_μ = 10^{-1/2}, 3,000 samples with θ_μ = 1, 3,000 samples with θ_μ = 10^{+1/2}, and 1,000 samples with θ_μ = 10^{+1}. For negatively selected models with σ<0, we only used v/μ = 0, 1, 3. For each of the 4×5×3×4=240 combinations of n, θ_μ, v/μ, and σ<0, we simulated 1,000 samples. It should be noted that the simulations were conducted without regard to the allelic states at the recurrently mutated locus. Thus, the simulated samples include those that could not capture recurrent mutations within the genealogy, in addition to those that could.
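The bookkeeping of this design can be reproduced with a short enumeration. In the sketch below the four sample sizes (20, 50, 100, 200) and the four negative selection coefficients are those that appear elsewhere in the text; treating them as the complete grid is an assumption. Only the count (five) of neutral v/μ levels is used, since their values are not all listed here.

```python
from itertools import product

sample_sizes = [20, 50, 100, 200]              # assumed from sizes mentioned in the text
theta_mu = [10**-1, 10**-0.5, 1.0, 10**0.5, 10.0]
v_over_mu_selected = [0, 1, 3]                 # ratios used when sigma < 0
sigma_negative = [-10.0, -10**1.5, -10**2, -10**2.5]

# Negatively selected models: 1,000 samples per parameter combination.
selected_grid = list(product(sample_sizes, theta_mu, v_over_mu_selected, sigma_negative))
selected_total = len(selected_grid) * 1_000

# Neutral models: samples per theta_mu level, for each (n, v/mu) pair.
neutral_per_theta = [10_000, 5_000, 3_000, 3_000, 1_000]
n_ratio_levels_neutral = 5                     # count only; values not all given
neutral_total = len(sample_sizes) * n_ratio_levels_neutral * sum(neutral_per_theta)

print(len(selected_grid), selected_total, neutral_total)
```

The larger neutral sample counts at low θ_μ are consistent with the remark above that many simulated samples fail to capture recurrent mutations, so more replicates are needed there to populate the null distributions.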
Inferring gene genealogies and mutation scenarios (brief description)
The genealogy among the sequences in each simulated sample was first inferred via the Neighbor-Joining (NJ) method  using the number of SNP sites with different states as a pairwise distance between two sequences. Second, we removed interior branches not supported by any SNP site (Additional file 1: Figure S1F). Third, we placed a root at the mid-point between the most distant pair of sequences. Fourth, because all existing parsimony algorithms (e.g., [60, 61]) may overestimate the number of mutations under some circumstances (Additional file 1: Figure S1G), we mapped mutation events at the recurrently mutating locus onto the resulting “SNP-supported tree” by using a new maximum parsimony algorithm that we have especially designed for this purpose. The new algorithm enumerates all possible mutation scenarios that could result in the minimum number of mutations, each accompanied by additional interior branches necessary to realize the scenario (Additional file 1: Figure S1H). The section “Inferring Gene Genealogies and Mutation Scenarios (Rationale)” of Supplementary methods in Supplementary Notes (Additional file 1) describes the rationale behind this new parsimony algorithm and our genealogy reconstruction method. Additional file 2 is dedicated entirely to a detailed description of this new parsimony algorithm.
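The distance used in the first step is simply the number of SNP sites at which two sequences differ. A minimal sketch follows, assuming sequences are encoded as equal-length strings of 0/1 SNP states (an encoding chosen here for illustration). It also picks the most distant pair by raw SNP distance, which is only a proxy for the path-length criterion used when placing a mid-point root on a tree.

```python
from itertools import combinations


def snp_distance(a, b):
    """Number of SNP sites at which two sequences carry different states."""
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs):
    """Symmetric matrix of pairwise SNP distances."""
    n = len(seqs)
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = snp_distance(seqs[i], seqs[j])
    return d


def most_distant_pair(d):
    """Indices (i, j), i < j, of the pair with the largest distance."""
    return max(combinations(range(len(d)), 2), key=lambda p: d[p[0]][p[1]])


seqs = ["0000", "0011", "1100", "0001"]
d = distance_matrix(seqs)
print(d, most_distant_pair(d))
```

A matrix of this kind is what a Neighbor-Joining implementation would take as input; the tree building itself and the removal of unsupported interior branches are not shown.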
New statistical tests to detect negative selection
where P E (scenario) is (1a) and (1b), when the test statistic is Max D | M and Tot D | M , respectively.
Performance tests under expanding population
We also examined the performance of our new statistical tests on sequence data sets simulated under a population that expanded recently. As an expanding population, we used a simple model that broadly reproduces the European demography inferred by . In terms of forward time evolution, the model population begins with an ancestral (bottleneck) population at equilibrium with the constant size N_B = 2100. Then the population shrinks to N_EU0 = 1000 at T_EU-AS = 21,200 years ago (when it separated from the Asian population), after which it expands exponentially. For the expansion rate, r, we used the maximum-likelihood estimate for the European population, r_EU = 4.0×10^{-3} per generation, with a generation time of 25 years. We also used the lower and upper bounds of the parametric bootstrap bias-corrected 95% confidence interval, r_EU = 2.6×10^{-3} and 5.7×10^{-3} per generation .
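For orientation, the present-day population size implied by these parameters can be computed directly. The sketch assumes continuous exponential growth, N(t) = N_EU0 · exp(r·t) with t in generations, starting at the split; how the simulator parameterizes growth, and the rescaling of parameters described in the text, are not modeled here.

```python
import math

generation_time_years = 25
split_time_years = 21_200
n_start = 1_000  # N_EU0, size right after the split

growth_rates = {"lower": 2.6e-3, "mle": 4.0e-3, "upper": 5.7e-3}

# Number of generations of exponential growth since the split.
generations = split_time_years / generation_time_years

present_size = {
    label: n_start * math.exp(rate * generations)
    for label, rate in growth_rates.items()
}

for label, size in present_size.items():
    print(f"{label}: {size:,.0f}")
```

Under this assumption the three growth rates give present-day sizes that differ by more than an order of magnitude, which is why the robustness of the tests to a misestimated r is worth examining.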
Other parameters were subsets of those used for the performance tests under the constant-size population. A caveat is that population genetic parameters were rescaled so that their raw values (but not their population-scaled values) match the values for the constant population of size N = 10,000. More specifically, we used sample sizes of n = 100 and 200, backward/forward ratios of v/μ = 0, 1, and 3, and selection coefficients equivalent to σ = 0 (neutral), −10^{3/2}, −10^{2}, and −10^{5/2}. As for the forward mutation rate, θ_μ, we used exactly the same settings as for the constant-size population.
We conducted two performance tests. First, we examined the performance of our new tests just as we did under the constant-size population, assuming that the expansion rate r = r_EU was inferred exactly. Second, to examine the effect of erroneous inference of r, our new tests with the empirical null-distributions computed with r_EU = 4.0×10^{-3} were applied to the sequence sets simulated under r_EU = 2.6×10^{-3} and r_EU = 5.7×10^{-3}.
Performance of our new parsimony algorithm
The new neutrality tests as described in this paper depend on a new parsimony algorithm that we developed to map mutation events on the sequence genealogy. Therefore, we first compared the new parsimony algorithm with traditional tree reconstruction algorithms, in terms of the accuracy of tree reconstruction. As a representative of the traditional tree reconstruction algorithm, we used the neighbor-joining (NJ) method . We first note that, under the current situation where a tree is reconstructed only from sites following the infinite-site model, the NJ method should infer trees as accurately as the maximum-likelihood (ML) method, which is known to be the most accurate under most situations. A problem is that most traditional tree reconstruction algorithms forcefully infer a fully resolved tree by randomly inserting (zero-length) branches to “resolve” practically multifurcated nodes. Our new parsimony algorithm solves this problem by starting with a multifurcated tree whose branches are all supported by SNP sites, and further resolving phylogenetic relationships by taking advantage of the recurrent mutations (see Additional files 1 and 2 for details). To make sure that this strategy actually works, we applied both the NJ method and our new parsimony algorithm to each sequence set simulated as detailed in the next subsection, and compared the reconstructed trees with the true genealogy among simulated sequences. When the sample size n=100 and v/μ=1, for example, each NJ tree has 73±5 false-positive branches (the numbers represent mean±standard deviation), while each tree via our new parsimony has on average 1±1 false-positive branches. Next we defined the “additional true-branch rate” as , where ATP is the number of true-positive branches not supported by SNPs, and FP is the number of false-positive branches. 
Under these conditions, the additional true-branch rate of our new parsimony algorithm (0.378±0.298) was more than five times higher than that obtained by the NJ method (0.071±0.035). Results were similar under other conditions (as long as the sample size was quite large). Additional file 1: Tables S2 and S3 show the results in more detail.
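The branch counts that enter this comparison can be obtained by comparing sets of clades. In the sketch below each interior branch is represented by the set of sampled sequences descending from it, which is an encoding chosen here for illustration. A false-positive branch is an inferred branch absent from the true genealogy, and an additional true-positive branch is an inferred branch that is present in the true genealogy but not supported by any SNP. The rate itself is not computed because its definition is not reproduced here.

```python
def classify_added_branches(inferred, true, snp_supported):
    """Return (ATP, FP) counts for one reconstructed tree.

    Each argument is an iterable of frozensets of leaf labels, one per
    interior branch (the clade below that branch).
    """
    inferred, true, snp_supported = set(inferred), set(true), set(snp_supported)
    false_positive = inferred - true
    additional_true = (inferred & true) - snp_supported
    return len(additional_true), len(false_positive)


true_tree = [frozenset("abc"), frozenset("ab"), frozenset("de")]
snp_tree = [frozenset("abc")]
inferred_tree = [frozenset("abc"), frozenset("ac"), frozenset("de")]

atp, fp = classify_added_branches(inferred_tree, true_tree, snp_tree)
print(atp, fp)
```

In this toy case the clade {d, e} is recovered without SNP support (one additional true positive) and the clade {a, c} is spurious (one false positive).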
Frequency of recurrent mutations captured by gene genealogy
Table: Relative frequencies of recurrent mutations captured by gene genealogy, out of polymorphic loci. Panels are arranged by backward/forward ratio (panel C: v/μ = 3); rows within each panel are σ = −10, −10^{3/2}, −10^{2}, and −10^{5/2}.
Although we also examined the simulations with n = 20, their gene genealogies rarely captured the recurrent mutations unless the forward mutation rate was extremely high (θ_μ ≥ 10). Thus, we judged that our new tests are useful only when the sample size is fairly large, and focused on the case of n = 100, unless otherwise stated.
Number of mutations mapped on the gene genealogy
Distributions of new test statistics under selective neutrality and negative selection
To detect negative selection on recurrent mutations, we devised two test statistics, Max D | M and Tot D | M . The statistic Max D | M is the size of the most common class of identical-by-descent mutants in the sample (at the recurrently mutating locus) inferred with a genealogy (Max D ), tested conditionally on the number of forward mutation events (M). The statistic Tot D | M is the total number of mutants in the sample (Tot D (≡m)), again tested conditionally on M. Briefly, these test statistics are expected to be smaller under negative selection than under neutrality, because the descendants of deleterious mutants are unlikely to proliferate. And, because M is fixed, the statistics are expected to be mostly immune to the problem discussed in the last section.
Performance of our new neutrality tests to detect negative selection on recurrent mutations
Table 2: False positive and true positive rates via Tot D | M , when v/μ is not known in advance. Rows are σ = −10, −10^{3/2}, −10^{2}, and −10^{5/2}, repeated across three panels.
With the null-distributions at hand, we examined the performance of our new tests by applying them to the samples of sequences simulated under negative selection. We chose a nominal significance level of 5%. To determine the actual rate of false positives (i.e., type I errors), we also applied the tests to sequence samples simulated under selective neutrality. Overall, the two test statistics performed similarly well, with Tot D | M performing slightly better than Max D | M (compare e.g., Table 2 with Additional file 1: Table S9). Thus, henceforth, we will only show the results for Tot D | M . Table 2 and Additional file 1: Tables S10 and S11 show the proportions of simulated samples with size n = 100, 50, and 200, respectively, that tested positive via Tot D | M (under α = 1 and using null-distributions for unknown v/μ), out of the samples whose gene genealogies identified recurrent mutations. The proportions can be regarded as true-positive rates if the simulations are under negative selection, and as false-positive rates if the simulations are under selective neutrality. Both tests demonstrate high true-positive rates of ~50-80%, while keeping the false-positive rates down to around 5% or less, for strongly negative selection (σ = −10^{2}, −10^{5/2}) and large sample sizes (n = 100 and 200) (Table 2 and Additional file 1: Table S11). Although the true-positive rates dropped somewhat for moderately negative selection (σ = −10^{3/2}), 10-30% of the cases were still detected. On the other hand, the true-positive rates for weakly deleterious mutations (σ = −10) were marginal, hovering around 10% or less. Thus our new tests will have little power to detect weak negative selection on recurrent mutations, no matter how frequently the mutations occur.
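The proportions reported in this way amount to a rejection rate among testable samples. A minimal sketch, assuming each simulated sample has been reduced to a P-value, with None marking samples whose genealogy did not identify recurrent mutations (a convention adopted here for illustration):

```python
def positive_rate(p_values, alpha=0.05):
    """Share of testable samples with P <= alpha.

    Samples marked None are excluded from the denominator. Returns None
    if no sample is testable.
    """
    tested = [p for p in p_values if p is not None]
    if not tested:
        return None
    return sum(p <= alpha for p in tested) / len(tested)


# Interpreted as a true-positive rate for samples simulated under selection,
# and as a false-positive rate for samples simulated under neutrality.
rate = positive_rate([0.01, 0.20, None, 0.04, 0.50])
print(rate)
```

Because untestable samples are dropped from the denominator, these rates describe power conditional on recurrent mutations having been captured, not power over all simulated loci.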
The tests also suffered low positive rates under weak to moderate selection (σ ≥ −10^{3/2}) with a very high mutation rate (θ_μ = 10), probably because independent forward mutations were erroneously merged on incompletely resolved gene genealogies, which is inevitable. Alternatively, it may be because an excessively high number of forward mutations could in principle prevent Tot D | M and Max D | M from clearly distinguishing between deleterious mutations and selectively neutral ones.
For a medium sample size (n = 50), the true-positive rate is reduced to less than 30% (Additional file 1: Table S10). This is because the null-distributions of Max D | M and Tot D | M are “inherently discrete,” namely, their smallest non-zero probabilities are slightly greater than 5% for M = 2 when n = 50.
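The effect of this discreteness can be illustrated with a toy null distribution. For a lower-tail test, the smallest P-value that can ever be obtained is the null probability of the smallest possible value of the statistic; if that probability exceeds the nominal level, no sample can be declared significant. The numbers below are invented for illustration and are not taken from the simulations.

```python
from collections import Counter


def smallest_attainable_p(null_values):
    """Null probability of the smallest observed statistic value."""
    counts = Counter(null_values)
    return counts[min(counts)] / len(null_values)


# Hypothetical null for one value of M: 6% of neutral samples have the
# minimum statistic, so a 5% lower-tail test can never reject.
toy_null = [1] * 6 + [2] * 44 + [3] * 50
p_min = smallest_attainable_p(toy_null)
print(p_min, p_min <= 0.05)
```

With larger samples the statistic can take more values, the lowest probability mass shrinks, and rejection at the 5% level becomes possible.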
Performance of new neutrality tests under expanding populations
Table: False positive and true positive rates via Tot D | M , when v/μ is not known in advance, under expanding population (with correct r). Panels: A. r = 2.6×10^{-3}; B. r = 4.0×10^{-3}; C. r = 5.7×10^{-3}. Rows: σ = 0 (neutral), −10^{3/2}, −10^{2}, and −10^{5/2}.
In actual data analyses, the estimated population growth parameter will be subject to some uncertainty (see e.g., ). To examine the impact of such uncertainty, we applied our new tests to the data sets simulated at both ends of the 95% confidence interval, r = 2.6×10^{-3} and 5.7×10^{-3}, using the null distributions estimated from simulations of neutral mutations with the above MLE, r = 4.0×10^{-3}. Our new tests retained almost the same performance as those using the correct growth parameters (Additional file 1: Table S13), demonstrating that the tests are robust to this uncertainty.
In this study, we introduced two new population genetics tests to detect negative selection on recurrent mutations. Our computer simulation analyses demonstrated high powers of these tests to detect recurrent deleterious mutations in constant-size populations, and moderate detection powers in expanding populations. To the best of our knowledge, this is the first attempt to detect negative selection by using recurrent mutations, and our tests turned out to be superior to traditional neutrality tests, which do not fare well in this respect. To illustrate this point, we also applied some widely used traditional neutrality tests, Ewens’ test , the Ewens-Watterson test , and Tajima’s D test , to our constant-population dataset (Additional file 1). We found that these tests detected selection only slightly better than expected by chance (Additional file 1: Tables S14, S15 and S16). This is understandable because applying a traditional neutrality test to SNPs in the flanking regions of a locus undergoing recurrent deleterious mutations is tantamount to attempting to detect “background selection” on a linked genomic region using only information from a single locus, which was shown to be very difficult (e.g., ). Of course, our tests will not undermine the value of these traditional neutrality tests, because they are known to detect other types of deviations from the standard neutral population genetic model (see e.g., [12, 25, 74]).
We should keep in mind that this study is merely a first step, because the tests have so far been applied to only the simplest cases (a selectively neutral background without recombination in a constant-size population or a regularly expanding population). For future tests to be really useful, we will have to examine how robust the tests are against various confounding factors, such as population substructure and migration (e.g., [62, 75, 76]), background selection, recombination, and mutation kinetics. Although such analyses were not conducted in this study, we may be able to roughly predict the effects of some of such factors and potential countermeasures.
Recombination will confound the inference of gene genealogy, possibly causing false positives, e.g., by splitting the descendant cluster of a forward mutation event, and false negatives, e.g., by merging the descendant clusters of two independent mutation events. Such factors may only have modest effects on our new tests, because our choice of the number of flanking SNPs (S=50) is similar to the typical number of SNPs within a haplotype block in the human genome (e.g., [27, 28]), and because mutant clusters under detectable negative selection are usually too small for recombination to either split or merge. Nevertheless, recombination may impact our tests at least occasionally, especially when the subject locus spans and/or is flanked by more than one haplotype block. To be robust under such effects, we will have to upgrade our tests so that they can handle multiple genealogies arranged along a tested region.
Another issue that should be explored in the future is the modeling of mutation kinetics. Although we found that the test results do not substantially differ across a wide range of backward/forward ratios, from v/μ = 0 to v/μ = 3, they are just within the two-state model . Recurrent mutations could occur more frequently at multistate loci, which might be describable only by their own particular models, such as multisite models (e.g., ), a step-wise mutation model  or its extended versions (e.g., [78, 79]). In principle, model misspecification could lead to erroneous results, so how to assign a correct mutation model to each locus would be an important issue to study. Nevertheless, as long as the locus has only two states, or if its multiple states can be classified into two broad categories under some objective criteria, the results of our study should hold.
Relationship with background selection
The words “deleterious recurrent mutations” may be reminiscent of background selection, whereby deleterious mutations on a nearly non-recombining genomic region reduce the regional effective population size and thus reduce the regional genetic variability as compared to a freely recombining region (e.g., [73, 80–83]). This mechanism could be related to our new neutrality tests in at least two different ways: first as a potential subject of our new tests, and second as a potential source of noise hampering our tests. These aspects are discussed in some detail in Supplementary discussion in Additional file 1. Recently, some complications on background selection have been revealed (e.g., [84, 85]). To fully understand how our new tests will be impacted by background selection, or more generally the Hill-Robertson interference , we will need further studies using simulated data (e.g., ) and possibly data on Drosophila genomes (e.g., [85, 87–89]).
Comparing the definitions of our test statistics to those of traditional tests
One of our test statistics, Max D | M , is somewhat reminiscent of the statistic for Ewens’ test, which is the frequency of the most common haplotype conditional on the number of haplotypes in the sample (K). The other test statistic, Tot D | M , could be regarded as an analog of the statistic for the EW test, which is the haplotype homozygosity conditional on K. Despite the similarities, whereas the traditional tests detected negative selection on recurrent mutations at rates that are at best marginally better than that obtained by chance, our new tests detected negative selection at quite high rates. What causes this difference?
One big difference between the two groups of tests is that our tests only count mutant alleles with mutations whose effects we wish to examine, such as structural variations, while traditional tests count all haplotypes including those not bearing the mutations of interest. Because deleterious mutants in general account for only a minority among sampled sequences, haplotypes not bearing the mutation of interest determine the major behaviors of the traditional test statistics, which obscures the signals of the deleterious mutations. In contrast, our test statistics, Max D | M and Tot D | M , only contain information on the mutation of interest. Therefore, they are unlikely to be disturbed by stochastic fluctuations affecting other haplotypes.
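For comparison, the haplotype-based summaries used by the traditional tests can be sketched as follows. The function computes K, the frequency of the most common haplotype, and the haplotype homozygosity (taken here as the sum of squared sample frequencies, without a finite-sample correction; that choice is an assumption). Every sampled haplotype enters these summaries, whether or not it carries the mutation of interest.

```python
from collections import Counter


def haplotype_summaries(haplotypes):
    """Return (K, most common haplotype frequency, homozygosity)."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    k = len(counts)
    top_frequency = max(counts.values()) / n
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    return k, top_frequency, homozygosity


k, top_frequency, homozygosity = haplotype_summaries(["A", "A", "A", "B", "C"])
print(k, top_frequency, round(homozygosity, 2))
```

In this toy sample the common haplotype "A" dominates both summaries, which mirrors the point made above: rare mutant-bearing haplotypes contribute little to statistics computed over all haplotypes.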
For theoretical studies of the new tests, it would be useful to have analytical formulae for the null distributions. Given the aforementioned similarity between our tests and Ewens’ and the EW tests, such formulae may be derivable, at least for a constant-size population, by modifying the derivation of the Ewens sampling formula [16, 17] and/or by following a path similar to, but slightly different from, the one leading to equations (8) and (11) in . The formulae in  were derived under the modified infinite-alleles model with two classes of alleles, one selectively neutral and the other deleterious [91, 92]. It should be noted that those past studies [91, 92] focused on formulae under a fixed number of sampled deleterious mutants. What we need here, however, are null distributions, which must be derived under a fixed total number of randomly sampled sequences, including both classes of alleles, and under the selective neutrality of both classes. (Also, mutations must be turned off between alleles in the same class.) Once analytical null distributions are derived under such a neutral two-class model, we will be able to define yet another statistical test, similar to Slatkin’s exact test [9, 10], by using the full configuration, (D_1, D_2, …, D_M), of the numbers of sampled mutants resulting from the identified forward mutations. Such an “exact test” might be slightly more powerful than the two tests proposed in this paper, because it can partition the sample space more finely. Once derived, the null distributions may be relatively easily extended to an expanding population, whose effects were also examined in .
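As a point of reference for the starting formula mentioned above, the following Python sketch evaluates the standard Ewens sampling formula for a single class of neutral alleles under the infinite-alleles model. It is not the two-class null distribution discussed in this paragraph, which has yet to be derived. The function name and the argument names (`class_sizes`, `theta`) are assumptions made for illustration.

```python
from collections import Counter
from math import factorial


def ewens_probability(class_sizes, theta):
    """Probability of an unordered allele configuration under the Ewens sampling formula.

    class_sizes: sizes of the allelic classes in the sample, e.g. [3, 1, 1].
    theta: scaled mutation rate (4*N*u for a diploid population).
    """
    n = sum(class_sizes)
    a = Counter(class_sizes)          # a[j] = number of alleles represented j times
    rising = 1.0
    for i in range(n):                # theta * (theta + 1) * ... * (theta + n - 1)
        rising *= theta + i
    prob = factorial(n) / rising
    for j, a_j in a.items():
        prob *= theta ** a_j / (j ** a_j * factorial(a_j))
    return prob


# The three possible configurations of a sample of size 3 have probabilities summing to 1.
total = sum(ewens_probability(cfg, theta=2.0) for cfg in ([3], [2, 1], [1, 1, 1]))
print(total)
```

An exact test in the spirit of Slatkin's would sum such probabilities over all configurations that are no more probable than the observed one. Doing so for the configuration (D_1, D_2, …, D_M) would require the two-class analog of this formula.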
Extended application of our new neutrality tests
In this paper, we mainly examined the performance of our new neutrality tests applied to recurrent mutations at a simple SV locus. However, as briefly explained in the Background, our new tests could be applicable to other types of recurrent mutations as long as they satisfy two conditions: (i) the subject mutations share some features that clearly distinguish them from other, mostly neutral mutations; and (ii) sequences with subject mutations can be sub-classified, at least approximately, into classes of shared origin by some means, such as a sequence genealogy. As a third kind of subject, we mentioned a class of sites that are lumped together according to putative signs of functional loss or impairment of a gene locus.
For example, phenylketonuria is a disease caused by hundreds of types of disabling or malfunctioning mutations in the phenylalanine hydroxylase (PAH) gene (reviewed e.g., in [93, 94]). Our new tests are likely to detect (or rediscover) such diseases (see Additional file 1) and, by analogy, the tests are also expected to detect purifying selection operating on putative genes with unknown functions. This might considerably extend the use of our new tests, because they may help identify cryptic diseases, or help validate putative genes that are annotated automatically, e.g., by genome projects. To make sure that this is true, however, we need to further test their performance in realistic settings.
It should be noted that the sequence genealogy may not need to be reconstructed when applying our new tests to this class of mutations, because different mutational origins are likely to be identified by the locations and characteristics of the mutations.
Potential use of our new parsimony algorithm to enumerate mutation scenarios
As a requirement for our new tests, we developed a new parsimony algorithm that, given an incompletely resolved genealogy and the current states of sequences at a recurrently mutating locus, maps a minimum number of mutations on the genealogy while resolving incomplete phylogenetic relationships if necessary (Additional file 2). The algorithm is a modified extension of Sankoff’s parsimony algorithm  to a multifurcated phylogenetic tree. Although we invented the algorithm in order to define the Max D | M and Tot D | M test statistics, it may find wider applications. For example, it may be extended to infer a finely resolved gene genealogy by combining fast-evolving characters, such as microsatellite polymorphisms, with slow-evolving characters, such as SNPs in a linked region.
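To illustrate the starting point of this algorithm, the sketch below applies plain Sankoff-style dynamic programming to a two-state character (0 = ancestral, 1 = mutant) on a fixed tree that may contain multifurcations. It is a simplified illustration under stated assumptions, not the algorithm of Additional file 2: it neither resolves multifurcations nor enumerates mutation histories. The nested-tuple tree encoding, the function names and the unit cost matrix are hypothetical choices made here.

```python
INF = float("inf")


def sankoff_costs(node, cost):
    """Return [c0, c1], the minimum cost of the subtree given that `node` is in state 0 or 1.

    node: an int (observed leaf state) or a tuple of child nodes (any number of children).
    cost[s][t]: cost of a change from parent state s to child state t on one branch.
    """
    if isinstance(node, int):                       # leaf: only the observed state is allowed
        return [0 if s == node else INF for s in (0, 1)]
    totals = [0, 0]
    for child in node:                              # works unchanged for multifurcations
        child_costs = sankoff_costs(child, cost)
        for s in (0, 1):
            totals[s] += min(cost[s][t] + child_costs[t] for t in (0, 1))
    return totals


def min_mutations(tree, cost=((0, 1), (1, 0))):
    """Minimum number of mutations on the fixed tree, minimized over the root state."""
    return min(sankoff_costs(tree, cost))


star = (1, 1, 0, 0, 0)             # unresolved genealogy: five sequences, two mutants
resolved = ((1, 1), 0, 0, 0)       # same sample with the two mutants grouped together

print(min_mutations(star))         # 2
print(min_mutations(resolved))     # 1
```

On the unresolved star tree the two mutants require two mutations, whereas grouping them into one clade requires only one. This difference shows why an algorithm that is allowed to resolve multifurcations can map fewer mutations than one that treats the given tree as fixed.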
Detecting selection on mutants has been a crucial goal of population and medical genetics. However, it has been very difficult to identify negatively selected (deleterious) mutants via purely population genetics methods, mostly because deleterious mutants leave only weak molecular signals that are very difficult to detect. We came up with the novel idea of synergizing the signals left by recurrent mutation events on the gene genealogy, and devised two statistics, Max D | M and Tot D | M, to detect negative selection on recurrent mutations at a subject locus. Our simulation analyses demonstrated that the neutrality tests based on these two statistics have high power to detect negative selection under a constant-size population, and moderate power under expanding populations. The next task will be to examine whether these methods also work under more realistic population genetics conditions, by including such factors as recombination and population substructure. Our new neutrality tests can be used with segmental mutations, such as genome structural variations and microsatellite mutations, data on which are expected to increase steadily as experimental technologies continue to advance. Our tests open new avenues for studying the population genetics of recurrent mutations, and may become useful in molecular medicine by identifying genomic disorders that may have escaped identification by currently existing methods. Most of the scripts and Perl modules used in this study, including the Perl module implementing our new parsimony algorithm to enumerate mutation scenarios, are packaged in their original forms into Additional file 3 (a ZIP archive).
- CNV: Copy number variation
- EW test: Ewens-Watterson test
- SNP: Single nucleotide polymorphism
We are grateful to Dr. G. Ewing (University of Vienna) for kindly helping us with the msms software. We also thank Dr. H. Innan (Graduate University for Advanced Studies) for his helpful suggestions, and three anonymous referees. This work was supported in part by US National Library of Medicine grant LM010009-01 to DG and GL.
- Crow JF, Kimura M: An Introduction to Population Genetics Theory. 1970, Caldwell, NJ, USA: Blackburn Press.
- Gillespie JH: Population Genetics: A Concise Guide. 2004, Baltimore, Maryland, USA: Johns Hopkins Univ. Press, 2nd edition.
- Hartl DL, Clark AG: Principles of Population Genetics. 2007, Sunderland, Massachusetts, USA: Sinauer Associates, Inc., 4th edition.
- Hedrick PW: Genetics of Populations. 2011, Sudbury, Massachusetts, USA: Jones and Bartlett Publishers, 4th edition.
- Ewens WJ: Testing for increased mutation rate for neutral alleles. Theor Popul Biol. 1973, 4: 251-258. 10.1016/0040-5809(73)90010-5.
- Watterson GA: The homozygosity test of neutrality. Genetics. 1978, 88: 405-417.
- Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989, 123: 585-595.
- Fu YX, Li WH: Statistical tests of neutrality of mutations. Genetics. 1993, 133: 693-709.
- Slatkin M: An exact test for neutrality based on the Ewens sampling distribution. Genet Res. 1994, 64: 71-74. 10.1017/S0016672300032560.
- Slatkin M: A correction to the exact test based on the Ewens sampling distribution. Genet Res. 1996, 68: 259-260. 10.1017/S0016672300034236.
- Fay JC, Wu CI: Hitchhiking under positive Darwinian selection. Genetics. 2000, 155: 1405-1413.
- Zeng K, Fu YX, Shi S, Wu CI: Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics. 2006, 174: 1431-1439. 10.1534/genetics.106.061432.
- Kimura M: Evolutionary rate at the molecular level. Nature. 1968, 217: 624-626. 10.1038/217624a0.
- Kimura M: Genetic variability maintained in a finite population due to mutational production of neutral and nearly neutral isoalleles. Genet Res. 1968, 11: 247-270. 10.1017/S0016672300011459.
- Kimura M: The Neutral Theory of Molecular Evolution. 1983, Cambridge, UK: Cambridge University Press.
- Ewens WJ: The sampling theory of selectively neutral alleles. Theor Popul Biol. 1972, 3: 87-112.
- Karlin S, McGregor J: Addendum to a paper of W. Ewens. Theor Popul Biol. 1972, 3: 113-116. 10.1016/0040-5809(72)90036-6.
- Hudson RR, Kaplan NL: The coalescent process in models with selection and recombination. Genetics. 1988, 120: 831-840.
- Kaplan NL, Darden T, Hudson RR: The coalescent process in models with selection. Genetics. 1988, 120: 819-829.
- Kelly JK: A test of neutrality based on interlocus associations. Genetics. 1997, 146: 1197-1206.
- Kelly JK, Wade MJ: Molecular evolution near a two-locus balanced polymorphism. J Theor Biol. 2000, 204: 83-101. 10.1006/jtbi.2000.2003.
- Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, Schaffner SF, Lander ES, The International HapMap Consortium: Genome-wide detection and characterization of positive selection in human populations. Nature. 2007, 449: 913-918. 10.1038/nature06250.
- Thornton KR, Jensen JD, Becquet C, Andolfatto P: Progress and prospects for mapping recent selection in the genome. Heredity. 2007, 98: 340-348.
- Pavlidis P, Hutter S, Stephan W: A population genomic approach to map recent positive selection in model species. Mol Ecol. 2008, 17: 3585-3598.
- Zhai W, Nielsen R, Slatkin M: An investigation of the statistical power of neutrality tests based on comparative and population genetics data. Mol Biol Evol. 2009, 26: 273-283. 10.1093/molbev/msn231.
- Kimura M: The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics. 1969, 61: 893-903.
- The International HapMap Consortium: A haplotype map of the human genome. Nature. 2005, 437: 1299-1320. 10.1038/nature04226.
- The International HapMap Consortium: A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007, 449: 851-861. 10.1038/nature06258.
- Day INM: dbSNP in the detail and copy number complexities. Hum Mutat. 2010, 31: 2-4. 10.1002/humu.21149.
- The International HapMap 3 Consortium: Integrating common and rare genetic variation in diverse human populations. Nature. 2010, 467: 52-58. 10.1038/nature09298.
- Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, Månér S, Massa H, Walker M, Chi M, Mavin N, Lucito R, Healy J, Hicks J, Ye K, Reiner A, Gilliam TC, Trask B, Patterson N, Zetterberg A, Wigler M: Large-scale copy number polymorphism in the human genome. Science. 2004, 305: 525-528. 10.1126/science.1098918.
- Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, Haugen E, Hayden H, Albertson D, Pinkel D, Olson MV, Eichler EE: Fine-scale structural variation of the human genome. Nat Genet. 2005, 37: 727-732. 10.1038/ng1562.
- Fiegler H, Redon R, Andrews D, Scott C, Andrews R, Carder C, Clark R, Dovey O, Ellis P, Feuk L, French L, Hunt P, Kalaitzopoulos D, Larkin J, Montgomery L, Perry GH, Plumb BW, Porter K, Rigby RE, Rigler D, Valsesia A, Langford C, Humphray SJ, Scherer SW, Lee C, Hurles ME, Carter NP: Accurate and reliable high-throughput detection of copy number variation in the human genome. Genome Res. 2006, 16: 1566-1574. 10.1101/gr.5630906.
- Korbel JO, Urban AE, Affourtit JP, Godwin B, Grubert F, Simons JF, Kim PM, Palejev D, Carriero NJ, Du L, Taillon BE, Chen Z, Tanzer A, Saunders ACE, Chi J, Yang F, Carter NP, Hurles ME, Weissman SM, Harkins TT, Gerstein MB, Egholm M, Snyder M: Paired-end mapping reveals extensive structural variation in the human genome. Science. 2007, 318: 420-426. 10.1126/science.1149504.
- Medvedev P, Stanciu M, Brudno M: Computational methods for discovering structural variation with next-generation sequencing. Nat Methods. 2009, 6: s13-s20. 10.1038/nmeth.1374.
- Maydan J, Lorch A, Edgley ML, Flibotte S, Moerman DG: Copy number variation in the genomes of twelve natural isolates of Caenorhabditis elegans. BMC Genomics. 2010, 11: 62. 10.1186/1471-2164-11-62.
- Emerson JJ, Cardoso-Moreira M, Borevitz JO, Long M: Natural selection shapes genome-wide patterns of copy-number polymorphism in Drosophila melanogaster. Science. 2008, 320: 1629-1631. 10.1126/science.1158078.
- Ossowski S, Schneeberger K, Clark RM, Lanz C, Warthmann N, Weigel D: Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res. 2008, 18: 2024-2033. 10.1101/gr.080200.108.
- Perry G, Dominy NJ, Claw KG, Lee AS, Fiegler H, Redon R, Werner J, Vilanea FA, Mountain JL, Misra R, Carter NP, Lee C, Stone AC: Copy number variation and evolution in humans and chimpanzees. Genome Res. 2008, 18: 1698-1710. 10.1101/gr.082016.108.
- She X, Cheng Z, Zöllner S, Church DM, Eichler EE: Mouse segmental duplication and copy number variation. Nat Genet. 2008, 40: 909-914. 10.1038/ng.172.
- Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, Gonzaléz JR, Gratacòs M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R, Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodward C, Yang F: Global variation in copy number in the human genome. Nature. 2006, 444: 444-454. 10.1038/nature05329.
- Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, Aerts J, Andrews TD, Barnes C, Campbell P, Fitzgerald T, Hu M, Ihm CH, Kristiansson K, MacArthur DG, MacDonald JR, Onyiah I, Pang AWC, Robson S, Stirrups K, Valsesia A, Walter KWJ, Tyler-Smith C, Carter NP, Lee C, Scherer SW, Hurles ME, Wellcome Trust Case Control Consortium: Origins and functional impact of copy number variation in the human genome. Nature. 2010, 464: 704-712. 10.1038/nature08516.
- Mills RE, Walter K, Stewart C, Handsaker RE, Chen K, Alkan C, Abyzov A, Yoon SC, Ye K, Cheetham RK, Chinwalla A, Conrad DF, Fu Y, Grubert F, Hajirasouliha I, Hormozdiari F, Iakoucheva LM, Iqbal Z, Kang S, Kidd JM, Konkel MK, Korn J, Khurana E, Kural D, Lam HYK, Rausch T, Scally A, Lin CY, Luo R: Mapping copy number variation by population-scale genome sequencing. Nature. 2011, 470: 59-65. 10.1038/nature09708.
- Lupski JR: Genomic rearrangements and sporadic disease. Nat Genet. 2007, 39: s43-s47. 10.1038/ng2084.
- Fu W, Zhang F, Wang Y, Gu X, Jin L: Identification of copy number variation hotspots in human populations. Am J Hum Genet. 2010, 87: 494-504. 10.1016/j.ajhg.2010.09.006.
- Antonacci F, Kidd JM, Marques-Bonet T, Ventura M, Siswara P, Jiang Z, Eichler EE: Characterization of six human disease-associated inversion polymorphisms. Hum Mol Genet. 2009, 18: 2555-2566.
- Perry GH, Dominy NJ, Claw KG, Lee AS, Fiegler H, Redon R, Werner J, Villanea FA, Mountain JL, Misra R, Carter NP, Lee C, Stone AC: Diet and the evolution of human amylase gene copy number variation. Nat Genet. 2007, 39: 1256-1260. 10.1038/ng2123.
- Nozawa M, Kawahara Y, Nei M: Genomic drift and copy number variation of sensory receptor genes in humans. Proc Natl Acad Sci USA. 2007, 104: 20421-20426. 10.1073/pnas.0709956104.
- Ewing G, Hermisson J: MSMS: a coalescent simulation program including recombination, demographic structure, and selection at a single locus. Bioinformatics. 2010, 26: 2064-2065. 10.1093/bioinformatics/btq322.
- Ellegren H: Microsatellites: simple sequences with complex evolution. Nat Rev Genet. 2004, 5: 435-445.
- Golding GB: The effect of purifying selection on genealogies. Progress in population genetics and human evolution. Edited by: Donnelly P, Tavaré S. 1997, New York: Springer-Verlag, 271-285.
- Krone SM, Neuhauser C: Ancestral process with selection. Theor Popul Biol. 1997, 51: 210-237. 10.1006/tpbi.1997.1299.
- Neuhauser C, Krone SM: The genealogy of samples in models with selection. Genetics. 1997, 145: 519-534.
- Przeworski M, Charlesworth B, Wall JD: Genealogies and weak purifying selection. Mol Biol Evol. 1999, 16: 246-252. 10.1093/oxfordjournals.molbev.a026106.
- Slade PF: Simulation of selected genealogies. Theor Popul Biol. 2000, 57: 35-49. 10.1006/tpbi.1999.1438.
- Williamson S, Orive ME: The genealogy of a sequence subject to purifying selection at multiple sites. Mol Biol Evol. 2002, 19: 1376-1384. 10.1093/oxfordjournals.molbev.a004199.
- Watterson GA: The sampling theory of selectively neutral alleles. Adv Appl Prob. 1974, 6: 463-488. 10.2307/1426228.
- Wright S: Evolution in Mendelian populations. Genetics. 1931, 16: 97-159.
- Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987, 4: 406-425.
- Fitch WM: Toward defining the course of evolution: minimum change for a specified tree topology. Syst Zool. 1971, 20: 406-416. 10.2307/2412116.
- Sankoff D: Minimal mutation trees of sequences. SIAM J Appl Math. 1975, 28: 35-42. 10.1137/0128004.
- Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD: Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 2009, 5: e1000695. 10.1371/journal.pgen.1000695.
- Gu W, Zhang F, Lupski JR: Mechanisms for human genomic rearrangements. Pathogenetics. 2008, 1: 4. 10.1186/1755-8417-1-4.
- Milá B, Girman DJ, Kimura M, Smith TB: Genetic evidence for the effect of a postglacial population expansion on the phylogeography of North American songbird. Proc Biol Sci. 2000, 267: 1033-1040. 10.1098/rspb.2000.1107.
- Xue Y, Zerjal T, Bao W, Zhu S, Shu Q, Xu J, Du R, Fu S, Li P, Hurles ME, Yang H, Tyler-Smith C: Male demography in East Asia: a north–south contrast in human population expansion times. Genetics. 2006, 172: 2431-2439.
- Kawamoto Y, Shotake T, Nozawa K, Kawamoto S, Tomari K, Kawai S, Shirai K, Morimitsu Y, Takagi N, Akaza H, Fujii H, Hagihara K, Aizawa K, Skachi S, Oi T, Hayaishi S: Postglacial population expansion of Japanese macaques (Macaca fuscata) inferred from mitochondrial DNA phylogeny. Primates. 2007, 48: 27-40.
- Mirol PM, Routtu J, Hoikkala A, Butlin RK: Signals of demographic expansion in Drosophila virilis. BMC Evol Biol. 2008, 8: 59. 10.1186/1471-2148-8-59.
- Slatkin M, Hudson RR: Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics. 1991, 129: 555-562.
- Griffiths RC, Tavaré S: Sampling theory for neutral alleles in a varying environment. Philos Trans R Soc Lond B Biol Sci. 1994, 344: 403-410. 10.1098/rstb.1994.0079.
- Slatkin M: Linkage disequilibrium in growing and stable populations. Genetics. 1994, 137: 331-336.
- Williamson SH, Hernandez R, Fledel-Alon A, Zhu L, Nielsen R, Bustamante CD: Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc Natl Acad Sci USA. 2005, 102: 7882-7887. 10.1073/pnas.0502300102.
- Boyko AR, Williamson SH, Indap AR, Degenhardt JD, Hernandez RD, Lohmueller KE, Adams MD, Schmidt S, Sninsky JJ, Sunyaev SR, White TJ, Nielsen R, Clark AG, Bustamante CD: Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 2008, 4: e1000083. 10.1371/journal.pgen.1000083.
- Charlesworth D, Charlesworth B, Morgan MT: The pattern of neutral molecular variation under the background selection model. Genetics. 1995, 141: 1619-1632.
- Zeng K, Mano S, Shi S, Wu CI: Comparisons of site- and haplotype-frequency methods for detecting positive selection. Mol Biol Evol. 2007, 24: 1562-1574. 10.1093/molbev/msm078.
- Wright S: The genetical structure of populations. Ann Eugen. 1951, 15: 323-354.
- Slatkin M, Barton NH: A comparison of three indirect methods for estimating average levels of gene flow. Evolution. 1989, 43: 1349-1368. 10.2307/2409452.
- Ohta T, Kimura M: A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genet Res. 1973, 22: 201-204. 10.1017/S0016672300012994.
- Estoup A, Jarne P, Cornuet JM: Homoplasy and mutation model at microsatellite loci and their consequences for population genetics analysis. Mol Ecol. 2002, 11: 1591-1604. 10.1046/j.1365-294X.2002.01576.x.
- Sainudiin R, Durrett RT, Aquadro CF, Nielsen R: Microsatellite mutation models: Insights from a comparison of humans and chimpanzees. Genetics. 2004, 168: 383-395. 10.1534/genetics.103.022665.
- Charlesworth B, Morgan MT, Charlesworth D: The effect of deleterious mutations on neutral molecular variation. Genetics. 1993, 134: 1289-1303.
- Hudson RR: How can the low levels of DNA sequence variation in regions of the Drosophila genome with low recombination rates be explained?. Proc Natl Acad Sci USA. 1994, 91: 6815-6818. 10.1073/pnas.91.15.6815.
- Hudson RR, Kaplan NL: Coalescent process and background selection. Philos Trans R Soc Lond B Biol Sci. 1995, 349: 19-23. 10.1098/rstb.1995.0086.
- Nordborg M, Charlesworth B, Charlesworth D: The effect of recombination on background selection. Genet Res. 1996, 67: 159-174. 10.1017/S0016672300033619.
- Kaiser VB, Charlesworth B: The effects of deleterious mutations on evolution in non-recombining genomes. Trends Genet. 2009, 25: 9-12. 10.1016/j.tig.2008.10.009.
- Charlesworth B, Betancourt AJ, Kaiser VB, Gordo I: Genetic recombination and molecular evolution. Cold Spring Harb Symp Quant Biol. 2009, 74: 177-186. 10.1101/sqb.2009.74.015.
- Hill WG, Robertson A: The effect of linkage on limits to artificial selection. Genet Res. 1966, 8: 269-294. 10.1017/S0016672300010156.
- Arguello JR, Zhang Y, Kado T, Fan C, Zhao R, Innan H, Wang W, Long M: Recombination yet inefficient selection along the Drosophila melanogaster subgroup’s fourth chromosome. Mol Biol Evol. 2010, 27: 848-861. 10.1093/molbev/msp291.
- Campos JL, Charlesworth B, Haddrill PR: Molecular evolution in nonrecombining regions of the Drosophila melanogaster genome. Genome Biol Evol. 2012, 4: 278-288. 10.1093/gbe/evs010.
- McGaugh SE, Heil CSS, Manzano-Winkler B, Loewe L, Goldstein S, Himmel TL, Noor MAF: Recombination modulates how selection affects linked sites in Drosophila. PLoS Biol. 2012, 10: e1001422. 10.1371/journal.pbio.1001422.
- Slatkin M, Rannala B: The sampling distribution of disease-associated alleles. Genetics. 1997, 147: 1855-1861.
- Hartl DL, Campbell RB: Allele multiplicity in simple Mendelian disorders. Am J Hum Genet. 1982, 34: 866-873.
- Sawyer S: A stability property of the Ewens sampling formula. J Appl Prob. 1983, 20: 449-459. 10.2307/3213883.
- Scriver CR: The PAH gene, phenylketonuria, and a paradigm shift. Hum Mutat. 2007, 28: 831-845. 10.1002/humu.20526.
- Blau N, van Spronsen FJ, Levy HL: Phenylketonuria. Lancet. 2010, 376: 1417-1427. 10.1016/S0140-6736(10)60961-0.
- Ezawa K: DENSERM: DEtecting Negative SElection on Recurrent Mutations. [http://www.bioinformatics.org/ftp/pub/DENSERM/]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.