Tree measures and the number of segregating sites in time-structured population samples
- Roald Forsberg^{1}Email author,
- Alexei J Drummond^{2} and
- Jotun Hein^{3}
https://doi.org/10.1186/1471-2156-6-35
© Forsberg et al; licensee BioMed Central Ltd. 2005
Received: 19 October 2004
Accepted: 16 June 2005
Published: 16 June 2005
Abstract
Background
Time-structured genetic samples are a valuable source of information in population genetics because they provide several correlated observations of the underlying evolutionary processes. In this paper we study basic properties of the genetic variation in time-structured samples as reflected in the genealogies relating individuals and the number of segregating sites observed. Our emphasis is on "measurably evolving populations" i.e. populations from which it is possible to obtain time-structured samples that span a significant interval of evolutionary time.
Results
We use results from the coalescent process to derive properties of time-structured samples. In the first section we extend existing results to attain measures on coalescent trees relating time-structured samples. These include the expected time to a most recent common ancestor, the expected total branch length and the expected length of branches subtending only ancient individuals. The effect of different sampling schemes on the latter measure is studied. In the second section we study the special case where the full sample consists of a group of contemporary extant samples and a group of contemporary ancient samples. As regards this case, we present results and applications concerning the probability distribution of the number of segregating sites where a mutation is unique to the ancient individuals and the number of segregating sites where a mutation is shared between ancient and extant individuals.
Conclusion
The methodology and results presented here is of use to the design and interpretation of ancient DNA experiments. Furthermore, the results may be useful in further development of statistical tests of e.g. population dynamics and selection, which include temporal information.
Keywords
Background
Time-structured genetic samples
Genetic samples obtained over several points in time are a valuable source of information in population genetics because they provide several correlated observations of the underlying evolutionary processes.
These time-structured samples can be separated into two qualitatively different groups. Firstly, samples may be taken over such a short evolutionary time that the occurrence of mutations between sampling points can be ignored. Samples of this type have a long standing history in the study of the process of drift and selection via observations of allele frequencies (see e.g. [1]). Secondly, time-structured samples may be obtained over intervals of evolutionary time that are long enough for mutation to become a relevant force in shaping the diversity between samples from different time points. To reflect the fact that the latter type of samples are capable of showing new variation arising, Drummond et al. used the term "Measurably Evolving Populations" (MEP) to describe populations from which biological and technological constraints allow samples of this type to be obtained [2]. Such measurably evolving populations arise from two principal sources, namely rapidly evolving microorganisms e.g. [3] and well characterised vertebrate subfossil material from which ancient DNA can be reliably amplified e.g. [4–6].
Besides a report by Nordborg [7], population genetic studies of MEPs have mostly focused on the construction and use of models for population genetic inference that incorporate the time structure of the data [8–12]. However, our interest here is not that of inference, but rather to study basic properties of time-structured samples obtained from measurably evolving populations. To this end, we use a simple model of a constant sized panmictic population of haploids and base our results on the standard coalescent process [13, 14].
The paper consists of two parts. In the first part we use recursions to derive results for various measures on coalescent trees connecting time-structured samples having any time-structure. In particular, we focus on the effect that different sampling schemes have on the expected length of the branches in the tree upon which mutations can arise that are unique to the ancient individuals of the sample. This is a measure of the expected amount of genetic variation unique to the ancestral material and as such of intrinsic interest to the design of many ancient DNA studies, for example when the objective is to discover unique ancient haplotypes. In the second part, we study the case where the sample has a time-structure of only two time-points, one consisting of a number of contemporary extant individuals and one consisting of a number of contemporary ancient individuals. For this case we obtain the probability distributions of the number of segregating sites where the mutation is observed only in ancient lineages and of the number of segregating sites where the mutation is shared between ancient and extant lineages. Using these results we study the number of ancient samples needed to observe at least one unique or one shared mutation as a function of population parameters.
The results presented here should be of particular interest to studies of ancient DNA and the design of ancient DNA sampling schemes.
Notation
The standard coalescent
By a continuous-time approximation, Kingman has shown that for a contemporary sample (τ_{ i }= 0, ∀_{ i }) a process termed the coalescent can describe the genealogical relationships of the individuals in the sample [13]. Kingman's result applies to the Wright-Fisher model and a broad class of other reproductive models that share a common set of requirements [13]. Briefly, the standard coalescent is a process describing the death of lineages through coalescence events. Some fundamental results concerning the standard coalescent which we use in the following are given here:
The waiting times W_{ n }, W_{n-1},..., W_{2} between successive coalescent events are exponentially distributed as
when time is measured in units of N generations [17].
Let time increase into the past and let the number of distinct line ages at time t, {A(t), t ≥ 0} be given by the death process described by the coalescent. Following [18] we have that given a lineages were sampled at time 0, the conditional probability distribution of the number of remaining lineages at time t is
for 2 ≤ b ≥ a, and for the case b = 1
where a_{[j]}= a(a - 1)...(a - j + 1), and a_{(j)}= a(a + 1)...(a + j - 1).
In the following it will be of interest to study the number of ancestors of sub-samples which in this context are comprised of ancient lineages. Let Q be a sub-sample of the full sample consisting of all ancient individuals and let Q^{ c }be the complement of Q which contains the extant individuals. It is our interest to study the number of ancestral lineages at some time t that subtend leaves in the tree which are exclusive to Q, {A^{(Q)}(t), t ≥ 0}, as mutations occurring on these branches will be unique to Q, i.e. be found solely in ancient samples (Figure 1). However, this is equivalent to recording the total number of ancestral lineages at time t, A(t), and the number of ancestral lineages that subtend one or more leaves (not necessarily exclusively) in Q^{ c }{ (t), t ≥ 0}, as the two are related by A(t) = A^{(Q)}(t) + (t). The bivariate process {A(t), (t)} for a sample of contemporary individuals has been studied extensively by Saunders et al. [19], and from this we extract results for the conditional distribution of the bivariate process {A(t), A^{(Q)}(t)} using the above relation
where Pr{A(t) = b|A(0) = a} is given by (2), and
The coalescent process of time-structured data
When samples have time-structure (τ_{ i }> 0 for some i) the simple death process of the coalescent is replaced by a series of death processes, interrupted at specified points in time by new lineages entering the process (Figure 1). We note that this would correspond to a birth-death process of lineages if sampling events were random rather than known. However, lacking knowledge about the properties of the sampling process we restrict ourselves to condition on known sampling times. Thus, this serial coalescent process can be modelled on the basis of the standard coalescent process, by the following algorithm: At the second sampling point (no coalescence in first interval) a contemporary coalescent is initiated with two individuals; this process is continued for τ_{3,2} time units; at τ_{3} time units another individual enters the process; again a coalescent process is continued for τ_{4,3} time units until τ_{4} where yet another individual is added and so forth until τ_{ n }is reached and the last individual is included; from here the process continues as a standard coalescent process (as no new lineages will be added) until the most recent common ancestor (MRCA) of the sample is reached. Therefore, we refer to this last interval as the termination interval (Figure 1). The lineage number at each sample point is a stochastic variable with a distribution that must be tracked through the sampling interval. It is clear that the following results are all conditional on the temporal configuration of the sample (τ) and consequently this dependence is suppressed from this point.
The mutation process
We shall be concerned with the sampling properties of nucleotide sequences. Hence, we adopt the infinitely many sites model [17], assuming that a single site experiences at most one mutational event so that every mutation that arises is represented in the sample. Furthermore, we assume that mutations are generated by a Poisson process with parameter . The compound parameter θ is given by θ = 2N μ where μ is the mutation rate per sequence per generation.
As branch lengths are measured in units of N generations, the expected number of segregating sites generated over a tree (or sub-section of a tree) with total branch length l, is .
Results
Measures on coalescent trees of time-structured samples
In this section we use the theory presented above to derive recursions describing various measures on coalescent trees of time-structured data.
Number of lineages through the sampling intervals
Let, {A_{ i }(t), 0 ≥ t ≥ τ_{i+1,i}} be a stochastic variable representing the number of lineages in interval i, t time units after τ_{ i }(see Figure 1). Consecutive death processes are related by single birth events, so that
The probability distribution of A_{ i }(t) is found by summing over all the possible lineage numbers at the start of the interval permitted by the sample configuration
Notice that the last term is given by (2), and that Pr{A_{ i }(0) = a}, the probability of observing a lineages at the start of the interval, is given by (5,7) and:
for i = 1, 2
for i > 2
Pr{A_{ i }(0) = a} = Pr{A_{i-1}(τ_{i,i-1}) = a - 1}.
Lastly, let {A_{ i }} be the marginal lineage number in the interval [τ_{ i }, τ_{i+1}) with probability
Time to the most recent common ancestor
From (1) and the above it follows that the expected time to the MRCA of a time structured sample, (T_{ MRCA }), is given by
Total branch length of the genealogy
The above results can also be applied to produce the expected total branch length for time-structured data (B^{(tot)}), given by
where B_{ i }is the branch length added in each of the n - 1 time-intervals that comprise the total sampling interval,
and B^{(term)}) is the branch length added over the last interval
where the latter term corresponds to the expected total branch length of the tree relating a sample of size a in a standard coalescent process [17].
Number of lineages subtending leaves exclusive to a sub-sample
Let the function δ(Q, i) be a membership function for the sub-set Q so that
The joint probability distribution over A_{ i }(t) and (t) is found by summing over all the possible lineage configurations at the start of the interval that are permitted by the sample configuration and the structure of Q
The expected length of branches subtending leaves exclusive to a sub-sample
Let B^{(Q)}denote the expected total length of all branches subtending leaves only in Q. Similar to (10), we have that
which corresponds to conditioning on the number of different lineage types present at the initiation of the termination interval and then weighing the expected branch length over the contemporary coalescent process by the probability of observing a given number of ancestors exclusive to Q in the individual coalescent intervals.
Effect of sampling scheme on the expected number of segregating sites unique to ancient samples
When the objective is to maximise the expected length of branches subtending only ancient lineages the bouquet strategy may intuitively seem the obvious choice as this maximises the total age of ancient material in the sample. However, Figure 3 shows that the bouquet sampling strategy only outperforms the stepwise strategy until a certain value of τ_{ max }is reached. Above this value it is advisable to distribute ancient samples evenly over the maximum attainable time-interval rather than to sample all ancient individuals as old as possible. This may be of particular relevance to species with small effective population sizes (τ_{ max }large), like top-level predators (saber-toothed tigers, lions and bears).
The case of two sample points
In this section we focus on the simpler case with only two sampling points i.e. the bouquet sampling strategy presented above (Figure 2). Our interest is to study the distribution of segregating sites between different classes of lineages. Under the infinite sites model, the number of segregating sites in a contemporary sample of individuals has been extensively studied by [17]. Furthermore, the transient distribution of segregating sites between two time points has been studied by [22].
Suppose that a total of n individuals are sampled. Of these, b ancient individuals are sampled at the same time and constitute the sub-sample Q of interest. The remaining n - b individuals are all extant i.e. sampled at time 0. Assume that there are a ancestors left of the extant individuals at time τ_{ n-b }, i.e. when the ancient individuals included in Q are reached. Since no time passes (τ_{n-b+1}= τ_{n-b+2}= ... = τ_{ n }) during the sampling of the b lineages in Q, we have that (0) = b and (0) = a.
The coalescence process in the terminal interval now proceeds through a + b - 1 intervals of waiting. Let t_{ k }denote the beginning of the k th coalescent interval {k = 0,1...a + b - 2}.
The total number of mutations occurring on the lineages through the coalescent process in the terminal interval (U^{(tot)}) will give rise to segregating sites that can be divided into three categories: mutations which occur on lineages subtending only in Q and which therefore create segregating sites where the mutant is unique to the ancient individuals, (U^{(Q)}), mutations occurring on lineages subtending only in Q^{ c }causing segregating sites where the mutation is unique to the extant individuals (U^{(E)}), and mutations occurring on lineages subtending in both Q and Q^{ c }giving rise to segregating sites where the mutation is shared between the ancient and the extant individuals (U^{(S)}); U^{(tot)}= U^{(E)}+ U^{(S)}+ U^{(Q)}. In the following we assume that the ancestral state of the genetic element under study can be accurately inferred. In the case where a segregating site represents a mutation that is found in all individuals of one group (ancient or extant) and absent from all individuals in the other group, this knowledge is required to determine whether the segregating site represents a mutation that is unique to all ancient (in U^{(Q)}) or to all extant individuals (in U^{(E)}).
Segregating sites found only in ancient individuals
Through the coalescent process, lineages which subtend leaves exclusively in Q (ancient leaves) may die in two events: as two of these lineages coalesce, or as one of these coalesce with a lineage subtending leaves in Q^{ c }. Let there be ( (t_{ k }) = c) ancestors left subtending exclusively ancient leaves at the beginning of the k th coalescent interval.
The probability of losing a lineage subtending leaves exclusively in Q over the k th coalescent interval is
and the probability of keeping a lineage subtending leaves exclusively in Q over the k th coalescent interval is
The probability distribution of the number of mutations unique to the lineages in Q which occur from interval k until the MRCA { } is then given by the recursion
and the full unconditional probability of seeing m mutations is found by summing over all possible start values of a
Segregating sites arising in terminal interval found only in extant individuals
It is evident that the probability distribution of the number of segregating sites where the mutation is unique to extant individuals can be found by exchanging Q and Q^{ c }in the above.
Segregating sites shared between ancient and extant individuals
Given that interval k is initiated with ( (t_{ k }) = s) lineages subtending leaves in both Q and Q^{ c }and ( (t_{ k }) = c) ancestors left subtending exclusively ancient leaves, we have that over the k th interval:
The probability of losing a lineage subtending leaves in both Q and Q^{ c }is
The probability of the number of lineages subtending leaves in both Q and Q^{ c }remaining constant while a lineage subtending leaves exclusively in Q is lost is
The probability of the number of lineages subtending leaves in both Q and Q^{ c }remaining constant whilst the number of lineages subtending leaves exclusively in Q is kept constant is
and the probability of gaining a lineage subtending leaves in both Q and Q^{ c }is
The number of shared mutations which occur from interval k until the MRCA { } is then given by the recursion
and the full unconditional probability of seeing m mutations is found by summing over all possible start values of a
Total number of segregating sites arising in terminal interval
The total number of segregating sites created in the terminal interval can be found by treating all lineages as one sub-sample in either of the recursions given above.
Applications
The effect of sample number under different evolutionary scenarios. Result in this table are based on typical parameter values from the ancient DNA literature and a bouquet sampling scheme (see text). Fixed parameter values are: fragment length = 500 nucleotides, substitution rate per nucleotide site = 1·10^{-7}, number of extant samples = 30. Given these values any combination of the population size parameter (N_{ e }) and the maximal number of generations spanning the sampling interval (g_{ max }) can be translated to a value of the population parameter (θ) and the maximum length of the sampling interval (τ_{ max }) which are measured in units of generations (see text for definition). For various such parameter combinations we here list the associated values of θ, τ_{ max }, and the calculated number of ancient samples needed to have a greater than 95% chance of seeing at least one segregating site where the mutation is unique to the ancient individuals (n_{ u }) and the number of ancient samples needed to have a greater than 95% chance of seeing at least one segregating site where the mutation is shared between ancient and extant individuals (n_{ s }).
N _{ e } | θ | g _{ max } | τ _{ max } | n _{ u } | n _{ s } |
---|---|---|---|---|---|
5000 | 0.5 | 1000 | 0.2 | > 50 | > 50 |
- | - | 5000 | 1.0 | > 50 | > 50 |
- | - | 10000 | 2.0 | > 50 | > 50 |
- | - | 30000 | 6.0 | > 50 | > 50 |
- | - | 50000 | 10.0 | > 50 | > 50 |
10000 | 1.0 | 1000 | 0.1 | > 50 | > 50 |
- | - | 5000 | 0.5 | > 50 | > 50 |
- | - | 10000 | 1.0 | > 50 | > 50 |
- | - | 30000 | 3.0 | 42 | > 50 |
- | - | 50000 | 5.0 | 39 | > 50 |
30000 | 3.0 | 1000 | 0.03 | 44 | 3 |
- | - | 5000 | 0.17 | 23 | 5 |
- | - | 10000 | 0.33 | 15 | 7 |
- | - | 30000 | 1.0 | 9 | > 50 |
- | - | 50000 | 1.67 | 7 | > 50 |
50000 | 5.0 | 1000 | 0.02 | 26 | 2 |
- | - | 5000 | 0.1 | 16 | 2 |
- | - | 10000 | 0.2 | 11 | 3 |
- | - | 30000 | 0.6 | 7 | 6 |
- | - | 50000 | 1.0 | 5 | 9 |
For a specific example consider the recent effort to sequence ancient human mitochondrial sequences from ~ 20.000 years ago [25]. If we assume an effective human population of ~ 10.000 and a generation time of 20 years [20], we have that for humans g_{ max }~ 0.1 at present. Given the population and mutation parameters chosen here, we see from Table 1 that a large number of ancient human samples (> 50) would be needed to ensure the finding of segregating sites where the mutation is unique to ancient humans. It is thus not surprising that the two ancient human mitochondrial haplotypes inferred in a study by Caramelli et al. [25] are both found to be circulating in the present day human population, and it is questionable whether unique ancient haplotypes can be obtained without a substantial elongation of either the sampling interval or the region sequenced.
Discussion
The use of time-structured population samples has a long standing tradition in the study of rapidly evolving microorganisms. Here, the temporal component of data sets has allowed researchers to explore complex hypotheses concerning e.g. the action of selection, host-parasite co-evolution and the evolutionary response of parasites to drugs (see [2] for a review). With the advent of ancient DNA technologies it has recently become possible to obtain time-structured genetic samples that span a large number of generations from multi-cellular organisms also. As an example of the potential that such time-structured samples holds for resolving long standing questions in evolutionary biology, ancient DNA data have been used to clarify the genetic relationship between our own species and the Neanderthals [6, 7], to estimate absolute rates of nucleotide substitution [4] and to infer patterns of population demography in relation to e.g. climatic changes [21].
Inspired by these technological advances, our motivation was to elaborate existing results from population genetics to the case where samples have a time-structure and to relate the results to problems faced by experimentalists.
In the first section of the paper, we have presented recursions for various tree measures and applied these to explore the effect of sampling scheme on the expected number of segregating sites where the mutation is found only in ancient individuals. Besides guiding the choice of sampling strategy, these results allow experimentalists to consider whether any genetic variation previously unseen is likely to be discovered by including ancient samples, and if so, to compare the observed number of segregating sites where the mutation is unique to ancient individuals to the expected number under a neutral model. Other applications may be relevant for the construction of time-structured data sets, depending on the objective. As an example, studies may have the objective of estimating the rate of neutral substitution. For such an estimate, substitutions occurring over the sampling interval constitute information and substitutions occurring over the terminal interval constitute noise. Thus, it is desirable to construct sampling schemes which maximise the ratio of the expected length of branches in the sampling interval over the expected length of branches in the terminal interval, and the effect of e.g. sampling schemes on this measure could also be explored using the results from this paper.
In the second part of the paper we studied the simpler case where samples are taken at only two time-points. This simplification allowed us to present results concerning the full probability distribution of the number of segregating sites where the mutation is unique to the ancient individuals and of the number of segregating sites where the mutation is shared between ancient and extant individuals. As an application of these we have shown the number of ancient lineages needed for a sample to have a high probability of showing at least one segregating site where the mutation is unique to the ancient material or at least one segregating site where the mutation is shared between ancient and extant material. These results should be of interest to experimentalists who wish to evaluate the amount of information likely to be obtained from a given time-structured sample.
In the applications presented we have focused on moderate sample sizes. For large samples (>> 50) computational problems arise due to the computational load of the recursions and problems with computer representation of very small probabilities. If larger sample sizes are to be considered these problems must be circumvented. This could be done by simplification of recursions, the use of approximative calculations and the derivation of probabilities through Monte Carlo procedures [7]. Smaller sample sizes may, however, be sufficient to extract general properties as the effect of increasing sample size on e.g. total branch length in the coalescent quickly diminishes.
The results concerning tree measures are applicable to all time-structured samples. However, the results on segregating sites are derived under the assumption of the infinite sites model which is invalid for data sets taken over long time-intervals from rapidly evolving genetic elements such as the mito-chondrial control region or genomic regions of RNA viruses.
For simplicity we have only considered the case of a constant population, but if the demographic function is known, the effect of varying population size could be accommodated into the tree measure recursions given here via the generalisation of (2) given in [26]. This is also true for the tree measures concerning sub-samples, since the result in (4) is purely combinatorial and thus independent of the demographic function. For the results concerning the number of segregating sites, an extension to varying population size is not possible with the present approach as it relies on the independence between consecutive coalescent intervals.
Conclusion
The focus of this paper has been on sampling issues and the production of results that may be of use in the design and interpretation of experiments including time-structured genetic samples, particularly ancient DNA experiments. However, given that the inclusion of time-structure increases the statistical power in evolutionary inference, it is our hope that the results presented may also be useful in the pursuit of statistical tests, in the flavour of Tajima's D [27], which are applicable to time-structured samples.
Declarations
Acknowledgements
This study acknowledges support from the following grants: The Danish Natural Science Research Council grants 21-02-0206 and 51-00-0392 (RF), The National Institute of Health, USA grant 1-R01-GM60729-01 (RF), and EPSRC grant HAMJW and MRC grant HAMKA. Carsten Wiuf is thanked for useful discussions.
Authors’ Affiliations
References
- Williamson EG, Slatkin M: Using Maximum Likelihood to Estimate Population Size From Temporal Changes in Allele Frequencies. Genetics. 1999, 152 (2): 755-761.PubMed CentralPubMedGoogle Scholar
- Drummond AJ, Pybus OG, Rambaut A, Forsberg R, Rodrigo AG: Measurably evolving populations. Trends in Ecology and Evolution. 2003, 18: 481-488. 10.1016/S0169-5347(03)00216-7.View ArticleGoogle Scholar
- Forsberg R, Oleksiewicz MB, Petersen AM, Hein J, Botner A, Storgaard T: A molecular clock dates the common ancestor of European-type porcine re productive and respiratory syndrome virus at more than 10 years before the emergence of disease. Virology. 2001, 289 (2): 174-179. 10.1006/viro.2001.1102.View ArticlePubMedGoogle Scholar
- Lambert DM, Ritchie PA, Millar CD, Holland B, Drummond AJ, Baroni C: Rates of evolution in ancient DNA from Adelie penguins. Science. 2002, 295 (5563): 2270-2273. 10.1126/science.1068105.View ArticlePubMedGoogle Scholar
- Leonard JA, Wayne RK, Cooper A: Population genetics of ice age brown bears. Proc Natl Acad Sci USA. 2000, 97 (4): 1651-1654. 10.1073/pnas.040453097.PubMed CentralView ArticlePubMedGoogle Scholar
- Krings M, Stone A, Schmitz RW, Krainitzki H, Stoneking M, Paabo S: Neandertal DNA sequences and the origin of modern humans. Cell. 1997, 90: 19-30. 10.1016/S0092-8674(00)80310-4.View ArticlePubMedGoogle Scholar
- Nordborg M: On the probability of Neanderthal ancestry. Am J Hum Genet. 1998, 63 (4): 1237-1240. 10.1086/302052.PubMed CentralView ArticlePubMedGoogle Scholar
- Drummond AJ, Nicholls GK, Rodrigo AG, Solomon W: Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics. 2002, 161 (3): 1307-1320.PubMed CentralPubMedGoogle Scholar
- Fu YX: Estimating mutation rate and generation time from longitudinal samples of DNA sequences. Mol Biol Evol. 2001, 18 (4): 620-626.View ArticlePubMedGoogle Scholar
- Rodrigo AG, Felsenstein J: Coalescent approaches to HIV-1 population genetics. The Evolution of HIV. Edited by: Crandall KA. 1999, Baltimore: Johns Hopkins University Press, 233-272.Google Scholar
- Rodrigo AG, Shpaer EG, Delwart EL, Iversen AK, Gallo MV, Brojatsch J, Hirsch MS, Walker BD, Mullins JI: Coalescent estimates of HIV-1 generation time in vivo. Proc Natl Acad Sci USA. 1999, 96 (5): 2187-2191. 10.1073/pnas.96.5.2187.PubMed CentralView ArticlePubMedGoogle Scholar
- Seo TK, Thorne JL, Hasegawa M, Kishino H: Estimation of effective population size of HIV-1 within a host: a pseudomaximum-likelihood approach. Genetics. 2002, 160 (4): 1283-1293.PubMed CentralPubMedGoogle Scholar
- Kingman JFC: On the genealogy of large populations. Journal of Applied Probability. 1982, 19A: 27-43.View ArticleGoogle Scholar
- Kingman JFC: The coalescent. Stock Process Appl. 1982, 13: 235-248. 10.1016/0304-4149(82)90011-4.View ArticleGoogle Scholar
- Fisher RA: The genetical theory of natural selection. 1930, Oxford: Clarendon PressView ArticleGoogle Scholar
- Wright S: Evolution in Mendelian populations. Genetics. 1931, 16: 97-159.PubMed CentralPubMedGoogle Scholar
- Watterson GA: On the number of segregating sites in genetical models without recombination. Theoretical Population Biology. 1975, 7 (2): 256-276. 10.1016/0040-5809(75)90020-9.View ArticlePubMedGoogle Scholar
- Tavaré S: Line-of-descent and genealogical processes, and their applications in population genetics models. Theoretical Population Biology. 1984, 26 (2): 119-164. 10.1016/0040-5809(84)90027-3.View ArticlePubMedGoogle Scholar
- Saunders IW, Tavare S, Watterson GA: On the genealogy of nested subsamples from a haploid population. Advances in Applied Probability. 1984, 14: 471-491.View ArticleGoogle Scholar
- Cooper A, Drummond AJ, Willerslev E: Ancient DNA: would the real Neandertal please stand up?. Curr Biol. 2004, 14 (11): 431-433. 10.1016/j.cub.2004.05.037.View ArticleGoogle Scholar
- Shapiro B, Drummond AJ, Rambaut A, Wilson MC, Matheus PE, Sher AV, Pybus OG, Gilbert MTP, Barnes I, Binladen J, Willerslev E, Hansen AJ, Baryshnikov GF, Burns JA, Davydov S, Driver JC, Froese DG, Harington CR, Keddie G, Kosintsev P, Kunz ML, Martin LD, Stephenson RO, Storer J, Tedford R, Zimov S, Cooper A: Rise and Fall of the Beringian Steppe Bison. Science. 2004, 306 (5701): 1561-1565. 10.1126/science.1101074.View ArticlePubMedGoogle Scholar
- Griffiths RC: Transient distribution of the number of segregating sites in a neutral infinite-sites model with no recombination. Journal of Applied Probability. 1981, 18: 42-51.View ArticleGoogle Scholar
- Hudson RR: Gene genealogies and the coalescent process. Oxford Surveys in evolutionary biology. 1991, 1-44.Google Scholar
- Barnes I, Matheus P, Shapiro B, Jensen D, Cooper A: Dynamics of Pleistocene population extinctions in Beringian brown bears. Science. 2002, 295 (5563): 2267-2270. 10.1126/science.1067814.View ArticlePubMedGoogle Scholar
- Caramelli D, Lalueza-Fox C, Vernesi C, Lari M, Casoli A, Mallegni F, Chiarelli B, Dupanloup I, Bertranpetit J, Barbujani G, Bertorelle G: Evidence for a genetic discontinuity between Neandertals and 24,000-year-old anatomically modern Europeans. PNAS. 2003, 100 (11): 6593-6597. 10.1073/pnas.1130343100.PubMed CentralView ArticlePubMedGoogle Scholar
- Griffiths RC, Tavaré S: The age of a mutation in a general coalescent tree. Stochastic Models. 1998, 14: 273-295.View ArticleGoogle Scholar
- Tajima F: Statistical Method for Testing the Neutral Mutation Hypothesis by DNA Polymorphism. Genetics. 1989, 123 (3): 585-595.PubMed CentralPubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.