Measuring differentiation among populations at different levels of genetic integration
© Gillet and Gregorius; licensee BioMed Central Ltd. 2008
Received: 14 February 2008
Accepted: 30 September 2008
Published: 30 September 2008
Most genetic studies of population differentiation are based on gene-pool frequencies. Population differences for gene associations that show up as deviations from Hardy-Weinberg proportions (homologous association) or gametic disequilibria (non-homologous association) are disregarded. Thus little is known about patterns of population differentiation at higher levels of genetic integration or about the forces that cause them.
To fill this gap, a conceptual approach to the description and analysis of patterns of genetic differentiation at arbitrary levels of genetic integration (single or multiple loci, varying degrees of ploidy) is introduced. Measurement of differentiation is based on the measure Δ of genetic distance between populations, which is in turn based on an elementary genic difference between individuals at any given level of genetic integration. It is proven that Δ does not decrease when the level of genetic integration is increased, with equality if the gene associations at the higher level follow the same function in both populations (e.g. equal inbreeding coefficients, no association between loci). The pattern of differentiation is described using the matrix of pairwise genetic distances Δ and the differentiation snail based on the symmetric population differentiation Δ SD . A measure of covariation compares patterns between levels. To show the significance of the observed differentiation among possible gene associations, a special permutation analysis is proposed. Applying this approach to published genetic data on oak, the differentiation is found to increase considerably from lower to higher levels of integration, revealing variation in the forms of gene association among populations.
This new approach to the analysis of genetic differentiation among populations demonstrates that the consideration of gene associations within populations adds a new quality to studies on population differentiation that is overlooked when viewing only gene-pools.
Most biological species are subdivided into populations that are more or less strongly connected by gene flow. This facilitates a species' persistence via adaptive differentiation to local conditions, which in turn serves to maintain genetic variation for future adaptational processes. This concept of species is reflected, for example, in meta-population analysis with its special emphasis on extinction-recolonization dynamics (see  for a still relevant review). Genetic control of the phenotypic traits on which processes of adaptation operate is usually complex due to the involvement of several interacting genetic traits that may be expressed even in different developmental phases, including the haplophase. The detection of selectively neutral impacts on population differentiation (e.g. founder effects, genetic drift) may also require the analysis of multiple genetic traits, the interactions among which are determined by chance and in combination with particular mating systems (such as partial selfing). Thus the amount and pattern of genetic differentiation among a set of populations basically depends on:
(1) the developmental stage (chiefly haplophase vs. diplophase),
(2) the genetic traits under consideration at this stage, and
(3) the ways in which the different states of these genetic traits in the populations are associated to form the genetic types (haplotypes, genotypes) at this stage, broadly termed gene association in this paper.
In general, traits are genetic only if they are inheritable, and the goal of inheritance analysis is to identify genes as the basic units of inheritance. The term genetic integration is used here to designate the combination or arrangement of these elementary objects "gene" into the haplotypes of gametes, into the genotypes at diploid (or polyploid) nuclei of diplophase individuals, or into the cytotypes of mitochondria or plastids, for example. Accordingly, each level of genetic integration usually corresponds to a developmental stage or an organelle that is characterized by special combinations of genes. (To emphasize this aspect, genic integration might be the more appropriate term.)
The main motivation for this paper was the realization that impacts of particular forces, selective or not, on population differentiation may not be observable at every level of genetic integration. Measurements of differentiation among populations based on gene frequencies, for example, provide no specific insights into the effects of mating systems nor of epistatic interaction on population differentiation. This is due to the fact that gene frequencies refer to the lowest level of genetic integration, namely its absence. This level, which is commonly addressed as a population's gene-pool, is conceived to consist of the set of all individual genes present in the population members for a specified set of genetic traits. Genetic studies of population differentiation are almost always based on this "beanbag" (critically reflected by Mayr  and defended by Haldane ; for concise reasoning of the persistence of the gene-pool concept see e.g.  or ). Studies of differentiation at multiple loci are no exception, since they commonly report averages over single-locus differentiation indices. Also disregarded in studies of gene-pool differentiation are gene associations that deviate from Hardy-Weinberg proportions (homologous, or intralocus, association) or gametic equilibria (non-homologous, or interlocus, association). Considering that forms and degrees of gene association may differ at different levels of genetic integration, it thus appears that previous studies on patterns of population differentiation have provided very little information on levels of genetic integration above the gene-pool.
One important reason for the usual focus on gene-pool differentiation is probably the lack of a method for measuring population differentiation consistently at all levels of genetic integration. Consistency means that comparison of the amount of differentiation among a set of populations between levels of integration provides information about the complexity of the gene associations that distinguish them. Since gene associations do not decrease as level of integration increases, neither should differentiation. Moreover, the extent of an increase in differentiation between subsequent levels should in some way reflect the degree of complexity of the additional gene associations, with equality as an indication of lack of additional complexity by some standard. Such a differentiation measure must thus be based on a conceptual characterization of the complexity of gene associations.
The existence of such a measure would not only facilitate experimental studies but also simplify the development and testing of models. Insights can be gained from models only when the characteristics described by the models derive from concepts that are conceived independently of the models. Thus models do not serve to analyze characteristics: characteristics serve to analyze models. Moreover, model-based analysis that is limited to falsification of a particular model or its parameterization provides no information on the validity of related models. A conceptually argued measure, in contrast, can be applied to whole classes of models. This permits summarization of characteristics they have in common, the statistical significance of which can be tested by permutation analysis.
In the present paper, a new approach to differentiation analysis is presented that applies a conceptually argued measure of differentiation Δ SD to analyze and compare differentiation patterns among populations at different levels of integration. Presentation includes the development of Δ SD , representation of patterns of differentiation, and tests of significance of the patterns. Comparison of differentiation between levels of integration is analyzed mathematically. The method's usefulness is demonstrated by applying it to six-locus microsatellite data from four stands of pedunculate oak (Quercus robur). The purpose of using real data is to show how insights can be gained directly from observations without limitation to particular models, the testability of which may be difficult. It turned out that the large increases in differentiation between levels that were observed in the real data were not producible in numerous simulations of simple selection models, indicating that these models cannot explain the complexity of the real data. Studies of the behavior of this measure using simulated data from increasingly complex models will be the subject of a future paper.
To prevent possible misunderstanding, it should be mentioned that this approach differs in content from any type of (hierarchical) partitioning, apportionment, or allocation of genetic variation (such as within and between populations). Methods of attributing overall variation to partitions draw upon the principle of the analysis of variance and were extended to include more general measures of difference between individuals by Rao (equation 2.3.1 in ). An application of this generalization to a special measure of genetic difference for multiple loci between haplotypes led Excoffier et al.  to the formulation of their "analysis of molecular variance". In contrast, the levels of genetic integration dealt with here cannot serve as classes (partitions) over which genetic variation is distributed. Instead, at each integration level (e.g. gene pool, single-locus genotypes, multilocus genotypes) the genetic characteristics can be analyzed for their differentiation within population subdivisions. Subsequent comparison between levels reveals which level of integration, and thus which type of gene association (especially homologous vs. non-homologous), has the greatest influence on the differentiation within the partition.
Levels of genetic integration and gene association
At the lowest level of genetic integration, the gene-pool, the gene-type of each individual gene is characterized by the gene locus at which it is located and by its allelic state. Assuming that the degree of ploidy is the same at all loci, the relative frequencies of the gene-types in the gene-pool of a population equal $\frac{1}{L} \cdot p_{i;l}$, where $L$ is the number of loci and $p_{i;l}$ is the relative frequency of the i-th allele at the l-th gene locus in the population ($\sum_i p_{i;l} = 1$ at each locus, so that the gene-type frequencies sum to one over all loci). If loci of differing degree of ploidy (e.g. nuclear and organelle) are included in the analysis, replace $\frac{1}{L}$ with the locus-specific quantities $r_l$ obtained by division of the degree of ploidy at the l-th locus by the sum of the degrees over all loci. The gene-pool frequency of the gene-type specified by the i-th allele at the l-th locus then equals $r_l \cdot p_{i;l}$. At higher levels of genetic integration, where the objects of interest represent compositions of several individual genes together with their gene-types, association among gene-types becomes relevant for differentiation studies. If the objects are diplophase individuals and if the gene-types are specified at a single gene-locus, then all associations among the genes that make up the genotypes are homologous (i.e., allelic) by definition. When multiple loci are considered, both homologous and non-homologous (interlocus) associations exist among genes. If the objects are haplophases, each object having just one gene per locus, then all gene associations are non-homologous. Since at any given locus all objects carry the number of (allelic) individual genes specified by the degree of ploidy of the locus, the objects representing a given level of genetic integration are characterized by the same number of individual genes.
The elementary genic difference
From this perspective, genetic differences between two objects of the same level of integration are basically determined by the number of their individual genes that differ in type. If the numbers of copies of the i-th allele at the l-th gene locus in the two objects are denoted by $n_{i;l}$ and $m_{i;l}$, respectively, then the two objects differ by $\sum_{i,l} |n_{i;l} - m_{i;l}|$ gene-type copies. This sum is maximal, equaling two times the total number $K$ of individual genes represented in each object, if the objects share no gene-types (and thus differ completely). Since $\sum_{i,l} n_{i;l} = \sum_{i,l} m_{i;l} = K$ holds, division of $\sum_{i,l} |n_{i;l} - m_{i;l}|$ by $2 \cdot K$ yields a measure of genic difference that is bounded between zero and one. This measure of elementary genic difference is applicable to all levels of integration. It differs from a closely related index suggested by Smouse and Peakall  in a different context, in which the absolute difference is replaced by the squared difference, a disadvantage of which is that objects sharing no gene-types need not realize the maximum difference.
The elementary genic difference does not distinguish homologous from non-homologous genes. Hence, the homologous and non-homologous gene arrangements within the objects affect the elementary genic differences between them only through their sum. For example, in the case of diploid individuals scored at two gene loci A and B, say, the genotypes A1A1/B1B2 and A1A2/B1B3 represent three (A1, B1, B2) and four (A1, A2, B1, B3), respectively, of the total of five gene-types. A1 is represented by two copies in the first genotype and by one copy in the second, and the remaining four gene-types are represented by at most one copy in each of the two genotypes. The sum of copy number differences between the two genotypes thus equals four. After division by twice the number of individual genes in a genotype (i.e. 2·4), this yields 0.5 as the elementary genic difference. The same result is obtained for the two genotypes A1A2/B1B2 and A1A2/B3B3, even though all genic differences are now due to the alleles at a single locus (B).
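The two worked examples above can be reproduced with a few lines of code. The following is a minimal sketch, assuming each object is given simply as a list of gene-type labels; the function name and the label format are illustrative choices, not part of the original method description.

```python
from collections import Counter


def elementary_genic_difference(obj_a, obj_b):
    """Sum of absolute copy-number differences over gene-types, divided by 2*K.

    obj_a, obj_b: iterables of gene-type labels (one label per individual gene),
    e.g. ["A1", "A1", "B1", "B2"] for the diploid genotype A1A1/B1B2.
    Both objects must contain the same number K of individual genes.
    """
    counts_a, counts_b = Counter(obj_a), Counter(obj_b)
    k = sum(counts_a.values())
    if k == 0 or k != sum(counts_b.values()):
        raise ValueError("objects must carry the same positive number of genes")
    gene_types = set(counts_a) | set(counts_b)
    copy_diff = sum(abs(counts_a[t] - counts_b[t]) for t in gene_types)
    return copy_diff / (2 * k)


# First example from the text: A1A1/B1B2 versus A1A2/B1B3
d_first = elementary_genic_difference(["A1", "A1", "B1", "B2"],
                                      ["A1", "A2", "B1", "B3"])
# Second example: A1A2/B1B2 versus A1A2/B3B3
d_second = elementary_genic_difference(["A1", "A2", "B1", "B2"],
                                       ["A1", "A2", "B3", "B3"])
print(d_first, d_second)  # 0.5 0.5
```

Because the labels already encode the locus (A1, B1, ...), the function never needs to know which genes are homologous, which mirrors the statement that the measure does not distinguish homologous from non-homologous genes.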
These considerations show that objects representing higher levels of genetic integration are not simply of the same or different genetic type, as is the case at the level of the gene-pool. Specification of the gene-types of which the genetic types are composed yields a measure of the differences between them that ensures the comparability of genetic differences even across levels of genetic integration. Thus, analysis of population differentiation at higher levels of integration should take into account not only differences in the frequencies of the genetic types among populations but also the variation in the pairwise differences between types.
The measure Δ of genetic distance between two populations
In  and  it is shown that finding a shift transformation s that minimizes Δ(s) is equivalent to solving the "Transportation Problem"  by linear programming methods. These methods are implemented in the computer program DeltaS .
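At the gene-pool level, for $L$ loci of equal degree of ploidy, the distance takes the form $\Delta(p, q) = \frac{1}{L} \sum_{l=1}^{L} d_0(p^{(l)}, q^{(l)})$. In this expression, $d_0(p^{(l)}, q^{(l)})$ is a familiar measure of genetic distance between two populations with allele frequencies $p^{(l)}$ and $q^{(l)}$ at locus l (see e.g. ). That is, the gene-pool distance between two populations equals the average distance over the single loci.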
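The averaging over loci can be sketched as follows. The sketch assumes that d0 is the allele-frequency distance equal to half the sum of absolute frequency differences, and that all loci have the same degree of ploidy; the function names and the dictionary-based input format are hypothetical.

```python
def d0(p, q):
    """Half the sum of absolute allele-frequency differences at one locus.

    p, q: dicts mapping allele label -> relative frequency (assumed form of d0).
    """
    alleles = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in alleles)


def gene_pool_distance(pop_p, pop_q):
    """Average of the single-locus distances d0 (equal ploidy at all loci assumed).

    pop_p, pop_q: lists with one allele-frequency dict per locus, in the same order.
    """
    if len(pop_p) != len(pop_q) or not pop_p:
        raise ValueError("both populations need the same non-empty list of loci")
    return sum(d0(p, q) for p, q in zip(pop_p, pop_q)) / len(pop_p)


# Two hypothetical populations scored at two loci
pop_p = [{"A1": 0.6, "A2": 0.4}, {"B1": 0.5, "B2": 0.5}]
pop_q = [{"A1": 0.3, "A2": 0.7}, {"B1": 0.5, "B3": 0.5}]
print(gene_pool_distance(pop_p, pop_q))  # about 0.4 (0.3 at locus A, 0.5 at locus B)
```

For loci of differing ploidy, the plain average would have to be replaced by a weighted sum using the locus-specific weights described earlier.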
At the diplophase level of integration, for example, consider two populations with Hardy-Weinberg proportions (HWP) for the two alleles A1 and A2 at a locus, the first with allele frequencies $p_1, p_2$ and the second with $q_1, q_2$. Let $p_1 > q_1$, and let the first population have more heterozygotes than the second. Then there is only one way s of shifting, namely $s(A_1A_1, A_2A_2) = p_1^2 - q_1^2 > 0$ and $s(A_1A_2, A_2A_2) = 2p_1p_2 - 2q_1q_2 > 0$. Since for the elementary genic distance, $d(A_1A_1, A_2A_2) = 1.0$ and $d(A_1A_2, A_2A_2) = 0.5$, the genetic distance equals $\Delta = 1.0 \cdot (p_1^2 - q_1^2) + 0.5 \cdot (2p_1p_2 - 2q_1q_2) = p_1(p_1 + p_2) - q_1(q_1 + q_2) = p_1 - q_1$. In this example, the distance at the diplophase level equals the gene-pool distance. Under Results it is shown (Proposition 1) that the diplophase distance is never less than the gene-pool distance and that equality at the two levels is of particular interest.
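The Hardy-Weinberg example can be checked numerically by treating the search for the cheapest shift as a small transportation problem. The sketch below is not the DeltaS program; it assumes that Δ(s) is the sum, over pairs of types, of the shifted frequency times the elementary genic difference d, and it solves the problem with a plain successive-shortest-path routine on exact fractions. The function name and the allele frequencies (p1 = 3/5, q1 = 3/10) are hypothetical.

```python
from fractions import Fraction as Fr


def min_shift_cost(supply, demand, cost):
    """Minimum of sum s[i][j]*cost[i][j] with row sums `supply`, column sums `demand`.

    All inputs are Fractions; supply and demand must each sum to the same total.
    """
    n, m = len(supply), len(demand)
    src, snk, size = n + m, n + m + 1, n + m + 2
    edges, adj = [], [[] for _ in range(size)]  # edge k and k ^ 1 are mutual reverses

    def add_edge(u, v, cap, c):
        adj[u].append(len(edges)); edges.append([v, cap, c])
        adj[v].append(len(edges)); edges.append([u, Fr(0), -c])

    total_mass = sum(supply, Fr(0))
    for i in range(n):
        add_edge(src, i, supply[i], Fr(0))
        for j in range(m):
            add_edge(i, n + j, total_mass, cost[i][j])
    for j in range(m):
        add_edge(n + j, snk, demand[j], Fr(0))

    total = Fr(0)
    while True:
        # Bellman-Ford shortest path on the residual graph
        dist, prev = [None] * size, [None] * size
        dist[src] = Fr(0)
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if dist[u] is None:
                    continue
                for k in adj[u]:
                    v, cap, c = edges[k]
                    if cap > 0 and (dist[v] is None or dist[u] + c < dist[v]):
                        dist[v], prev[v], changed = dist[u] + c, k, True
            if not changed:
                break
        if dist[snk] is None:
            return total  # nothing left to shift
        # Find the bottleneck along the path, then push that amount
        push, v = None, snk
        while v != src:
            k = prev[v]
            push = edges[k][1] if push is None else min(push, edges[k][1])
            v = edges[k ^ 1][0]
        v = snk
        while v != src:
            k = prev[v]
            edges[k][1] -= push
            edges[k ^ 1][1] += push
            v = edges[k ^ 1][0]
        total += push * dist[snk]


# Genotype order: A1A1, A1A2, A2A2; d is the elementary genic difference
d = [[Fr(0), Fr(1, 2), Fr(1)],
     [Fr(1, 2), Fr(0), Fr(1, 2)],
     [Fr(1), Fr(1, 2), Fr(0)]]
p1, q1 = Fr(3, 5), Fr(3, 10)  # hypothetical allele frequencies with p1 > q1
P = [p1 * p1, 2 * p1 * (1 - p1), (1 - p1) ** 2]  # HWP in the first population
Q = [q1 * q1, 2 * q1 * (1 - q1), (1 - q1) ** 2]  # HWP in the second population
print(min_shift_cost(P, Q, d), p1 - q1)  # 3/10 3/10
```

Exact fractions avoid rounding issues in the comparison with p1 - q1; for realistic numbers of genotypes a dedicated linear-programming solver would be the more practical choice.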
Patterns of differentiation among populations
At this point, each level of integration for a set of populations is characterized by a matrix of pairwise distances Δ between the populations. These matrices and the relationships among them can be called the pattern of differentiation among the populations. Three approaches to the description of differentiation patterns are discussed.
Matrices of pairwise genetic distances between populations are commonly represented using clustering methods as dendrograms, the topologies (cluster structures) of which are of primary interest. In particular, the emergence of new cluster structures at higher levels of integration emphasizes the necessity to consider evolutionary forces of population differentiation that go beyond those conventionally held responsible for gene-pool differentiation. Detection of such structures of course depends on comparison of the dendrograms from different levels of integration, where the gene-pool constitutes the basic reference for comparison. There are many ways of comparing dendrograms obtained with the same clustering method (for an overview see e.g. , p. 94ff). We will concentrate instead on direct comparison of the quantities underlying all methods of clustering, i.e., the matrix of pairwise distances. Changes in topology are most likely to occur when the distance matrices show poor correspondence across levels of integration, that is, low covariation (see below).
Another common approach is less detailed and essentially rests on the computation of a single statistic of the degree of differentiation among populations. Among these measures, most of which are indexed by ST , the classical versions F ST  and G ST  consider population differentiation solely for allele frequencies. More recent versions such as Φ ST  or R ST  include variable differences between genetic types. Inferences on patterns of differentiation are more or less restricted to ways in which an observed amount of differentiation could have evolved under certain model assumptions. Moreover, the whole family of ST -measures is based on the principle of variance decomposition, where the difference between the total variation and the average variation within populations is divided by the total variation. Such measures do not assume their maximum values only for completely differentiated populations. This follows directly from their conceptual underpinning, which refers to partitioning rather than differentiation of genetic variation among populations. The ST -measures therefore have limited relevance as indicators of patterns of differentiation among populations.
Symmetric population differentiation Δ SD
Whereas Δ SD quantifies the average degree to which individual populations differ from their complements, its components Δ j identify individual populations as being more or less representative of the whole collection of populations. Thus, Δ j = 0 summarizes the situation where the j-th population perfectly represents the totality of the populations. On the other hand, the more distinctly Δ j exceeds Δ SD , the more a population is distinguished from all the others. The extreme of complete differentiation of course requires a definite notion of complete difference between types (as is the case with binary difference measures as well as with the measure d of elementary genic difference).
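The relationship between the components Δ j and Δ SD can be illustrated at the gene-pool level of a single locus. The sketch rests on several assumptions that are not spelled out in the text above: the complement of a population is taken to be the size-weighted pool of all other populations, Δ SD is taken to be the size-weighted mean of the Δ j, and the distance is the allele-frequency distance d0 (half the sum of absolute differences). All names are hypothetical.

```python
def d0(p, q):
    """Half the sum of absolute allele-frequency differences (assumed distance)."""
    alleles = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in alleles)


def complement(freqs, sizes, j):
    """Size-weighted pooled allele frequencies of all populations except j."""
    others = [k for k in range(len(freqs)) if k != j]
    total = sum(sizes[k] for k in others)
    alleles = set().union(*(freqs[k] for k in others))
    return {a: sum(sizes[k] * freqs[k].get(a, 0.0) for k in others) / total
            for a in alleles}


def snail_components(freqs, sizes):
    """Return ([Delta_j for each population], Delta_SD) under the stated assumptions."""
    n = sum(sizes)
    deltas = [d0(freqs[j], complement(freqs, sizes, j)) for j in range(len(freqs))]
    delta_sd = sum(sizes[j] / n * deltas[j] for j in range(len(freqs)))
    return deltas, delta_sd


# Three hypothetical populations of equal size; the third shares no allele with the others
freqs = [{"A": 1.0}, {"A": 1.0}, {"B": 1.0}]
deltas, delta_sd = snail_components(freqs, sizes=[10, 10, 10])
print(deltas, delta_sd)  # [0.5, 0.5, 1.0] and about 0.667
```

In this toy case the third population has Δ j = 1, well above Δ SD, which matches the reading that it is clearly distinguished from all the others.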
The differentiation pattern inherent in Δ SD and its components Δ j for variable population sizes can be illustrated as a "differentiation snail"  (see Fig. 2 below). The snail complements the pattern characteristics obtainable from clustering methods or directly from the distance matrix in that it reveals tendencies of population assemblages to be genetically dispersed or to concentrate genetic variation in a few populations. In order to assess changes in the snail between levels of genetic integration, the following measure of covariation of the respective components Δ j can be applied.
Covariation of differentiation between integration levels
$C = \dfrac{\sum_{i<j} (X_i - X_j)(Y_i - Y_j)}{\sum_{i<j} |X_i - X_j| \cdot |Y_i - Y_j|}$
where the variables $X_i$ and $Y_i$ refer to genetic distances at two different levels of integration. In the case of the distances between a population and its complement, $X_i$ and $Y_i$ refer to $\Delta_i$ at the two levels of integration. In the case of pairwise distances between populations, $X_i$ and $Y_i$ refer to the i-th element of the distance matrix for each of the two levels of integration. C varies between -1 and +1 such that C = 1 for strictly positive and C = -1 for strictly negative covariation. It is undefined in the practically irrelevant case where a non-zero difference for one variable implies equality for the other.
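A covariation measure with the properties just listed can be sketched as follows. The form used here, the sum of products of pairwise differences divided by the sum of their absolute values, is an assumption chosen because it reproduces the stated range, the sign behavior, and the undefined case; the function name is hypothetical.

```python
from itertools import combinations


def covariation(x, y):
    """Assumed form: sum of (x_i - x_j)(y_i - y_j) over pairs i < j,
    divided by the sum of the absolute values of the same products.

    Returns None when every product is zero, i.e. when a non-zero difference
    in one variable always coincides with equality in the other.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    numerator = denominator = 0.0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        numerator += product
        denominator += abs(product)
    if denominator == 0:
        return None
    return numerator / denominator


print(covariation([1, 2, 3], [2, 4, 7]))  # 1.0  (same ranking)
print(covariation([1, 2, 3], [3, 2, 1]))  # -1.0 (reversed ranking)
print(covariation([1, 1], [1, 2]))        # None (undefined case)
```

Applied to two distance matrices, x and y would hold the corresponding off-diagonal entries in the same order; applied to snails, they would hold the components Δ j at the two levels.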
Permutation test of the significance of genetic differentiation patterns
Any increase of genetic differentiation among populations at higher levels of genetic integration is due to forces of association of genes that differ among populations. It is thus of basic interest to know whether the differentiation observed at a level of integration can be explained by random combination of genes (e.g. into diploid genotypes or haplotypes) or whether directed forces of combination must be assumed. This requires an analysis that is conditional on the gene-pool of each population, the number of populations, and the population sizes. The effects of chance can be assessed by permuting the genes within each population, such that all homologous and non-homologous combinations of genes (alleles) into (haploid, diploid or polyploid) genotypes have equal probability. For each such permutation, the values of all relevant descriptors (e.g. covariation C for distance matrices and differentiation snails, the mean pairwise distance Δ in the distance matrix, the symmetric population differentiation Δ SD ) are determined. By performing a large number of permutations, the significance of each observed descriptor value can be measured in terms of the P-value, which is the proportion of permutations yielding descriptor values greater than or equal to the observed value. For interpretation of the results, both very small P-values (≤ 0.05) and very large P-values (≥ 0.95) are of interest.
This permutation analysis differs from common permutation analyses of differentiation among populations, in which the individuals (together with their fixed genotypes) are permuted over the populations. Such analyses aim to explain gene-pool differences among populations. In contrast, the present paper is targeted at forces of genetic differentiation that originate from the association of genes in diplo- or haplo-states and that thus go beyond those responsible for gene-pool differentiation.
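The permutation scheme can be sketched for the simplest case of one locus and diploid genotypes. The genes of each population are shuffled and re-paired at random, so every population keeps its own gene-pool and size. The descriptor used below is only a stand-in (a distance that treats genotypes as simply equal or different); in the approach described here it would be replaced by a descriptor built on Δ. All names and the example data are hypothetical.

```python
import random
from collections import Counter


def permute_genes(genotypes, rng):
    """Shuffle the alleles of one population and re-pair them into diploid genotypes."""
    genes = [allele for genotype in genotypes for allele in genotype]
    rng.shuffle(genes)
    return [tuple(sorted(genes[k:k + 2])) for k in range(0, len(genes), 2)]


def genotype_d0(populations):
    """Stand-in descriptor for two populations: half the sum of absolute
    differences between their genotype frequencies (binary type difference)."""
    a, b = (Counter(tuple(sorted(g)) for g in pop) for pop in populations)
    na, nb = sum(a.values()), sum(b.values())
    return 0.5 * sum(abs(a[g] / na - b[g] / nb) for g in set(a) | set(b))


def p_value(populations, descriptor, n_perm=1000, seed=0):
    """Proportion of permutations whose descriptor value is >= the observed value."""
    rng = random.Random(seed)
    observed = descriptor(populations)
    hits = 0
    for _ in range(n_perm):
        permuted = [permute_genes(pop, rng) for pop in populations]
        if descriptor(permuted) >= observed:
            hits += 1
    return hits / n_perm


# Two hypothetical stands with identical gene-pools but opposite homologous association
stand_a = [("A1", "A1")] * 5 + [("A2", "A2")] * 5   # homozygotes only
stand_b = [("A1", "A2")] * 10                       # heterozygotes only
print(p_value([stand_a, stand_b], genotype_d0, n_perm=500, seed=1))
```

A multilocus version would permute the genes at each locus separately and then combine the resulting single-locus genotypes at random within each population.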
Results and discussion
Effects of level of genetic integration on the pattern of differentiation among populations
Proceeding from lower to higher levels of integration, one expects an increase in differentiation among populations simply because of the larger varietal potential inherent in more complex structures. Since differentiation is based on distances, the distance between two populations should therefore also increase, or at least not decrease, with integration level. Consider two populations, and denote the relative frequencies of their (multilocus) genotypes at L (≥ 1) loci of equal degree of ploidy (≥ 1) by frequency vectors P and Q and the relative frequencies of the gene-types in their gene-pools by frequency vectors p and q. Proof of the following Theorem requires the special properties of the elementary genic difference between genotypes, including the fact that it is a metric distance:
Theorem: $\Delta(p, q) = \frac{1}{L} \sum_{l=1}^{L} \Delta(p^{(l)}, q^{(l)}) \le \frac{1}{L} \sum_{l=1}^{L} \Delta(P^{(l)}, Q^{(l)}) \le \Delta(P, Q)$,
where the difference between genetic types (haplotypes, diplotypes) is measured by the elementary genic difference d.
Proof: The equality results from the definitions of Δ and of the gene-pool. The first inequality follows from Proposition 1 (see Appendix A), which states that the distance Δ between L-locus genotypic structures P and Q (L ≥ 1) is never less than the distance between the gene-pools p and q. From this it follows that $\Delta(p^{(l)}, q^{(l)}) \le \Delta(P^{(l)}, Q^{(l)})$ for each locus l. The second inequality stems from Proposition 2 (see Appendix B), which states that the distance Δ between multilocus genotypic structures P and Q is never less than the average of the distances between the corresponding single-locus genotypic structures $P^{(l)}$ and $Q^{(l)}$. ■
We investigated this Theorem by simulating numerous simple models. When we analyzed two populations with differing gene-pools at a locus but both showing HWP among the genotypes, we were surprised to see that the inequalities became equalities. Furthermore, the extension of HWP to inbreeding structures for the same inbreeding coefficient F (i.e., $P_{ii} = p_i^2 + F p_i (1 - p_i)$ and $P_{ij} = 2 p_i p_j (1 - F)$) also yielded equality (F = 0 gives HWP). Equality also held when each of the genotypic structures was the product of two allelic structures (e.g., maternal and paternal), one of which was the same in both populations. When we simulated the frequencies of two-locus genotypes in two populations, both showing HWP at both loci, as the product of the single-locus genotype frequencies, equality again held. In contrast, differentiation between the gene-pool and the genotypes at a single locus did increase for inbreeding structures when the two inbreeding coefficients differed and for product structures when no two of the four allelic structures matched. No increase was obtainable between the average single-locus genotypic distance and the multilocus distance in the case of two loci, each with two alleles, not even when the selection regimes differed between the populations. It is therefore interesting that examples using real data, one of which is presented below, all showed large increases between levels, indicating that the real data do not follow simple models.
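The single-locus observations for inbreeding structures can be reproduced for two alleles with a shortcut. Under the elementary genic difference, the three genotypes A1A1, A1A2, A2A2 lie on a line with steps of 0.5, so the minimal shift cost can be computed from cumulative frequency differences. This is a standard property of one-dimensional transport and is used here in place of linear programming; it is not the route taken in the simulations reported above. The function names and parameter values are hypothetical.

```python
def inbreeding_structure(p1, f):
    """Genotype frequencies (A1A1, A1A2, A2A2) for allele frequency p1 and
    inbreeding coefficient f; f = 0 gives Hardy-Weinberg proportions."""
    p2 = 1.0 - p1
    return (p1 * p1 + f * p1 * p2, 2.0 * p1 * p2 * (1.0 - f), p2 * p2 + f * p1 * p2)


def delta_two_alleles(P, Q):
    """Minimal shift cost between two genotype distributions at a two-allele locus.

    Genotypes sit at positions 0, 0.5, 1 on a line, so the cost equals the step
    width 0.5 times the sum of absolute cumulative frequency differences.
    """
    first_cut = P[0] - Q[0]
    second_cut = (P[0] + P[1]) - (Q[0] + Q[1])
    return 0.5 * (abs(first_cut) + abs(second_cut))


# Same inbreeding coefficient in both populations: distance equals p1 - q1
same_f = delta_two_alleles(inbreeding_structure(0.6, 0.2),
                           inbreeding_structure(0.3, 0.2))
# Different inbreeding coefficients: distance exceeds p1 - q1
diff_f = delta_two_alleles(inbreeding_structure(0.5, 0.0),
                           inbreeding_structure(0.4, 0.5))
print(same_f, 0.6 - 0.3)   # both about 0.3
print(diff_f, 0.5 - 0.4)   # about 0.13 versus about 0.1
```

The shortcut does not extend to more than two alleles or to multiple loci, where the genotypes no longer lie on a line and the general transportation formulation is needed.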
As an explanation for the examples in which the genetic distance does not increase with level of genetic integration, consider that the first inequality becomes an equality, if Δ(p(l), q(l)) = Δ(P(l), Q(l)) holds for each single locus l. The calculated examples suggest that equality holds at a single locus if the genotypic structures in both populations result from the same function of their allelic structures, i.e., uniformity of homologous association. The second inequality became an equality in our calculated examples whenever multilocus genotype frequencies were the product of single-locus genotype frequencies, i.e., in the absence of non-homologous association.
These observations suggest that uniformity of homologous association and absence of non-homologous association result in equal distances at different integration levels. Intuitively, this coincides with the conception that absence or uniformity of association do not really introduce any new structure to the higher levels of integration. Since this phenomenon only shows up when the difference between genotypes is measured by the elementary genic distance, this measure is closely tied to the concept that the absence of association does not lead to higher differentiation at higher levels of genetic integration.
Nevertheless, absence of non-homologous association may not be a necessary condition for equality, since equality also occurred in some examples where association between loci was present. This means that the basic prerequisite for validity of Δ(p, q) = Δ(P, Q) (stated at the end of Appendix A), namely that every gene-type that is not of equal frequency in the two populations be either a source gene or a sink gene, may be fulfilled even in the presence of non-homologous association.
Carrying these results for Δ over to the differentiation measures Δ j and Δ SD , the differentiation among populations for multilocus genetic types (haplotypes, genotypes) equals the gene-pool differentiation if all populations show uniformity of homologous gene association (e.g. HWP, inbreeding for the same inbreeding coefficient) and absence of non-homologous association. Otherwise, differentiation may increase with level of integration, as expected.
All of these results are based on the special measure of elementary genic difference between genotypes (for any degree of ploidy). Thus any other measure is likely to yield different results, the interpretation of which would of course depend on a clear conceptual understanding of the difference measure. In particular, this concerns genetic associations that are not specifically genic. A discussion of these measures (see  for an overview of measures) would, however, be clearly beyond the scope of this paper.
Application of the approach to an assemblage of oak stands
The effects of the level of genetic integration on patterns of differentiation will be illustrated with the help of an example based on published data [20, 22]. The reason for not applying it to particular models here is to show how insights can be gained directly from observations, without model constraints. In this data, the multilocus genotypes at the same six nuclear microsatellite loci were scored in all adult trees of four stands of pedunculate oak (Quercus robur) located in north-central Germany. Of the 159 trees in the stand near Rantzau, 154 trees could be scored at all six loci, yielding 153 different multilocus genotypes (abbreviated 159/154/153). The other three stands are near Behlendorf (228/178/177), Steinhorst (85/74/74), and Escherode (210/200/200). The number of alleles per locus lies between 15 and 35 with a mean of 23.7, of which an average of five occur in only one stand. Each multilocus genotype appeared in only one stand, yielding a total of 604 different genotypes among the 606 trees scored at all loci. Failure to score the complete multilocus genotypes of the other 76 trees in the stands is assumed to be independent of their genotypes.
Table: Genetic differentiation among four oak stands at three levels of genetic integration. Each entry lists the observed value, the bracketed interval from the permutations, the P-value, and the significance marker.

Genetic differentiation among stands for three levels of integration
Level of integration | Genetic distance between stands | Components of the differentiation snail
Single-locus diplophase (SLD) | 0.232 [0.214, 0.235] 0.004 ↑ ** | 0.200 [0.184, 0.203] 0.002 ↑ **
Multilocus diplophase (MLD) | [0.507, 0.521] 0.005 ↑ ** | 0.502 [0.489, 0.505] 0.006 ↑ **

Covariation of genetic differentiation between integration levels
Levels compared | Genetic distance between stands | Components of the differentiation snail
GP vs. SLD | 0.893 [0.421, 0.988] 0.270 n.s. | 0.809 [0.545, 1.000] 0.912 n.s.
SLD vs. MLD | 1.000 [0.742, 1.000] 0.084 n.s. | 0.995 [0.868, 1.000] 0.532 n.s.
GP vs. MLD | 0.720 [0.395, 0.954] | 0.657 [0.376, 1.000] 0.965 ↓ *
In order to be sure that this apparent discrepancy between stands in the form of association is not simply due to the small number of multilocus genotypes in the stands compared to the number that could be formed from the genes present in the stands, a permutation analysis was performed as described above. Ten thousand new data sets were generated by random permutation of the genes at each locus within each stand to form new single-locus genotypes, randomly combined to multilocus genotypes. Each observed distance was then compared to the 10 000 distances from permutation. Surprisingly, for both the single-locus diplophase and the multilocus diplophase, the observed mean pairwise distance and the symmetric population differentiation Δ SD were significantly high (i.e., higher than for 99% of all permutations). This indicates that both homologous and non-homologous association of genes follow very different rules among the stands.
The significant size of the mean of the pairwise distances for the single-locus diplophase and the multilocus diplophase may seem at odds with the striking similarity of these distances within each of the three levels of integration. The same holds for the snail components. To explain this similarity, note that the range of values that appeared in the permutations is also quite narrow. Thus the collections of genes in the stands must place tight limits on the achievable distances and snail components.
Not only the sizes but also the covariation C of the pairwise distances Δ and the snail components Δ j at the different integration levels depend on the differences in gene association between levels. The positive covariation of distance matrices and of snail components for all pairs of integration levels shows that no form of association completely overturns the ranking prescribed by the gene-pool. Whereas the gene arrangements that distinguish the single-locus diplophase from the gene-pool do produce rank changes among the stands (C = 0.893 for the distance matrix and C = 0.809 for the snail components), the gene arrangements that distinguish the single-locus diplophase from the multilocus diplophase have little effect on ranking (C = 1 for the distance matrix and C = 0.995 for the snail components). Not surprisingly, the gene arrangements that distinguish the gene-pool from the multilocus diplophase yield the weakest covariation (C = 0.720 for the distance matrix and C = 0.657 for the snail components).
It is interesting to compare the observed covariations with the ranges of covariation that occurred for the gene arrangements generated by the 10 000 random permutations. The distance matrices show weaker covariation between the single-locus diplophase and the multilocus diplophase in almost 92% of the permutations (P-value 0.084 for C = 1) but between the gene-pool and the single-locus diplophase in only 73% (P-value 0.270 for C = 0.893). From the rather low probability of the observed perfect covariation (C = 1) between the single-locus diplophase and the multilocus diplophase, it can be inferred that the non-homologous association has a special relationship to the homologous association in the single-locus diplophase. In contrast, the intermediate P-value for the covariation between the gene-pool and the single-locus diplophase suggests that the homologous association is not predetermined by the collection of genes.
The snail components showed a weaker covariation between the single-locus diplophase and the multilocus diplophase in ca. 47% of the permutations (P-value 0.532 for C = 0.995) but between the gene-pool and the single-locus diplophase in only ca. 9% (P-value 0.912 for C = 0.809). This agrees with the distance matrices in indicating a stronger effect of homologous association than of non-homologous association on the ranking. Compared to the distance matrices, however, the snail components show stronger covariation than observed in a much higher proportion of the permutations, both for homologous and non-homologous association. Hence, the covariation of the snail components seems to be less sensitive to the effects of gene association than is the covariation of the pairwise distances. This is presumably due to the equalizing influence of combining three stands for comparison with the fourth, which is the basis of the snail components.
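To make the comparison of rankings between integration levels concrete, the following Python sketch computes a covariation of two value lists. It assumes that C takes the form of the sum of products of pairwise differences divided by the sum of the absolute values of these products; this form and the function name `covariation` are assumptions made for illustration, and the definition given earlier in the text takes precedence.

```python
from itertools import combinations


def covariation(x, y):
    """Assumed covariation C of two equally long value lists.

    C = sum_{i<j} (x_i - x_j)(y_i - y_j) / sum_{i<j} |(x_i - x_j)(y_i - y_j)|
    so that C = 1 if no pair of items changes its order and C = -1 if every
    differing pair is reversed.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    numerator = 0.0
    denominator = 0.0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        numerator += product
        denominator += abs(product)
    if denominator == 0:
        raise ValueError("covariation undefined: no pair differs in both lists")
    return numerator / denominator


# Made-up values for three items at two levels of integration.
same_order = covariation([1, 2, 3], [1, 4, 9])
reversed_order = covariation([1, 2, 3], [3, 2, 1])
one_swap = covariation([1, 2, 3], [1, 3, 2])
```

For four stands, `x` and `y` would hold either the six pairwise distances or the four snail components at the two levels being compared. Under this form, C = 1 does not require equal values at both levels, only that no pair of items is ranked in opposite order.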
Discussion of the application to the oak stands
The differentiation observed among the oak stands increases distinctly from the gene-pool level to the single-locus diplophase. An even larger jump in differentiation occurs when the non-homologous association for the multiple loci is included. These are clear indications that all of the stands (except perhaps one) deviate from both HWP and gametic equilibrium, and that the degrees of deviation vary considerably among the stands. Such indications could not be confirmed by conventional statistical testing due to the large numbers of degrees of freedom and the implied weakness of the respective test statistics. It might come as a surprise that applying the special permutation analysis presented above to genetic differences between populations detects association characteristics within populations. Confirmation and exploitation of this statistical potential deserve further investigation.
Consequently, if the four oak stands had been less clearly separated spatially, and if we had wanted to assign the trees to their proper subpopulations, we would have run into problems when making use of methods based on the absence of gene associations within populations. Methods for finding subdivisions of populations that are based on Hardy-Weinberg proportions and gametic equilibrium within populations (e.g. [23–27]) may therefore not have assigned the individuals to their original stands.
When comparing the observed differentiation to that producible by gene association in the stands, all 10 000 permutations agreed with the observation by showing much higher differentiation among the single-locus diplophases than among the gene-pools, both for the mean pairwise genetic distance and the symmetric population differentiation Δ SD. This tells us not only that the random generation of gene association never yielded Hardy-Weinberg structures for all loci in all four stands simultaneously, but also that no other form of homologous association that leaves differentiation unchanged (e.g. inbreeding with equal coefficients) was ever realized in all stands simultaneously. Furthermore, all non-homologous associations showed a considerable additional increase in differentiation over the homologous associations, as is seen in the wide separation of the range of differentiation for the single-locus diplophase from the range for the multilocus diplophase. Remarkably, both ranges of differentiation are quite narrow. These results indicate that the increases in differentiation that are realizable by homologous and non-homologous gene association can be tightly restricted by the genic composition of the populations. In such cases, equal differentiation at consecutive integration levels may not be achievable. Thus it appears that differentiation among populations with respect to their forms of gene association may be a normal occurrence. This insight calls into question the common practice of restricting the measurement of population differentiation to the allelic level (e.g. F ST ), thereby ignoring the considerable effects of gene association on population differentiation. This analysis is the first of its kind. Therefore, we cannot venture a prediction about whether the above findings on covariation between levels of integration constitute a general trend.
It is conceivable, for example, that these findings are mainly determined by the conspicuously large polymorphism characteristic of the microsatellite markers used in this study. Other genetic markers may tell different stories.
This new approach to the analysis of genetic differentiation among populations demonstrates that the consideration of gene associations within populations adds a new quality to studies on population differentiation that is overlooked when viewing only gene-pools.
where the difference between genetic types (haplotypes, diplotypes) is measured by the elementary genic difference d.
where s(G_x, G_y) is the relative frequency, among all individuals in population P, of individuals that are shifted from type G_x to type G_y,
and where n_{i;l}(G_x) is the number of genes of allelic type A_{i;l} in type G_x.
Note that s(G_x, G_y) > 0 is true only if G_x is a source type and G_y a sink type. Thus α(A_{i;l}, •) quantifies the total number of A_{i;l}-genes in the original (source) types of all shifted individuals, divided by the total number of genes at locus l in population P (= N · population size). Analogously, α(•, A_{i;l}) quantifies the number of A_{i;l}-genes in the new (sink) types of all shifted individuals, divided by the same total number of genes. Their difference is the net frequency with which this allele was shifted.
The final equality follows from the definition of d(G x , G y ) in the text. Since this holds for any shift transformation, it also holds if s(P, Q) is a minimum shift transformation, in which case ∑x,yd(G x , G y )·s(G x , G y ) = Δ(P, Q). Therefore, it follows that: , as claimed. ■
In Proposition 1, equality holds if and only if for each gene-type A_{i;l}, the expression
(n_{i;l}(G_x) - n_{i;l}(G_y)) · s(G_x, G_y)
has the same sign for all pairs of types G_x, G_y. This distinguishes three special groups of genes: genes A_{i;l} for which the expression equals zero for all pairs of types G_x, G_y, implying that A_{i;l} is equally frequent in the two populations and therefore shows no net shift; genes A_{i;l} for which the expression is ≥ 0 but not ≡ 0 for all x, y, that is, genes that are never less frequent in source types G_x than in the corresponding sink types G_y, making them source genes; and genes A_{i;l} for which the expression is ≤ 0 but not ≡ 0 for all x, y, making them sink genes. (Note that a gene need not belong to any of the three groups, as is demonstrated by s(A_{i;l}A_{j;l}, A_{j;l}A_{j;l}) > 0 and s(A_{i;l}A_{j;l}, A_{i;l}A_{i;l}) > 0.)
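The relationship between a shift transformation at the genotypic level and the net shift of genes can be illustrated with a small Python sketch for a single diploid locus. Several ingredients are assumptions made for illustration: the elementary genic difference d is taken to be the proportion of genes in which two genotypes differ, the gene-pool distance is taken to be half the sum of absolute allele frequency differences, and the populations, the shift and all function names are made up.

```python
from collections import Counter


def genic_difference(gx, gy):
    """Assumed elementary genic difference between two genotypes at one locus:
    the proportion of genes in which they differ (0 = identical, 1 = disjoint)."""
    cx, cy = Counter(gx), Counter(gy)
    n = len(gx)  # number of genes per individual at the locus (2 for diploids)
    return sum(abs(cx[a] - cy[a]) for a in set(cx) | set(cy)) / (2 * n)


def allele_frequencies(genotype_freqs):
    """Gene-pool frequencies implied by genotype frequencies."""
    freqs = Counter()
    for genotype, f in genotype_freqs.items():
        for allele in genotype:
            freqs[allele] += f / len(genotype)
    return freqs


def gene_pool_distance(p, q):
    """Assumed gene-pool distance: half the sum of absolute frequency differences."""
    fp, fq = allele_frequencies(p), allele_frequencies(q)
    return 0.5 * sum(abs(fp[a] - fq[a]) for a in set(fp) | set(fq))


def shift_cost(shift):
    """Sum of d(G_x, G_y) * s(G_x, G_y) over all shifted pairs of types."""
    return sum(genic_difference(gx, gy) * s for (gx, gy), s in shift.items())


def is_shift_transformation(shift, p, q, tol=1e-12):
    """Check that amount sent minus amount received equals P - Q for every type."""
    net = Counter()
    for (gx, gy), s in shift.items():
        net[gx] += s
        net[gy] -= s
    types = set(p) | set(q) | set(net)
    return all(abs(net[t] - (p.get(t, 0.0) - q.get(t, 0.0))) <= tol for t in types)


# Made-up populations: P holds only homozygotes, Q only heterozygotes.
P = {("A", "A"): 0.5, ("B", "B"): 0.5}
Q = {("A", "B"): 1.0}
shift = {(("A", "A"), ("A", "B")): 0.5,
         (("B", "B"), ("A", "B")): 0.5}

valid = is_shift_transformation(shift, P, Q)
cost = shift_cost(shift)            # differentiation at the genotypic level
d_pool = gene_pool_distance(P, Q)   # differentiation at the gene-pool level
```

In this toy case the two gene-pools are identical, so the gene-pool distance is zero, whereas every individual must be shifted to a genotype that differs in one of its two genes, giving a cost of 0.5. Allele A is sent away by the first shift and received by the second, so it is neither a source gene nor a sink gene, and the inequality between the two levels is strict.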
where the difference between genetic types is measured by the elementary genic difference d.
as the marginal sum of all shifts that involve the type at locus l in the source type G x and in the sink type G y .
Lemma 2: The difference between the marginal sums for any u equals the net shift for any shift transformation s_l at the locus.
Even though marginal sums share this property with any shift transformation at the locus, the following lemma shows that marginal sums may not specify a shift transformation.
Lemma 3: The marginal sums of all types , at locus l may shift an amount that is in excess of the amount required of any shift transformation at the locus.
These inequalities contradict the equality required of a shift transformation. ■
The following lemma shows how to eliminate all ambivalent source/sink relationships from the marginal sums without changing the net amount shifted, i.e., amount sent away as a source minus the amount received as a sink.
Proof by construction: Consider the following algorithm:
START: Set for all u, v.
Step 1: If holds for a type , set . Since , this has no effect on the sum . Repeat for an additional type fulfilling the condition. If none exist, go to Step 2.
Repeat for an additional pair of types that fulfill the condition. If none exist, go to Step 3.
If , go to Step 2. Otherwise, repeat Step 3 for another triplet of types fulfilling the condition. If none exists, STOP.
After completion, either or or both hold for all u, meaning that no type is both a source and a sink. The net quasi-shift for each u remains constant throughout the algorithm, equaling by Lemma 2. Thus the quasi-shifts κ l (, ) fulfill the properties, as claimed. ■
With the help of the lemmata, Proposition 2 can now be proven:
Equality holds in Proposition 2 whenever the marginal sums for each locus l = 1,...,L specify a minimal shift transformation, i.e., when .
The authors gratefully acknowledge the comments of two anonymous reviewers which helped considerably in improving the presentation of our concepts. This work was partially funded by grant Zi 662/5-1 from the Deutsche Forschungsgemeinschaft.
- Hanski I: Metapopulation dynamics. Nature. 1998, 396: 41-49. 10.1038/23876.
- Mayr E: Where Are We?. Cold Spring Harbor Symposia on Quantitative Biology. 1959, 24: 1-14.
- Haldane JBS: A defence of beanbag genetics. Perspectives in Biology and Medicine. 1964, 7: 343-359.
- de Winter W: The Beanbag Genetics Controversy: Towards a synthesis of opposing views of natural selection. Biology and Philosophy. 1997, 12: 149-184. 10.1023/A:1006590002756.
- Crow JF: The beanbag lives on. Nature. 2001, 409: 771. 10.1038/35057409.
- Rao CR: Diversity and dissimilarity coefficients: a unified approach. Theoretical Population Biology. 1982, 21: 24-43. 10.1016/0040-5809(82)90004-1.
- Excoffier L, Smouse PE, Quattro JM: Analysis of molecular variance inferred from metric distances among DNA haplotypes: Application to human mitochondrial DNA restriction data. Genetics. 1992, 131: 479-491.
- Smouse PE, Peakall R: Spatial autocorrelation analysis of individual multiallele and multilocus genetic structure. Heredity. 1999, 82: 561-573. 10.1038/sj.hdy.6885180.
- Gregorius H-R, Gillet EM, Ziehe M: Measuring differences of trait distributions between populations. Biometrical Journal. 2003, 45: 959-973. 10.1002/bimj.200390063.
- Gillet EM, Gregorius H-R, Ziehe M: May inclusion of trait differences in genetic cluster analysis alter our views?. Forest Ecology and Management. 2004, 197: 149-158. 10.1016/j.foreco.2004.05.010.
- Hitchcock FL: Distribution of a product from several sources to numerous localities. Journal of Mathematical Physics. 1941, 20: 224-230.
- Gillet EM: DeltaS, a program to calculate the measure of pairwise distance Δ between populations. [http://www.uni-goettingen.de/de/95605.html]
- Gregorius H-R: Genetischer Abstand zwischen Populationen. I. Zur Konzeption der genetischen Abstandsmessung. Silvae Genetica. 1974, 23: 22-27. [http://www.bfafh.de/inst2/sg-pdf/23_1-3_22.pdf]
- Gordon AD: Hierarchical classification. Clustering and Classification. Edited by: Arabie P, Hubert LJ, Soete GD. 1996, Singapore etc.: World Scientific, 65-121.
- Wright S: Evolution and the Genetics of Populations. 1969, Chicago: University of Chicago Press, 2:
- Nei M: Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences USA. 1973, 70: 3321-3323. 10.1073/pnas.70.12.3321.
- Slatkin M: A measure of population subdivision based on microsatellite allele frequencies. Genetics. 1995, 139: 457-462.
- Gregorius H-R, Roberds JH: Measurement of genetical differentiation among subpopulations. Theoretical and Applied Genetics. 1986, 71: 826-834. 10.1007/BF00276425.
- Gregorius H-R: Differentiation between populations and its measurement. Acta Biotheoretica. 1996, 44: 23-36.
- Gregorius H-R, Degen B, König A: Problems in the analysis of genetic differentiation among populations – a case study in Quercus robur. Silvae Genetica. 2007, 56: 190-199. [http://www.bfafh.de/inst2/sg-pdf/56_3-4_190.pdf]
- Hubálek Z: Coefficients of association and similarity, based on binary (presence-absence) data: an evaluation. Biological Reviews. 1982, 57: 669-689. 10.1111/j.1469-185X.1982.tb00376.x.
- Degen B, Streiff R, Ziegenhagen B: Comparative study of genetic variation and differentiation of two pedunculate oak (Quercus robur) stands using microsatellite and allozyme loci. Heredity. 1999, 83: 597-603. 10.1038/sj.hdy.6886220.
- Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data. Genetics. 2000, 155: 945-959.
- Falush D, Stephens M, Pritchard JK: Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2003, 164: 1567-1587.
- Corander J, Waldmann P, Sillanpää MJ: Bayesian analysis of genetic differentiation between populations. Genetics. 2003, 163: 367-374.
- Holsinger KE, Wallace LE: Bayesian approaches for the analysis of population genetic structure: an example from Platanthera leucophaea (Orchidaceae). Molecular Ecology. 2004, 13: 887-894. 10.1111/j.1365-294X.2004.02052.x.
- Guillot G, Estoup A, Mortier F, Cosson JF: A spatial statistical model for landscape genetics. Genetics. 2005, 170: 1261-1280. 10.1534/genetics.104.033803.