Measuring differentiation among populations at different levels of genetic integration

Background Most genetic studies of population differentiation are based on gene-pool frequencies. Population differences for gene associations that show up as deviations from Hardy-Weinberg proportions (homologous association) or gametic disequilibria (non-homologous association) are disregarded. Thus little is known about patterns of population differentiation at higher levels of genetic integration or about the causal forces.
Results To fill this gap, a conceptual approach to the description and analysis of patterns of genetic differentiation at arbitrary levels of genetic integration (single or multiple loci, varying degrees of ploidy) is introduced. Measurement of differentiation is based on the measure Δ of genetic distance between populations, which is in turn based on an elementary genic difference between individuals at any given level of genetic integration. It is proven that Δ does not decrease when the level of genetic integration is increased, with equality if the gene associations at the higher level follow the same function in both populations (e.g. equal inbreeding coefficients, no association between loci). The pattern of differentiation is described using the matrix of pairwise genetic distances Δ and the differentiation snail based on the symmetric population differentiation Δ SD . A measure of covariation compares patterns between levels. To show the significance of the observed differentiation among possible gene associations, a special permutation analysis is proposed. Applying this approach to published genetic data on oak, the differentiation is found to increase considerably from lower to higher levels of integration, revealing variation in the forms of gene association among populations.
Conclusion This new approach to the analysis of genetic differentiation among populations demonstrates that the consideration of gene associations within populations adds a new quality to studies on population differentiation that is overlooked when viewing only gene-pools.


Background
Most biological species are subdivided into populations that are more or less strongly connected by gene flow. This facilitates a species' persistence via adaptive differentiation to local conditions, which in turn serves to maintain genetic variation for future adaptational processes. This concept of species is reflected, for example, in meta-population analysis with its special emphasis on extinction-recolonization dynamics (see [1] for a still relevant review). Genetic control of the phenotypic traits on which processes of adaptation operate is usually complex due to the involvement of several interacting genetic traits that may be expressed even in different developmental phases, including the haplophase. The detection of selectively neutral impacts on population differentiation (e.g. founder effects, genetic drift) may also require the analysis of multiple genetic traits, the interactions among which are determined by chance and in combination with particular mating systems (such as partial selfing). Thus the amount and pattern of genetic differentiation among a set of populations basically depends on: (1) the developmental stage (chiefly haplophase vs. diplophase), (2) the genetic traits under consideration at this stage, and (3) the ways in which the different states of these genetic traits in the populations are associated to form the genetic types (haplotypes, genotypes) at this stage, broadly termed gene association in this paper.
In general, traits are genetic only if they are inheritable, and the goal of inheritance analysis is to identify genes as the basic units of inheritance. The term genetic integration is used here to designate the combination or arrangement of these elementary objects "gene" into the haplotypes of gametes, into the genotypes at diploid (or polyploid) nuclei of diplophase individuals, or into the cytotypes of mitochondria or plastids, for example. Accordingly, each level of genetic integration usually corresponds to a developmental stage or an organelle that is characterized by special combinations of genes. (To emphasize this aspect, genic integration might be the more appropriate term.) The main motivation for this paper was the realization that impacts of particular forces, selective or not, on population differentiation may not be observable at every level of genetic integration. Measurements of differentiation among populations based on gene frequencies, for example, provide no specific insights into the effects of mating systems nor of epistatic interaction on population differentiation. This is due to the fact that gene frequencies refer to the lowest level of genetic integration, namely its absence. This level, which is commonly addressed as a population's gene-pool, is conceived to consist of the set of all individual genes present in the population members for a specified set of genetic traits. Genetic studies of population differentiation are almost always based on this "beanbag" (critically reflected by Mayr [2] and defended by Haldane [3]; for concise reasoning of the persistence of the gene-pool concept see e.g. [4] or [5]). Studies of differentiation at multiple loci are no exception, since they commonly report averages over single-locus differentiation indices. 
Also disregarded in studies of gene-pool differentiation are gene associations that deviate from Hardy-Weinberg proportions (homologous, or intralocus, association) or gametic equilibria (non-homologous, or interlocus, association). Considering that forms and degrees of gene association may differ at different levels of genetic integration, it thus appears that previous studies on patterns of population differentiation have provided very little information on levels of genetic integration above the gene-pool.
One important reason for the usual focus on gene-pool differentiation is probably the lack of a method for measuring population differentiation consistently at all levels of genetic integration. Consistency means that comparison of the amount of differentiation among a set of populations between levels of integration provides information about the complexity of the gene associations that distinguish them. Since gene associations do not decrease as level of integration increases, neither should differentiation. Moreover, the extent of an increase in differentiation between subsequent levels should in some way reflect the degree of complexity of the additional gene associations, with equality as an indication of lack of additional complexity by some standard. Such a differentiation measure must thus be based on a conceptual characterization of the complexity of gene associations.
The existence of such a measure would not only facilitate experimental studies but also simplify the development and testing of models. Insights can be gained from models only when the characteristics described by the models derive from concepts that are conceived independently of the models. Thus models do not serve to analyze characteristics: characteristics serve to analyze models. Moreover, model-based analysis that is limited to falsification of a particular model or its parameterization provides no information on the validity of related models. A conceptually argued measure, in contrast, can be applied to whole classes of models. This permits summarization of characteristics they have in common, the statistical significance of which can be tested by permutation analysis.
In the present paper, a new approach to differentiation analysis is presented that applies a conceptually argued measure of differentiation Δ SD to analyze and compare differentiation patterns among populations at different levels of integration. Presentation includes the development of Δ SD , representation of patterns of differentiation, and tests of significance of the patterns. Comparison of differentiation between levels of integration is analyzed mathematically. The method's usefulness is demonstrated by applying it to six-locus microsatellite data from four stands of pedunculate oak (Quercus robur). The purpose of using real data is to show how insights can be gained directly from observations without limitation to particular models, the testability of which may be difficult. It turned out that the large increases in differentiation between levels that were observed in the real data were not producible in numerous simulations of simple selection models, indicating that these models cannot explain the complexity of the real data. Studies of the behavior of this measure using simulated data from increasingly complex models will be the subject of a future paper.
To prevent possible misunderstanding, it should be mentioned that this approach differs in content from any type of (hierarchical) partitioning, apportionment, or allocation of genetic variation (such as within and between populations). Methods of attributing overall variation to partitions draw upon the principle of the analysis of variance and were extended to include more general measures of difference between individuals by Rao (equation 2.3.1 in [6]). An application of this generalization to a special measure of genetic difference for multiple loci between haplotypes led Excoffier et al. [7] to the formulation of their "analysis of molecular variance". In contrast, the levels of genetic integration dealt with here cannot serve as classes (partitions) over which genetic variation is distributed. Instead, at each integration level (e.g. gene pool, single-locus genotypes, multilocus genotypes) the genetic characteristics can be analyzed for their differentiation within population subdivisions. Subsequent comparison between levels reveals which level of integration, and thus which type of gene association (especially homologous vs. non-homologous), has the greatest influence on the differentiation within the partition.

Levels of genetic integration and gene association
At the lowest level of genetic integration, the gene-pool, the gene-type of each individual gene is characterized by the gene locus at which it is located and by its allelic state. Assuming that the degree of ploidy is the same at all loci, the relative frequencies of the gene-types in the gene-pool of a population equal (1/L)·p i;l , where L is the number of loci and p i;l is the relative frequency of the i-th allele at the l-th gene locus in the population (∑ i p i;l = 1 for each locus l, so that the gene-type frequencies sum to one). If loci of differing degree of ploidy (e.g. nuclear and organelle) are included in the analysis, replace 1/L with the locus-specific quantities r l obtained by division of the degree of ploidy at the l-th locus by the sum of the degrees over all loci. The gene-pool frequency of the gene-type specified by the i-th allele at the l-th locus then equals r l ·p i;l . At higher levels of genetic integration, where the objects of interest represent compositions of several individual genes together with their gene-types, association among gene-types becomes relevant for differentiation studies. If the objects are diplophase individuals and if the gene-types are specified at a single gene-locus, then all associations among the genes that make up the genotypes are homologous (i.e., allelic) by definition. When multiple loci are considered, both homologous and non-homologous (interlocus) associations exist among genes. If the objects are haplophases, each object having just one gene per locus, then all gene associations are non-homologous. Since at any given locus all objects carry the number of (allelic) individual genes specified by the degree of ploidy of the locus, the objects representing a given level of genetic integration are characterized by the same number of individual genes.
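The weighting of allele frequencies by locus can be sketched in a few lines of Python. This is only an illustration: the function name `gene_pool_frequencies`, the dictionary layout, and the locus labels are assumptions made here, not part of the original method description.

```python
def gene_pool_frequencies(allele_freqs, ploidy):
    """Return gene-pool frequencies r_l * p_{i;l} keyed by (locus, allele).

    allele_freqs: dict locus -> dict allele -> relative frequency (sums to 1 per locus)
    ploidy:       dict locus -> degree of ploidy at that locus
    (Both layouts are illustrative assumptions.)
    """
    total_ploidy = sum(ploidy[locus] for locus in allele_freqs)
    pool = {}
    for locus, freqs in allele_freqs.items():
        r_l = ploidy[locus] / total_ploidy  # equals 1/L when all loci share one ploidy
        for allele, freq in freqs.items():
            pool[(locus, allele)] = r_l * freq
    return pool


# Hypothetical example: one diploid nuclear locus and one haploid organelle locus.
example = gene_pool_frequencies(
    {"nuc": {"a": 0.25, "b": 0.75}, "cp": {"x": 1.0}},
    {"nuc": 2, "cp": 1},
)
print(example)
```

With equal ploidy at all loci the weights reduce to 1/L, which is the case assumed in most of the text.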

The elementary genic difference
From this perspective, genetic differences between two objects of the same level of integration are basically determined by the number of their individual genes that differ in type. If the numbers of copies of the i-th allele at the l-th gene locus are denoted by n i;l and m i;l , respectively, then the two objects differ by ∑ i,l |n i;l -m i;l | gene-type copies. This sum is maximal, equaling two times the total number K of individual genes represented in each object, if the objects share no gene-types (and thus differ completely). Since ∑ i,l n i;l = ∑ i,l m i;l = K holds, division of ∑ i,l |n i;l -m i;l | by 2·K yields a measure of genic difference that is bounded between zero and one. This measure of elementary genic difference is applicable to all levels of integration. It differs from a closely related index suggested by Smouse and Peakall [8] in a different context, in which the absolute difference is replaced by the squared difference, a disadvantage of which is that objects sharing no gene-types need not realize the maximum difference. For example, with L = 2 loci the elementary genic difference 1/L = 1/2 is obtained for the two genotypes A 1 A 2 /B 1 B 2 and A 1 A 2 /B 3 B 3 (4 differing gene-type copies out of 2·K = 8), even though all genic differences are due to the alleles at a single locus (B).
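The elementary genic difference can be computed directly from copy counts. The following is a minimal sketch; the representation of an object as a list of (locus, allele) pairs, one entry per gene copy, and the function name are assumptions made for illustration.

```python
from collections import Counter


def elementary_genic_difference(obj_a, obj_b):
    """Sum of |n - m| over all gene-types, divided by 2*K.

    obj_a, obj_b: iterables of (locus, allele) pairs, one pair per gene copy
    (an assumed encoding). Both objects must carry the same number K of genes.
    """
    n = Counter(obj_a)
    m = Counter(obj_b)
    k = sum(n.values())
    if k == 0 or k != sum(m.values()):
        raise ValueError("objects must carry the same positive number of genes")
    differing_copies = sum(abs(n[g] - m[g]) for g in set(n) | set(m))
    return differing_copies / (2 * k)


# The two-locus genotypes A1A2/B1B2 and A1A2/B3B3 from the text.
g1 = [("A", 1), ("A", 2), ("B", 1), ("B", 2)]
g2 = [("A", 1), ("A", 2), ("B", 3), ("B", 3)]
print(elementary_genic_difference(g1, g2))  # 0.5
```

The same function applies to haplotypes (one gene per locus) and to single-locus genotypes, since only the copy counts of gene-types enter.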
These considerations show that objects representing higher levels of genetic integration are not simply of the same or different genetic type, as is the case at the level of the gene-pool. Specification of the gene-types of which the genetic types are composed yields a measure of the differences between them that ensures the comparability of genetic differences even across levels of genetic integration. Thus, analysis of population differentiation at higher levels of integration should take into account not only differences in the frequencies of the genetic types among populations but also the variation in the pairwise differences between types.

The measure Δ of genetic distance between two populations
The measure Δ of genetic distance between two populations developed by Gregorius et al. [9] considers both the frequencies of genetic types and their pairwise differences, while avoiding the conceptual problems of dispersion indices (e.g. average differences within and between populations, see [6]). For a specified trait, Δ equals the minimum extent to which the genetic types of individuals in one of the two populations must be altered in order to obtain the composition of genetic types in the other. For two populations $\mathcal{P}$ and $\mathcal{Q}$, denote:
$$\Delta(s) = \sum_{a,b} d(a,b)\, s(a,b)$$
where d(a, b) specifies the difference between genetic types a and b, and s(a, b) is a frequency shift from type a to type b. Frequency shifts are performed from types that are more frequent in the one population than in the other to types that are less frequent in $\mathcal{P}$ than in $\mathcal{Q}$. If the frequency p a of type a in $\mathcal{P}$ exceeds the frequency q a of this type in $\mathcal{Q}$, then the excess p a -q a must be shifted to types deficient in $\mathcal{P}$, such that ∑ b s(a, b) = p a -q a = p a -min{p a , q a }. The shift process is continued for all types with a frequency excess in $\mathcal{P}$ until the frequencies of all types in $\mathcal{P}$ match those in $\mathcal{Q}$. Since there may be many different ways of shifting, Δ is taken to be the minimum of the above sum over all admissible frequency shifts s, i.e.,
$$\Delta = \min_{s} \Delta(s).$$
In [9] and [10] it is shown that finding a shift transformation s that minimizes Δ(s) is equivalent to solving the "Transportation Problem" [11] by linear programming methods. These methods are implemented in the computer program DeltaS [12].
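To make the minimization concrete, the sketch below solves the transportation problem for small inputs. It is not the DeltaS implementation: instead of a linear programming routine it uses a successive-shortest-path minimum-cost flow with exact fractions, chosen here only because it needs no external library. The function names and the encoding of genotypes as tuples of alleles are assumptions.

```python
from collections import Counter
from fractions import Fraction


def genic_difference(a, b):
    """Elementary genic difference between two genotypes given as tuples of alleles."""
    n, m = Counter(a), Counter(b)
    return Fraction(sum(abs(n[g] - m[g]) for g in set(n) | set(m)), 2 * len(a))


def delta_distance(p, q, d):
    """Minimum of sum d(a,b)*s(a,b) over frequency shifts s that turn p into q."""
    types = list(dict.fromkeys(list(p) + list(q)))
    diff = {a: Fraction(p.get(a, 0)) - Fraction(q.get(a, 0)) for a in types}
    supply = {a: x for a, x in diff.items() if x > 0}    # types in excess in p
    demand = {b: -x for b, x in diff.items() if x < 0}   # types deficient in p
    cost = {(a, b): Fraction(d(a, b)) for a in supply for b in demand}
    flow = {edge: Fraction(0) for edge in cost}
    total = Fraction(0)
    while True:
        # Bellman-Ford on the residual graph, started at sources with supply left.
        dist = {("src", a): Fraction(0) for a in supply if supply[a] > 0}
        pred = {}
        for _ in range(len(supply) + len(demand)):
            changed = False
            for (a, b), c in cost.items():
                u, v = ("src", a), ("snk", b)
                if u in dist and (v not in dist or dist[u] + c < dist[v]):
                    dist[v], pred[v], changed = dist[u] + c, u, True
                if flow[a, b] > 0 and v in dist and (u not in dist or dist[v] - c < dist[u]):
                    dist[u], pred[u], changed = dist[v] - c, v, True
            if not changed:
                break
        open_sinks = [("snk", b) for b in demand if demand[b] > 0 and ("snk", b) in dist]
        if not open_sinks:
            break
        node = min(open_sinks, key=lambda v: dist[v])
        path = []                      # edges (from, to), listed from the sink backwards
        while node in pred:
            path.append((pred[node], node))
            node = pred[node]
        sink = path[0][1][1]
        # Bottleneck: remaining supply, remaining demand, and any shift being undone.
        amount = min([supply[node[1]], demand[sink]] +
                     [flow[v[1], u[1]] for u, v in path if u[0] == "snk"])
        for u, v in path:
            if u[0] == "src":          # forward edge: shift more from u to v
                flow[u[1], v[1]] += amount
                total += amount * cost[u[1], v[1]]
            else:                      # backward edge: undo part of an earlier shift
                flow[v[1], u[1]] -= amount
                total -= amount * cost[v[1], u[1]]
        supply[node[1]] -= amount
        demand[sink] -= amount
    return float(total)


# Hypothetical single-locus genotype frequencies in two populations.
P = {("A1", "A1"): Fraction(49, 100), ("A1", "A2"): Fraction(42, 100), ("A2", "A2"): Fraction(9, 100)}
Q = {("A1", "A1"): Fraction(16, 100), ("A1", "A2"): Fraction(48, 100), ("A2", "A2"): Fraction(36, 100)}
print(delta_distance(P, Q, genic_difference))  # 0.3
```

For larger numbers of genetic types a dedicated linear programming or network simplex solver would be the more practical choice; the sketch is meant only to show what is being minimized.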
In combination with the measure of elementary genic difference, the measure Δ provides the desired conceptual method for studying population differentiation at different levels of genetic integration. At the lowest integration level, the gene-pool, where gene-types are specified by indices i; l and their frequencies in populations $\mathcal{P}$ and $\mathcal{Q}$ as r l ·p i;l and r l ·q i;l (see above), Δ assumes a familiar form.
Since individual genes are distinguished only by their identity or non-identity in type, one obtains elementary genic differences d(a, b) = 1 for a ≠ b and d(a, b) = 0 for a = b. For any frequency shift s, it holds that Δ(s) = ∑ a, b s(a, b) = ∑ a (p a -min{p a , q a }) = ½ ∑ a |p a -q a |, where the factor ½ arises because the total excess equals the total deficit. Insertion of the gene-type notation in place of the a's then yields:
$$\Delta(p, q) = \frac{1}{2} \sum_{l} \sum_{i} r_l \, |p_{i;l} - q_{i;l}| = \sum_{l} r_l \, d_0(p^{(l)}, q^{(l)})$$
where:
$$d_0(p^{(l)}, q^{(l)}) = \frac{1}{2} \sum_{i} |p_{i;l} - q_{i;l}|$$
In this expression, d 0 (p (l) , q (l) ) is a familiar measure of genetic distance between two populations with allele frequencies p (l) and q (l) at locus l (see e.g. [13]). For loci of equal degree of ploidy (r l = 1/L), it turns out that the gene-pool distance between two populations equals the average distance over the single loci.
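The gene-pool distance is simple enough to compute without any optimization. The sketch below assumes that d 0 is half the sum of absolute allele frequency differences at a locus and that all loci have the same degree of ploidy; the function names and the nested-dictionary input format are illustrative choices.

```python
def d0(p_l, q_l):
    """Single-locus allelic distance: half the sum of absolute frequency differences."""
    alleles = set(p_l) | set(q_l)
    return 0.5 * sum(abs(p_l.get(a, 0.0) - q_l.get(a, 0.0)) for a in alleles)


def gene_pool_distance(p, q):
    """Average of d0 over loci (equal ploidy assumed, so each locus has weight 1/L).

    p, q: dict locus -> dict allele -> relative frequency (assumed layout).
    """
    loci = list(p)
    return sum(d0(p[locus], q[locus]) for locus in loci) / len(loci)


# Hypothetical allele frequencies at two loci in two populations.
p = {"A": {1: 0.7, 2: 0.3}, "B": {1: 0.5, 2: 0.5}}
q = {"A": {1: 0.4, 2: 0.6}, "B": {1: 0.5, 3: 0.5}}
print(round(gene_pool_distance(p, q), 6))  # 0.4
```

Here locus A contributes 0.3 and locus B contributes 0.5, giving the average 0.4.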
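At the diplophase level of integration, for example, consider two populations $\mathcal{P}$ and $\mathcal{Q}$ with Hardy-Weinberg proportions (HWP) for the two alleles A 1 and A 2 at a locus, with allele frequencies p 1 , p 2 in $\mathcal{P}$ and q 1 , q 2 in $\mathcal{Q}$. Let p 1 > q 1 , and let $\mathcal{P}$ have more heterozygotes than $\mathcal{Q}$. Then there is only one way s of shifting, namely $s(A_1A_1, A_2A_2) = p_1^2 - q_1^2$ and $s(A_1A_2, A_2A_2) = 2p_1p_2 - 2q_1q_2$. Since for the elementary genic distance, $d(A_1A_1, A_2A_2) = 1$ and $d(A_1A_2, A_2A_2) = 1/2$, it follows that
$$\Delta = (p_1^2 - q_1^2) + (p_1 p_2 - q_1 q_2) = p_1(p_1 + p_2) - q_1(q_1 + q_2) = p_1 - q_1 .$$
In this example, the distance at the diplophase level equals the gene-pool distance $\tfrac{1}{2}(|p_1 - q_1| + |p_2 - q_2|) = p_1 - q_1$. Under Results it is shown (Proposition 1) that the diplophase distance is never less than the gene-pool distance and that equality at the two levels is of particular interest.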

Patterns of differentiation among populations
At this point, each level of integration for a set of populations is characterized by a matrix of pairwise distances Δ between the populations. These matrices and the relationships among them can be called the pattern of differentiation among the populations. Three approaches to the description of differentiation patterns are discussed.

Clustering methods
Matrices of pairwise genetic distances between populations are commonly represented using clustering methods as dendrograms, the topologies (cluster structures) of which are of primary interest. In particular, the emergence of new cluster structures at higher levels of integration emphasizes the necessity to consider evolutionary forces of population differentiation that go beyond those conventionally held responsible for gene-pool differentiation. Detection of such structures of course depends on comparison of the dendrograms from different levels of integration, where the gene-pool constitutes the basic reference for comparison. There are many ways of comparing dendrograms obtained with the same clustering method (for an overview see e.g. [14], p. 94ff). We will concentrate instead on direct comparison of the quantities underlying all methods of clustering, i.e., the matrix of pairwise distances. Changes in topology are most likely to occur when the distance matrices show poor correspondence across levels of integration, that is, low covariation (see below).

Variance decomposition
Another common approach is less detailed and essentially rests on the computation of a single statistic of the degree of differentiation among populations. Among these measures, most of which are indexed by ST , the classical versions F ST [15] and G ST [16] consider population differentiation solely for allele frequencies. More recent versions such as Φ ST [7] or R ST [17] include variable differences between genetic types. Inferences on patterns of differentiation are more or less restricted to ways in which an observed amount of differentiation could have evolved under certain model assumptions. Moreover, the whole family of ST -measures is based on the principle of variance decomposition, where the difference between the total variation and the average variation within populations is divided by the total variation. Such measures do not assume their maximum values only for completely differentiated populations. This follows directly from their conceptual underpinning, which refers to partitioning rather than differentiation of genetic variation among populations. The ST -measures therefore have limited relevance as indicators of patterns of differentiation among populations.

Symmetric population differentiation Δ SD
For this reason, preference is given here to a related but more detailed approach that refers to the concept of symmetric set difference [18,19]. In this concept, each population is characterized by its genetic distance from its complement, i.e., the totality (union) of the remaining populations. By this means, populations can be ranked according to their contributions to the overall amount of differentiation. Application of the distance measure Δ to the concept of symmetric set difference yields quantities Δ j as the distance $\Delta(p(j), \bar{p}(j))$ between the j-th population and its complement. Denoting p(j) as the vector of type frequencies characterizing the j-th population, the vector $\bar{p}(j)$ of type frequencies that represent the complement is obtained by pooling the remaining populations in proportion to their relative sizes c k , i.e., $\bar{p}(j) = \sum_{k \neq j} \frac{c_k}{1 - c_j}\, p(k)$. The symmetric population differentiation is then the size-weighted mean of the components, $\Delta_{SD} = \sum_j c_j \Delta_j$. The differentiation pattern inherent in Δ SD and its components Δ j for variable population sizes can be illustrated as a "differentiation snail" [18] (see Fig. 2 below). The snail complements the pattern characteristics obtainable from clustering methods or directly from the distance matrix in that it reveals tendencies of population assemblages to be genetically dispersed or to concentrate genetic variation in a few populations. In order to assess changes in the snail between levels of genetic integration, the following measure of covariation of the respective components Δ j can be applied.
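The components Δ j and their summary Δ SD can be sketched as follows. Two assumptions are made here and should be checked against [18,19]: the complement of a population pools the remaining populations in proportion to their relative sizes, and Δ SD is the size-weighted mean of the Δ j . For brevity the sketch works at a level where types are either identical or different (as in the gene-pool), so that Δ reduces to half the sum of absolute frequency differences; all names are illustrative.

```python
def half_l1(p, q):
    """Distance for 0/1 type differences: half the sum of absolute frequency differences."""
    types = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in types)


def snail_components(freqs, sizes):
    """Return ([Delta_j for each population], Delta_SD).

    freqs: list of dicts type -> relative frequency, one dict per population
    sizes: list of population sizes (used as weights; an assumption of this sketch)
    """
    total = sum(sizes)
    c = [n / total for n in sizes]
    components = []
    for j, p_j in enumerate(freqs):
        complement = {}
        for k, p_k in enumerate(freqs):
            if k == j:
                continue
            weight = c[k] / (1.0 - c[j])  # relative size within the complement
            for t, f in p_k.items():
                complement[t] = complement.get(t, 0.0) + weight * f
        components.append(half_l1(p_j, complement))
    delta_sd = sum(c_j * d_j for c_j, d_j in zip(c, components))
    return components, delta_sd


# Hypothetical example: two populations fixed for type "a", one fixed for type "b".
comps, delta_sd = snail_components([{"a": 1.0}, {"a": 1.0}, {"b": 1.0}], [10, 10, 10])
print(comps, round(delta_sd, 4))  # [0.5, 0.5, 1.0] 0.6667
```

Sorting the components in descending order and plotting them as sectors whose angles reflect the population sizes gives the kind of picture the differentiation snail conveys.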

Covariation of differentiation between integration levels
The degree of correspondence between differentiation indices from two levels of integration can be determined by a measure of covariation. Commonly chosen measures of covariation are any of the versions of the product-moment correlation, which are designed to quantify the closeness to a linear type of covariation between two variables. However, since our genetic distances are bounded, linear relationships can be realized only under very exceptional conditions. Moreover, it is difficult to see how relationships between levels of integration could be brought about by forces acting linearly on genetic distances. From this perspective it is preferable to use a measure of covariation that relies on general monotonic relationships between two variables. Such measures would more reliably detect any consistency of patterns of differentiation over levels of integration. As was pointed out in [20], a suitable measure of covariation is:
$$C = \frac{\sum_{i<j} (X_i - X_j)(Y_i - Y_j)}{\sum_{i<j} \left| (X_i - X_j)(Y_i - Y_j) \right|}$$
where the variables X i and Y i refer to genetic distances at two different levels of integration. In the case of the distances between a population and its complement, X i and Y i refer to Δ i at the two levels of integration. In the case of pairwise distances between populations, X i and Y i refer to the i-th element of the distance matrix for each of the two levels of integration. C varies between -1 and +1 such that C = 1 for strictly positive and C = -1 for strictly negative covariation. It is undefined in the practically irrelevant case where a non-zero difference for one variable implies equality for the other.
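A small sketch of the covariation C follows. It assumes that C is the sum over all pairs i < j of the products (X i - X j )(Y i - Y j ), divided by the sum of the absolute values of these products; this form matches the properties stated above (range from -1 to +1, undefined when every product is zero), but the function name and the use of `None` for the undefined case are choices made here.

```python
from itertools import combinations


def covariation(x, y):
    """Covariation C of two equally long sequences; None where C is undefined."""
    products = [(x[i] - x[j]) * (y[i] - y[j])
                for i, j in combinations(range(len(x)), 2)]
    denominator = sum(abs(v) for v in products)
    if denominator == 0:
        return None  # every non-zero difference in one variable meets a tie in the other
    return sum(products) / denominator


# Hypothetical distances at two levels of integration for three population pairs.
lower = [0.10, 0.20, 0.40]
higher = [0.20, 0.30, 0.90]
print(covariation(lower, higher))        # 1.0 (same ordering, although not linear)
print(covariation(lower, higher[::-1]))  # -1.0 (reversed ordering)
```

Because only the signs and sizes of pairwise differences enter, any strictly monotonic relationship yields C = 1 or C = -1, which is the property that motivates the measure.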

Permutation test of the significance of genetic differentiation patterns
Any increase of genetic differentiation among populations at higher levels of genetic integration is due to forces of association of genes that differ among populations. It is thus of basic interest to know whether the differentiation observed at a level of integration can be explained by random combination of genes (e.g. into diploid genotypes or haplotypes) or whether directed forces of combination must be assumed. This requires an analysis that is conditional on the gene-pool of each population, the number of populations, and the population sizes. The effects of chance can be assessed by permuting the genes within each population, such that all homologous and non-homologous combinations of genes (alleles) into (haploid, diploid or polyploid) genotypes have equal probability. For each such permutation, the values of all relevant descriptors (e.g. covariation C for distance matrices and differentiation snails, the mean pairwise distance Δ in the distance matrix, the symmetric population differentiation Δ SD ) are determined. By performing a large number of permutations, the significance of each observed descriptor value can be measured in terms of the P-value, which is the proportion of permutations yielding descriptor values greater than or equal to the observed value. For interpretation of the results, both very small P-values (≤ 0.05) and very large P-values (≥ 0.95) are of interest.
This permutation analysis differs from common permutation analyses of differentiation among populations, in which the individuals (together with their fixed genotypes) are permuted over the populations. Such analyses aim to explain gene-pool differences among populations.
In contrast, the present paper is targeted at forces of genetic differentiation that originate from the association of genes in diplo- or haplo-states and that thus go beyond those responsible for gene-pool differentiation.
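The permutation scheme can be sketched for a single diploid locus as follows. The genes of each population are shuffled within that population and paired anew, so that every population keeps its gene-pool and size. The descriptor used in the example (the absolute difference in heterozygote proportion between two populations) is only a simple stand-in for Δ-based descriptors, and all function names, the seed, and the example data are assumptions.

```python
import random


def permute_genes(genotypes, rng, ploidy=2):
    """Shuffle the genes of one population and regroup them into genotypes."""
    genes = [allele for genotype in genotypes for allele in genotype]
    rng.shuffle(genes)
    return [tuple(sorted(genes[i:i + ploidy])) for i in range(0, len(genes), ploidy)]


def permutation_p_value(populations, descriptor, n_perm=1000, seed=0):
    """Proportion of within-population gene permutations whose descriptor value
    is greater than or equal to the observed one."""
    rng = random.Random(seed)
    observed = descriptor(populations)
    hits = 0
    for _ in range(n_perm):
        permuted = [permute_genes(genotypes, rng) for genotypes in populations]
        if descriptor(permuted) >= observed:
            hits += 1
    return hits / n_perm


def heterozygosity_gap(populations):
    """Stand-in descriptor: |difference in heterozygote proportion| of two populations."""
    het = [sum(1 for a, b in pop if a != b) / len(pop) for pop in populations]
    return abs(het[0] - het[1])


# Hypothetical data: one population without heterozygotes, one with 50 percent.
pop1 = [("a", "a")] * 10 + [("b", "b")] * 10
pop2 = [("a", "b")] * 10 + [("a", "a")] * 5 + [("b", "b")] * 5
print(permutation_p_value([pop1, pop2], heterozygosity_gap, n_perm=200))
```

A P-value close to zero indicates that the observed descriptor value is larger than expected under random association of the genes present in each population; a value close to one indicates that it is smaller.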

Effects of level of genetic integration on the pattern of differentiation among populations
Proceeding from lower to higher levels of integration, one expects an increase in differentiation among populations simply because of the larger varietal potential inherent in more complex structures. Since differentiation is based on distances, the distance between two populations should therefore also increase, or at least not decrease, with integration level. Consider two populations $\mathcal{P}$ and $\mathcal{Q}$, and denote the relative frequencies of their (multilocus) genotypes at L (≥ 1) loci of equal degree of ploidy (≥ 1) by frequency vectors P and Q and the relative frequencies of the gene-types in their gene-pools by frequency vectors p and q. Proof of the following Theorem requires the special properties of the elementary genic difference between genotypes, including the fact that it is a metric distance:
Theorem: For any two populations $\mathcal{P}$ and $\mathcal{Q}$, the distance Δ between the (multilocus) genetic structures P and Q at any L gene loci (L ≥ 1) of equal degree of ploidy is not less than the mean distance between the single-locus structures P (l) and Q (l) , which in turn is not less than the distance between the corresponding gene pools p and q, that is,
$$\Delta(p, q) = \frac{1}{L} \sum_{l=1}^{L} \Delta(p^{(l)}, q^{(l)}) \;\le\; \frac{1}{L} \sum_{l=1}^{L} \Delta(P^{(l)}, Q^{(l)}) \;\le\; \Delta(P, Q)$$
where the difference between genetic types (haplotypes, diplotypes) is measured by the elementary genic difference d.
Proof: The equality results from definition of Δ and gene-pool. The first inequality follows from Proposition 1 (see Appendix A), which states that the distance Δ between L-locus genotypic structures P and Q (L ≥ 1) is never less than between the gene-pools p and q. From this it follows that Δ(p (l) , q (l) ) ≤ Δ(P (l) , Q (l) ) for each locus l. The second inequality stems from Proposition 2 (see Appendix B), which states that the distance Δ between multilocus genotypic structures P and Q is never less than the average of the distances between the corresponding single-locus genotypic structures P (l) and Q (l) . ■
We investigated this Theorem by simulating numerous simple models. When we analyzed two populations with differing gene-pools at a locus but both showing HWP among the genotypes, we were surprised to see that the inequalities became equalities. Furthermore, the extension of HWP to inbreeding structures for the same inbreeding coefficient F (i.e., P ii = p i 2 + Fp i (1 -p i ) and P ij = 2p i p j (1 -F)) also yielded equality (F = 0 gives HWP). Equality also held when each of the genotypic structures was the product of two allelic structures (e.g., maternal and paternal), one of which was the same in both populations. When we simulated the frequencies of two-locus genotypes in two populations, both showing HWP at both loci, as the product of the single-locus genotype frequencies, equality again held. In contrast, differentiation between the gene-pool and the genotypes at a single locus did increase for inbreeding structures when the two inbreeding coefficients differed and for product structures when no two of the four allelic structures matched. No increase was obtainable between the average single-locus genotypic distance and the multilocus distance in the case of two loci, each with two alleles, not even when the selection regimes differed between the populations.
It is therefore interesting that examples using real data, one of which is presented below, all showed large increases between levels, indicating that the real data does not follow simple models.
The second inequality became an equality in our calculated examples whenever multilocus genotype frequencies were the product of single-locus genotype frequencies, i.e., in the absence of non-homologous association.
These observations suggest that uniformity of homologous association and absence of non-homologous association result in equal distances at different integration levels. Intuitively, this coincides with the conception that absence or uniformity of association do not really introduce any new structure to the higher levels of integration. Since this phenomenon only shows up when the difference between genotypes is measured by the elementary genic distance, this measure is closely tied to the concept that the absence of association does not lead to higher differentiation at higher levels of genetic integration.
Nevertheless, absence of non-homologous association may not be a necessary condition for equality, since equality also occurred in some examples where association between loci was present. This means that the basic prerequisite for validity of Δ(p, q) = Δ(P, Q) (stated at the end of Appendix A), namely that every gene-type that is not of equal frequency in the two populations be either a source gene or a sink gene, may be fulfilled even in the presence of non-homologous association.
Carrying these results for Δ over to the differentiation measures Δ j and Δ SD , the differentiation among populations for multilocus genetic types (haplotypes, genotypes) equals the gene-pool differentiation if all populations show uniformity of homologous gene association (e.g. HWP, inbreeding for the same inbreeding coefficient) and absence of non-homologous association. Otherwise, differentiation may increase with level of integration, as expected.
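The single-locus part of these results can be checked numerically for two alleles. The sketch below relies on two facts that are assumptions of the sketch rather than statements from the text: for the genotypes A1A1, A1A2 and A2A2 the elementary genic difference equals the absolute difference in their share of A2 genes (0, 1/2, 1), and on such a line the minimum shifting cost equals the summed absolute differences of the cumulative frequencies, each weighted by the gap of 1/2. Function names and parameter values are illustrative.

```python
def genotype_freqs(p1, f):
    """Frequencies of (A1A1, A1A2, A2A2) for allele frequency p1 and inbreeding coefficient f."""
    p2 = 1.0 - p1
    return (p1 * p1 + f * p1 * p2, 2.0 * p1 * p2 * (1.0 - f), p2 * p2 + f * p1 * p2)


def diplophase_distance(P, Q):
    """Delta between two biallelic single-locus genotypic structures.

    Genotypes lie at positions 0, 1/2, 1; the transport cost on a line is the
    sum of |cumulative frequency differences| times the gap (1/2) between positions.
    """
    c1 = P[0] - Q[0]
    c2 = (P[0] + P[1]) - (Q[0] + Q[1])
    return 0.5 * (abs(c1) + abs(c2))


def gene_pool_distance(p1, q1):
    """Half the sum of absolute allele frequency differences for two alleles."""
    return abs(p1 - q1)


# Same inbreeding coefficient in both populations: equality of the two levels.
same_f = diplophase_distance(genotype_freqs(0.7, 0.3), genotype_freqs(0.4, 0.3))
print(round(same_f, 6), round(gene_pool_distance(0.7, 0.4), 6))   # 0.3 0.3

# Different inbreeding coefficients: the diplophase distance can exceed the gene-pool distance.
diff_f = diplophase_distance(genotype_freqs(0.6, 0.0), genotype_freqs(0.5, 0.8))
print(round(diff_f, 6), round(gene_pool_distance(0.6, 0.5), 6))   # 0.19 0.1
```

The second case illustrates that differing forms of homologous association add differentiation that is invisible in the gene-pool, whereas a common inbreeding coefficient adds none.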
All of these results are based on the special measure of elementary genic difference between genotypes (for any degree of ploidy). Thus any other measure is likely to yield different results, the interpretation of which would of course depend on a clear conceptual understanding of the difference measure. In particular, this concerns genetic associations that are not specifically genic. A discussion of these measures (see [21] for an overview of measures) would, however, be clearly beyond the scope of this paper.

Application of the approach to an assemblage of oak stands
The effects of the level of genetic integration on patterns of differentiation will be illustrated with the help of an example based on published data [20,22]. The reason for not applying it to particular models here is to show how insights can be gained directly from observations, without model constraints. In these data, the multilocus genotypes at the same six nuclear microsatellite loci were scored in all adult trees of four stands of pedunculate oak (Quercus robur) located in north-central Germany. Of the 159 trees in the stand near Rantzau, 154 trees could be scored at all [...] Failure to score the complete multilocus genotypes of the other 76 trees in the stands is assumed to be independent of their genotypes. Table 1 lists the distance matrix of pairwise distances Δ between stands and their mean as well as the symmetric population differentiation Δ SD and its components Δ j , both based on the elementary genic difference between genetic types, for each of three levels of integration: the gene-pool distance, which is the average of the six single-locus allelic distances; the single-locus diplophase distance, which is also the average over the loci; and the multilocus diplophase distance. It is seen that for each pair of stands, the pairwise distance Δ increases considerably with the level of integration. This indicates that neither the gene association within single loci (homologous association) nor the gene association among loci (non-homologous association) is of the same form in any two stands, and in particular that association is present. Both the distances and the snail components show a much larger increase between the single-locus diplophase and the multilocus diplophase than between the gene-pool and the single-locus diplophase. Hence the non-homologous gene associations make a distinctly greater contribution to the differentiation than the homologous gene associations. It is interesting to consider the large increase between the single-locus and the multilocus level in the light of our failure to produce any increase at all when simulating simple selection models, as mentioned above. This indication that the data are not explainable by simple models requires further investigation.

Table 1: For four stands (abbr. R, B, S, and E) of pedunculate oak in north-central Germany scored at six nuclear microsatellite loci, genetic differentiation based on the elementary genic difference between genetic types was calculated at three levels of genetic integration: gene-pool, single-locus diplophase, and multilocus diplophase. The observed genetic distance Δ between each pair of stands and the observed distance Δ j (component of differentiation snail) of each stand j from its complement are listed together with their respective means. To assess the effect of integration level on patterns of differentiation, the lower part of the table shows the covariation between integration levels of the pairwise genetic distances Δ and of the snail components Δ j . To compare the observed distances with those obtainable if the genes were randomly arranged, 10 000 data sets were generated by random permutation of the genes at each locus within (not among) all stands. Square brackets enclose the ranges [min, max] of 10 000 distances by permutation, followed by the P-values (i.e., proportion of permutations yielding distances equal to or greater than the observed distance). Symbols ↑** and ↓* indicate that fewer than 1% and more than 95%, respectively, of the permutations yielded distances equal to or greater than the observed distance.

In order to be sure that this apparent discrepancy between stands in the form of association is not simply due to the small number of multilocus genotypes in the stands compared to the number that could be formed from the genes present in the stands, a permutation analysis was performed as described above. Ten thousand new data sets were generated by random permutation of the genes at each locus within each stand to form new single-locus genotypes, which were then randomly combined into multilocus genotypes. Each observed distance was then compared to the 10 000 distances from permutation. Surprisingly, for both the single-locus diplophase and the multilocus diplophase, the observed mean pairwise distance and the symmetric population differentiation Δ_SD were significantly high (i.e., higher than for 99% of all permutations). This indicates that both homologous and non-homologous association of genes follow very different rules among the stands.
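The permutation scheme described above can be sketched as follows. This is only an illustration under stated assumptions: diploid individuals are stored as tuples of two-allele tuples, and the function names `permute_stand`, `permutation_p_value` and `genotype_d0` are hypothetical. In particular, `genotype_d0` is a simple stand-in statistic, not the distance Δ of the paper, whose computation requires a minimal shift transformation between genotypic structures.

```python
import random
from collections import Counter


def permute_stand(stand, rng):
    """Randomly re-pair the genes at each locus within one stand.

    `stand` is a list of individuals; each individual is a tuple of loci
    and each locus is a tuple of two alleles (diploid, assumed layout).
    """
    n_ind, n_loci = len(stand), len(stand[0])
    new_loci = []
    for locus in range(n_loci):
        # Collect all 2n genes at this locus and shuffle them.
        genes = [allele for ind in stand for allele in ind[locus]]
        rng.shuffle(genes)
        # Consecutive pairs of genes become the new single-locus genotypes.
        new_loci.append([tuple(sorted(genes[2 * i:2 * i + 2])) for i in range(n_ind)])
    # Individual i receives the i-th new genotype at every locus. Because each
    # locus is shuffled independently, the loci are combined at random.
    return [tuple(new_loci[locus][i] for locus in range(n_loci)) for i in range(n_ind)]


def genotype_d0(stands, locus=0):
    """Stand-in statistic (NOT the paper's Δ): half the sum of absolute
    differences between the genotype frequencies of the first two stands
    at one locus."""
    a, b = stands[0], stands[1]
    fa = Counter(ind[locus] for ind in a)
    fb = Counter(ind[locus] for ind in b)
    return 0.5 * sum(abs(fa[g] / len(a) - fb[g] / len(b)) for g in set(fa) | set(fb))


def permutation_p_value(stands, statistic, n_perm=10_000, seed=0):
    """Proportion of permutations whose statistic is >= the observed one."""
    rng = random.Random(seed)
    observed = statistic(stands)
    hits = 0
    for _ in range(n_perm):
        permuted = [permute_stand(s, rng) for s in stands]
        if statistic(permuted) >= observed:
            hits += 1
    return observed, hits / n_perm


# Invented toy data: one stand of homozygotes, one stand of heterozygotes.
stand_a = [((1, 1), (3, 3))] * 5 + [((2, 2), (4, 4))] * 5
stand_b = [((1, 2), (3, 4))] * 10

if __name__ == "__main__":
    print(permutation_p_value([stand_a, stand_b], genotype_d0, n_perm=1000))
```

Permuting within each stand leaves the gene-pool of every stand unchanged, so any change in the statistic reflects only the arrangement of the genes into genotypes.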
The significant size of the mean of the pairwise distances for the single-locus diplophase and the multilocus diplophase may seem at odds with the striking similarity of these distances within each of the three levels of integration. The same holds for the snail components. To explain this similarity, note that the range of values that appeared in the permutations is also quite narrow. Thus the collections of genes in the stands must place tight limits on the achievable distances and snail components.
Not only the sizes but also the covariation C of the pairwise distances Δ and the snail components Δ_j at the different integration levels depend on the differences in gene association between levels. The positive covariation of distance matrices and of snail components for all pairs of integration levels shows that no form of association completely overturns the ranking prescribed by the gene-pool.
Whereas the gene arrangements that distinguish the single-locus diplophase from the gene-pool do produce rank changes among the stands (C = 0.893 for the distance matrix and C = 0.809 for the snail components), the gene arrangements that distinguish the single-locus diplophase from the multilocus diplophase have little effect on ranking (C = 1 for the distance matrix and C = 0.995 for the snail components). Not surprisingly, the gene arrangements that distinguish the gene-pool from the multilocus diplophase yield the weakest covariation (C = 0.720 for the distance matrix and C = 0.657 for the snail components).
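To make the meaning of values such as C = 1 or C = 0.893 concrete, the following sketch computes a covariation between two lists of paired values. The definition of C is not restated in this section. The code assumes the form C = Σ_{i<j} (x_i − x_j)(y_i − y_j) / Σ_{i<j} |x_i − x_j|·|y_i − y_j|, which equals 1 whenever all pairwise differences have the same sign in both lists; both the function name and this formula should be checked against the definition given earlier in the paper.

```python
from itertools import combinations


def covariation(x, y):
    """Covariation of two equally long sequences (assumed definition):
    the sum of products of pairwise differences divided by the sum of
    their absolute values. Ranges from -1 to +1."""
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        num += prod
        den += abs(prod)
    if den == 0:
        raise ValueError("covariation undefined: no pair varies in both sequences")
    return num / den


# Invented example: the same ranking at two levels gives C = 1 regardless of
# how much the values themselves increase between levels.
level_low = [0.10, 0.12, 0.15]
level_high = [0.40, 0.45, 0.70]

if __name__ == "__main__":
    print(covariation(level_low, level_high))
```

Under this definition a rank change between two items lowers C in proportion to the size of the differences involved, so reversals among nearly equal distances have little influence.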
This pattern of strong covariation is evident in the UPGMA dendrograms (Fig. 1) based on the three distance matrices, which are easier to visualize than the distance matrices themselves, and the differentiation snails (Fig. 2) constructed from the three sets of snail components. The dendrograms show weakly defined clusters that vary in topology between the gene-pool and the topologically identical clusters of the single-locus diplophase and the multilocus diplophase. The snails show rank changes that are based on only slight differences between the snail components.
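For readers who wish to reproduce such dendrograms, a minimal UPGMA (average linkage) sketch is given here. The distance values are invented for illustration and are not the values of Table 1; only the stand labels R, B, S, E are taken from the text. The function name `upgma` and the data layout are assumptions.

```python
def upgma(labels, dist):
    """Minimal UPGMA clustering.

    `dist` maps pairs of labels to distances. Returns the list of merges
    as (cluster_a, cluster_b, distance_at_merge); clusters are nested tuples.
    """
    size = {lab: 1 for lab in labels}
    d = {frozenset(pair): value for pair, value in dist.items()}
    merges = []
    while len(size) > 1:
        pair = min(d, key=d.get)              # closest pair of clusters
        a, b = sorted(pair, key=str)          # deterministic order
        height = d.pop(pair)
        new = (a, b)
        new_size = size[a] + size[b]
        for c in list(size):
            if c in (a, b):
                continue
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            # Size-weighted average of the distances to the merged clusters.
            d[frozenset((new, c))] = (size[a] * dac + size[b] * dbc) / new_size
        del size[a], size[b]
        size[new] = new_size
        merges.append((a, b, height))
    return merges


# Invented distances between four stands (NOT the published values).
toy_dist = {
    ("R", "B"): 0.30, ("R", "S"): 0.32, ("R", "E"): 0.35,
    ("B", "S"): 0.31, ("B", "E"): 0.36, ("S", "E"): 0.34,
}

if __name__ == "__main__":
    for merge in upgma(["R", "B", "S", "E"], toy_dist):
        print(merge)
```

When the pairwise distances are as similar as in this toy matrix, small changes in single values can alter the topology, which is consistent with the weakly defined clusters described above.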
It is interesting to compare the observed covariations with the ranges of covariation that occurred for the gene arrangements generated by the 10 000 random permutations. The distance matrices show weaker covariation between the single-locus diplophase and the multilocus diplophase in almost 92% of the permutations (P-value 0.084 for C = 1) but between the gene-pool and the single-locus diplophase for only 73% (P-value 0.270 for C = 0.893). From the high improbability of the observed perfect covariation (C = 1) between the single-locus diplophase and the multilocus diplophase, it can be inferred that the non-homologous association has a special relationship to the homologous association in the single-locus diplophase. In contrast, the intermediate P-value for the covariation between the gene-pool and the single-locus diplophase implies that the homologous association is not predetermined by the collection of genes.
Figure 1. UPGMA dendrograms at three levels of genetic integration in four oak stands. For six microsatellite loci scored in four stands of oak (R, B, S, E), UPGMA dendrograms were constructed from the matrices of genetic distances Δ between stands in Tab. 1. Within each dendrogram, the quantitative differences between clusters are weak. The gene-pool dendrogram differs qualitatively, i.e., topologically, from the topologically identical dendrograms of the higher levels. The significantly large increase in the mean pairwise distance, and thus in the length of the dendrograms, with level of integration implies that the stands show differentiation for their forms of homologous gene association and, even more so, non-homologous association.
The snail components showed a weaker covariation between the single-locus diplophase and the multilocus diplophase for ca. 47% of the permutations (P-value 0.532 for C = 0.995) but between the gene-pool and the single-locus diplophase only for ca. 9% (P-value 0.912 for C = 0.809). This confirms the stronger effect of homologous association than non-homologous association on the ranking within the distance matrices. Compared to these, however, the snail components show stronger covariation than observed for a much higher proportion of the permutations, both for homologous and non-homologous association. Hence, the covariation of the snail components seems to be less sensitive to the effects of gene association than is the covariation of the pairwise distances. This must be due to the equalizing influence of combining three stands for comparison to the fourth that is the basis of the snail components.
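The equalizing influence of pooling can be seen in a small sketch of the snail components at the gene-pool level of a single locus. Several assumptions are made here and labeled in the code: the distance between two gene-pools is taken as half the sum of absolute allele-frequency differences, the complement of stand j is the weighted pool of the remaining stands, and the stands receive equal weights by default. The function names and the weighting are hypothetical and should be checked against the definitions given earlier in the paper.

```python
def d0(p, q):
    """Half the sum of absolute frequency differences between two
    frequency dictionaries (assumed gene-pool distance at one locus)."""
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def snail_components(freqs, weights=None):
    """Distance of each population from its complement, plus the weighted
    mean of these distances (assumed to correspond to Δ_j and Δ_SD)."""
    n = len(freqs)
    if weights is None:
        weights = [1.0 / n] * n          # assumption: equal weights
    components = []
    for j in range(n):
        rest = sum(weights[k] for k in range(n) if k != j)
        complement = {}
        for k in range(n):
            if k == j:
                continue
            for allele, f in freqs[k].items():
                complement[allele] = complement.get(allele, 0.0) + weights[k] * f / rest
        components.append(d0(freqs[j], complement))
    delta_sd = sum(w * c for w, c in zip(weights, components))
    return components, delta_sd


# Invented allele frequencies for four stands at one locus.
toy_freqs = [
    {"a": 0.6, "b": 0.4},
    {"a": 0.5, "b": 0.5},
    {"a": 0.4, "b": 0.6},
    {"a": 0.5, "b": 0.5},
]

if __name__ == "__main__":
    print(snail_components(toy_freqs))
```

Because each complement averages over three stands, peculiarities of any single stand are damped in the complement, which may explain why the snail components respond less strongly to gene association than the pairwise distances do.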

Discussion of the application to the oak stands
The differentiation observed among the oak stands increases distinctly from the gene-pool level to the single-locus diplophase. An even larger jump in differentiation occurs when the non-homologous association for the multiple loci is included. These are clear indications that all (except for perhaps one) of the stands show deviation from both Hardy-Weinberg proportions (HWP) and gametic equilibrium, and that the degrees of deviation vary considerably among the stands. Such indications could not be confirmed by conventional statistical testing due to the large numbers of degrees of freedom and the implied weakness of the respective test statistics. It might come as a surprise that applying the special permutation analysis presented above to genetic differences between populations detects association characteristics within populations. Confirmation and exploitation of this statistical potential deserve further investigation.
Consequently, if the four oak stands had been less clearly separated spatially, and if we had wanted to assign the trees to their proper subpopulations, we would have run into problems when making use of methods based on the absence of gene associations within populations. Methods for finding subdivisions of populations that are based on Hardy-Weinberg proportions and gametic equilibrium within populations (e.g. [23][24][25][26][27]) may therefore not have assigned the individuals to their original stands.
When comparing the observed differentiation to that producible by gene association in the stands, all 10 000 permutations agreed with the observation by showing much higher differentiation among the single-locus diplophases than among the gene-pools, both for the mean pairwise genetic distance and the symmetric population differentiation Δ_SD. This tells us that the random generation of gene association never yielded Hardy-Weinberg structures for all loci in all four stands simultaneously, nor any other form of homologous association that leaves differentiation unchanged (e.g. inbreeding with equal coefficients). Furthermore, all non-homologous associations showed a considerable additional increase in differentiation over the homologous associations, as is seen in the wide separation of the range of differentiation for the single-locus diplophase from the range for the multilocus diplophase. Remarkably, both ranges of differentiation are quite narrow. These results indicate that the increases in differentiation that are realizable by homologous and non-homologous gene association can be tightly restricted by the genic composition of the populations. In such cases, equal differentiation at consecutive integration levels may not be achievable. Thus it appears that differentiation among populations with respect to their forms of gene association may be a normal occurrence. This insight questions the common practice of restricting the measurement of population differentiation to the allelic level (e.g. F_ST), thereby ignoring the considerable effects of gene association on population differentiation.
This analysis is the first of its kind. Therefore, we cannot venture a prediction about whether the above findings on covariation between levels of integration constitute a general trend. It is conceivable, for example, that these findings are mainly determined by the conspicuously large polymorphism characteristic of the microsatellite markers used in this study. Other genetic markers may tell different stories.
Figure 2. Differentiation snails at three levels of genetic integration in four oak stands. For six microsatellite loci scored in four stands of oak (R, B, S, E) the differentiation snails were constructed from the snail components Δ_j in Tab. 1. Dotted circles mark the symmetric population differentiation Δ_SD. Within each snail, the quantitative differences among the components are slight. Each snail differs qualitatively in the ranking of the stands from the other two (i.e., covariation C < 1 for each comparison). The significantly large increase in the radius Δ_SD of the snails with each higher integration level confirms the differentiation among the stands for form of gene association.

Conclusion
This new approach to the analysis of genetic differentiation among populations demonstrates that the consideration of gene associations within populations adds a new quality to studies on population differentiation that is overlooked when viewing only gene-pools.

Appendix A
Proposition 1: For any two populations, the distance between the (multilocus) genetic structures P and Q at any L gene loci (L ≥ 1) of equal degree of ploidy is not less than the distance between the corresponding gene pools p and q, respectively, that is, Δ(P, Q) ≥ Δ(p, q), where the difference between genetic types (haplotypes, diplotypes) is measured by the elementary genic difference d.
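A small numerical illustration of Proposition 1 may help. The sketch treats a single diploid locus and a special case in which the second population consists of one genotype only, so that the shift transformation is forced and no minimization is needed. It assumes that the elementary genic difference between two types is the proportion of genes in which they differ and that the gene-pool distance is half the sum of absolute allele-frequency differences. All function names are hypothetical, and the general distance Δ requires solving a transportation problem that is not shown here.

```python
from collections import Counter


def elementary_genic_difference(g, h):
    """Proportion of genes by which two single-locus types differ
    (assumed definition; types are tuples of alleles of equal length)."""
    shared = sum((Counter(g) & Counter(h)).values())   # multiset intersection
    return 1.0 - shared / len(g)


def gene_pool(structure):
    """Allele frequencies implied by a genotypic frequency structure."""
    pool = Counter()
    for genotype, freq in structure.items():
        for allele in genotype:
            pool[allele] += freq / len(genotype)
    return pool


def d0(p, q):
    """Half the sum of absolute frequency differences."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in set(p) | set(q))


def delta_to_single_type(structure, target):
    """Distance from `structure` to a population consisting only of
    `target`: every individual must be shifted to that one type."""
    return sum(f * elementary_genic_difference(g, target) for g, f in structure.items())


# Invented example: only homozygotes in the first population, only
# heterozygotes in the second, with identical gene-pools.
P = {("A1", "A1"): 0.5, ("A2", "A2"): 0.5}
Q = {("A1", "A2"): 1.0}

delta_PQ = delta_to_single_type(P, ("A1", "A2"))   # genotypic level
delta_pq = d0(gene_pool(P), gene_pool(Q))          # gene-pool level

if __name__ == "__main__":
    print(delta_PQ, delta_pq)
```

In this example the gene-pools coincide while the genotypic structures do not, so the entire distance at the higher level is due to the different forms of homologous association.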
Whereas the equality in Proposition 1 follows from the text, proof of the inequality in Proposition 1 depends on a lemma that applies the following notation: For the two populations, let G_x or G_y denote the genetic types of the individuals at L gene loci of degree of ploidy N ≥ 1, yielding K = LN genes per individual. For the relative frequencies P(G_x) and Q(G_x) of type G_x in the two populations (by some ordering), denote the frequency structures of the L-locus types as P and Q. Call the ith allele at the lth locus A_{i;l}. Term the frequency structures of the gene types A_{i;l} in the L-locus gene-pool p and q.
A shift transformation s(P, Q) decomposes the set of all genetic types on the basis of their relative frequencies into three sets: the source types G_x for which P(G_x) > Q(G_x) holds, i.e., that show an excess in the first population with respect to the second; the sink types G_x for which P(G_x) < Q(G_x) holds, i.e., that show a deficit in the first population; and those for which P(G_x) = Q(G_x) holds. In general terms, the excess of type G_x is quantifiable as P(G_x) − min{P(G_x), Q(G_x)} [...] types G_x, s(P, Q) fulfills: [...] where s(G_x, G_y) is the relative frequency among all individuals in the first population of individuals that are shifted from type G_x to type G_y.
[...] has the same sign for all pairs of types G_x, G_y. This distinguishes three special groups of genes: genes A_{i;l} for which the expression equals zero for all pairs of types G_x, G_y, implying that A_{i;l} is equally frequent in the two populations and therefore shows no net shift; genes A_{i;l} for which the expression is ≥ 0 but not ≡ 0 for all x, y, that is, that are never less frequent in source types G_x than in the corresponding sink types G_y, making them source genes; and genes A_{i;l} for which the expression is ≤ 0 but not ≡ 0 for all x, y, making them sink genes.
(Note that a gene need not belong to any of the three groups, as is demonstrated by s(A_{i;l}A_{j;l}, A_{j;l}A_{j;l}) > 0 and s(A_{i;l}A_{j;l}, A_{i;l}A_{i;l}) > 0.)
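The decomposition of types into sources, sinks and balanced types can be illustrated with a short sketch. It uses the excess P(G_x) − min{P(G_x), Q(G_x)} stated above and, by analogy, takes the deficit as Q(G_x) − min{P(G_x), Q(G_x)}, which is an assumption. The function name `classify_types` and the example frequencies are invented. Exact fractions are used so that equal frequencies are recognized as equal.

```python
from fractions import Fraction


def classify_types(P, Q):
    """Split genetic types into source types (excess in the first
    population), sink types (deficit in the first population) and
    balanced types, returning the amounts of excess and deficit."""
    sources, sinks, balanced = {}, {}, []
    for g in sorted(set(P) | set(Q)):
        p = P.get(g, Fraction(0))
        q = Q.get(g, Fraction(0))
        common = min(p, q)
        if p > q:
            sources[g] = p - common      # excess of type g
        elif p < q:
            sinks[g] = q - common        # deficit of type g (assumed analog)
        else:
            balanced.append(g)
    return sources, sinks, balanced


# Invented frequency structures over three genetic types.
P = {"G1": Fraction(5, 10), "G2": Fraction(3, 10), "G3": Fraction(2, 10)}
Q = {"G1": Fraction(2, 10), "G2": Fraction(3, 10), "G3": Fraction(5, 10)}

sources, sinks, balanced = classify_types(P, Q)

if __name__ == "__main__":
    print(sources, sinks, balanced)
```

Since both frequency structures sum to one, the total excess over all source types equals the total deficit over all sink types, which is the amount any shift transformation has to move.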

Appendix B
Proposition 2: For any two populations, the distance between the (multilocus) genetic structures P and Q at any L gene loci (L ≥ 1) of equal degree of ploidy N ≥ 1 is not less than the mean distance between the corresponding single-locus structures P^(l) and Q^(l), respectively, that is, Δ(P, Q) ≥ (1/L) Σ_l Δ(P^(l), Q^(l)), with the sum taken over the loci l = 1, ..., L, where the difference between genetic types is measured by the elementary genic difference d.
The validity of Proposition 2 for L = 1 is obvious. For L ≥ 2, proof depends on four lemmata that apply the following notation: Let s(P, Q) be a shift transformation between the L-locus genotypic structures. Denote the various L-locus types as G_x or G_y, and write each type G_x as the "prod- [...]
Proof: For the l-th locus it holds that: [...] Their difference equals: [...] The same difference results for any shift transformation s_l at a locus l, since: [...] ■
Even though marginal sums share this property with any shift transformation at the locus, the following lemma shows that marginal sums may not specify a shift transformation. [...]
The following lemma shows how to eliminate all ambivalent source/sink relationships from the marginal sums without changing the net amount shifted, i.e., the amount sent away as a source minus the amount received as a sink.

Lemma 4:
The marginal sums of all types , at locus l can be used to construct a quasi-shift κ l (P (l) , Q (l) ) with the following three properties: Proof by construction: Consider the following algorithm: START: Set for all u, v.
Step 1: If holds for a type , set . Since , this has no effect on the sum . Repeat for an additional type fulfilling the condition. If none exist, go to Step 2.
Step 2: If and hold for a ≠b, set where . Because it follows that

Set
Repeat for an additional pair of types that fulfill the condition. If none exist, go to Step 3. Step