- Methodology article
- Open Access

# Contribution of genetic effects to genetic variance components with epistasis and linkage disequilibrium

- Tao Wang^{1} (Email author)
- Zhao-Bang Zeng^{2}

**10**:52

https://doi.org/10.1186/1471-2156-10-52

© Wang and Zeng; licensee BioMed Central Ltd. 2009

**Received:** 21 November 2008

**Accepted:** 4 September 2009

**Published:** 4 September 2009

## Abstract

### Background

Cockerham genetic models are commonly used in quantitative trait loci (QTL) analysis, with the special feature of partitioning genotypic variances into various genetic variance components, while the F_{∞} genetic models are widely used in genetic association studies. Over the years, there has been some confusion about the relationship between these two types of models. A link between the additive, dominance and epistatic effects in an F_{∞} model and the additive, dominance and epistatic variance components in a Cockerham model has not been well established, especially when there are multiple QTL in the presence of epistasis and linkage disequilibrium (LD).

### Results

In this paper, we further explore the differences and links between the F_{∞} and Cockerham models. First, we show that the Cockerham type models are allelic based models with a special modification to correct a confounding problem. Several important moment functions, which are useful for partition of variance components in Cockerham models, are also derived. Next, we discuss properties of the F_{∞} models in partition of genotypic variances. Its difference from that of the Cockerham models is addressed. Finally, for a two-locus biallelic QTL model with epistasis and LD between the loci, we present detailed formulas for calculation of the genetic variance components in terms of the additive, dominant and epistatic effects in an F_{∞} model. A new way of linking the Cockerham and F_{∞} model parameters through their coding variables of genotypes is also proposed, which is especially useful when reduced F_{∞} models are applied.

### Conclusion

The Cockerham type models are allele-based models with a focus on partition of genotypic variances into various genetic variance components, which are contributed by allelic effects and their interactions. By contrast, the F_{∞} regression models are genotype-based models focusing on modeling and testing of within-locus genotypic effects and locus-by-locus genotypic interactions. When there is no need to distinguish the paternal and maternal allelic effects, these two types of models are transferable. Transformation between an F_{∞} model's parameters and its corresponding Cockerham model's parameters can be established through a relationship between their coding variables of genotypes. Genetic variance components in terms of the additive, dominance and epistatic genetic effects in an F_{∞} model can then be calculated by translating formulas derived for the Cockerham models.

## Background

Genetic models provide a basis for analyzing genetic properties in study populations. For quantitative traits, one type of model that has long been used in experimentally designed populations for analysis of quantitative trait loci (QTL) is the so-called Fisherian or Cockerham model, which focuses on the partition of genotypic variances into additive, dominance and epistatic genetic variance components and whose parameters are often called average allelic effects. Another popular model, which has been widely used in many genetic association studies, is referred to as the F_{∞} model, whose parameters are often defined as the additive, dominance and epistatic effects [1–4]. Over the years, there has been some confusion about the relationship between these two types of models [5–7]. The relationship between the additive, dominance and epistatic effects and the genetic additive, dominance and epistatic variance components has not been well established, especially when multiple QTL are involved in the presence of epistasis and linkage disequilibrium (LD). To shed some light on this issue, in this paper we further explore the differences and links between these two types of models.

In genetic studies, a partition of genotypic variance into additive, dominance and epistatic variance components provides a basis for better understanding how a quantitative trait is inherited from a parental population by its progeny population. For example, the additive variance is the chief cause of resemblance between relatives. The genetic variance components also play a crucial role in studies of heritability, covariance between relatives, and variance components analysis. Fisher [8] proposed a least squares regression model for modeling QTL by partitioning genotypic variance into additive and dominance variance components, where the additive variance describes the variation contributed by an average substitution effect of a specific allele transmitted from a parent to offspring, and the dominance variance is the portion of the genotypic variance due to interaction of the two alleles from both parents. Cockerham [9, 10] extended Fisher's model to multiple loci with redefined additive, dominance and epistatic effects of QTL based on statistical orthogonal contrasts. Kempthorne [11, 12] further generalized the model to multiple alleles. More recently, Mao *et al*. [13] extended the two-locus biallelic Cockerham model to allow Hardy-Weinberg and linkage disequilibria, and they introduced 35 allelic effect parameters to describe various allelic effects and their interactions. Zeng *et al*. [7] introduced a general multi-locus-two-allele (G2A) model to represent the Cockerham model in a multiple regression model setting, and compared several models for analyzing QTL effects and epistasis. Wang and Zeng [14] further extended the approach to multiple alleles and derived formulas for computing variance components in the presence of epistasis and LD.

The F_{∞} model focuses on direct modeling of genotypic values and testing for genotypic association of QTL with quantitative traits. There have been discussions on classification of various penetrance modes of diseases based on the F_{∞} model and genotypic values [15]. In terms of its modeling scheme, as we will see later in this paper, the F_{∞} model is genotype-based by treating genotypes as different levels of the locus factors. Álvarez-Castro and Carlborg [16] also proposed a unified model to incorporate both genotypic and allelic effects into one framework. Meanwhile, there have been continuous efforts on modeling QTL effects and epistasis based on their biological functions [5, 6, 17]. More recently, several articles have addressed the issue of F_{∞} models on partition of genotypic variances [7, 18, 19]. Under the assumption of linkage equilibrium, Tiwari and Elston [19] considered a two-locus biallelic F_{∞} model and derived formulas for computing genetic variance components in terms of the additive, dominant and epistatic genetic effects. Yang [18] discussed the impact of zygotic association on partition of genotypic variance in F_{∞} models. Zeng *et al*. [7] compared the difference in definition of model parameters between the F_{∞} and Cockerham models. It was pointed out that these two types of models are different ways of modeling the genotypic values and the two models' parameters are transferable from one to the other through their relationship with the genotypic values under certain circumstances.

In this paper, we further explore the differences and links between the F_{∞} and Cockerham models in terms of their modeling schemes and on partition of the genotypic variance. First, we clarify that the Cockerham type models are in fact allele-based models with a special modification to correct a collinearity problem. Formulas of several moment functions for a two-locus biallelic Cockerham model are also derived, which are useful for deriving formulas in calculation of the genetic variance components. Next, we explore the difference and properties of these two types of models in partition of genotypic variances. We show that the traditional F_{∞} models are basically genotype-based models in which the additive and dominance effects could be confounded with each other in partition of genotypic variances. This fact usually does not affect association tests in the standard regression analysis. But it can make the partition of genotypic variances intricate, especially when locus-by-locus interactions are involved. We also introduce a mean-corrected F_{∞} model, which can provide a partially orthogonal partition of the genotypic variance between loci under zygotic equilibria, although its within-locus variances may still not be orthogonal due to the possible confounding between its coding variables of genotypes. We discuss pros and cons of the F_{∞} and Cockerham models in association analysis and in partition of the genotypic variances.

The second part of this paper concentrates on calculation of the genetic variance components in terms of the additive, dominance and epistatic genetic effects in an F_{∞} model. Tiwari and Elston [19] derived formulas for computing genetic variance components in terms of the additive, dominance and epistatic genetic effects for a two-locus biallelic F_{∞} model under the assumption of linkage equilibrium. More recently, Zeng *et al*. [7] proposed a way of linking the two sets of model parameters through their relationship with the genotypic values when fully parameterized models are applied. Since formulas for partition of the genotypic variance into additive, dominance and epistatic variances have been well established for Cockerham models [9, 14], we can calculate the genetic variance components of an F_{∞} model by translating the partition formulas derived for its equivalent Cockerham model. As examples, for a one-locus F_{∞} model with Hardy-Weinberg disequilibrium and a two-locus F_{∞} model with both epistasis and LD, we present detailed formulas for computing various genetic variance components in terms of the additive, dominance and epistatic effects together with allele frequencies and LD measures. We also propose an alternative way of transforming the additive, dominance and epistatic effects in an F_{∞} model into the average allelic effects in its corresponding Cockerham model through the coding variables of genotypes used in these two models, which is especially useful when reduced F_{∞} models are applied. Moreover, some practical issues related to the use of reduced F_{∞} or Cockerham models are addressed.

## Results

### Genetic models

For a quantitative trait, the phenotypic value can be written as *Y* = *G* + *E* + *G* × *E*, where *Y* is the phenotypic value, *G* is the genotypic value, *E* is the environmental deviation, and *G* × *E* is the genetic by environmental interaction. Adjustment for environmental deviation and genetic by environmental interaction can usually be achieved by incorporating suitable environmental covariates into the model. Therefore, in the rest of the paper, we omit *E* and *G* × *E* from the model and focus on modeling and analysis of the genotypic values.

Given a genotype *g* at the QTL under consideration, the genotypic value *G*(*g*) = *E*(*G*|*g*) is defined as the mean of individuals with genotype *g* in the study population. In practice, the genotypic value *G* of an individual is unknown and needs to be estimated. Letting *P*_{g} be the genotypic distribution of the QTL in the study population, a regression model can be expressed as

where *G*(*g*) is fixed given a specific genotype *g*. Since the QTL usually has a finite number of genotypes, *G*(*g*) itself can be treated as a discrete random variable that takes certain quantitative values with its distribution specified by *P*_{g}. Therefore,

With a large enough random sample from a study population, the genotype data from the sample would follow approximately the same genotypic distribution as *P*_{g}. The classical analysis of variance (ANOVA) or regression analysis is a typical tool for analyzing *V*_{G} and for testing possible association of genotypes at the QTL with the phenotypic trait. Now, a fundamental question is how to model the genotypic values *G*(*g*) given the QTL genotypes.
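As a minimal sketch of treating *G*(*g*) as a discrete random variable with distribution *P*_{g}, the snippet below computes the population mean and the genotypic variance as the usual variance of a discrete variable, Σ_g *P*_{g}(*G*(*g*) - mean)^2. The helper name `genotypic_variance` and all numbers are illustrative assumptions, not values from the paper.

```python
def genotypic_variance(values, freqs):
    """Mean and variance of G(g) viewed as a discrete random variable."""
    total = sum(freqs.values())
    if abs(total - 1.0) > 1e-12:
        raise ValueError("genotype frequencies must sum to 1")
    mean = sum(freqs[g] * values[g] for g in freqs)
    var = sum(freqs[g] * (values[g] - mean) ** 2 for g in freqs)
    return mean, var


# Hypothetical one-locus example (illustrative numbers only).
G = {"AA": 2.0, "Aa": 1.5, "aa": 0.0}
P = {"AA": 0.25, "Aa": 0.50, "aa": 0.25}
mu_G, V_G = genotypic_variance(G, P)
```

In practice the frequencies and genotypic values would be estimated from a sample, so the quantities above are population-level targets rather than estimates.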

Consider a QTL with two alleles *A*, *a* at the locus. For each individual, we can define the following indicator variables to describe the transmission of alleles from parents to the individual.

Here *g* = (*a*, *a'*), with *a*, *a'* being the paternal and maternal alleles, respectively. In practice, however, this model is not very useful because we usually cannot distinguish the paternal and maternal alleles from the observed genotype data (the so-called phase problem). But suppose that the paternal and maternal alleles have the same effects, which is a reasonable assumption in most genetic studies; then the above model can be simplified as

*G*(*g*) = *μ'* + *α'w'*(*g*) + *δ'v'*(*g*),   (2)

where *w'* = *z*_{M} + *z*_{F} and *v'* = *z*_{M}*z*_{F}.

In this model, based on the genotypic values, we have *α'* = *G*_{Aa} - *G*_{aa}, *δ'* = (*G*_{AA} + *G*_{aa}) - 2*G*_{Aa}, and the reference point (or baseline) *μ'* = *G*_{aa} is the genotypic value of genotype *aa*.
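The parameter definitions above can be checked with a short sketch. It assumes that model (2) has the form *G* = *μ'* + *α'w'* + *δ'v'*, with *w'* counting the copies of allele *A* and *v'* equal to 1 only for genotype *AA* (that is, *w'* = *z*_{M} + *z*_{F} and *v'* = *z*_{M}*z*_{F}, as used later in the text). The function names and genotypic values are hypothetical.

```python
def allelic_params(G_AA, G_Aa, G_aa):
    """Parameters of the allele-based one-locus model (2) from genotypic values."""
    mu = G_aa                          # baseline: genotype aa
    alpha = G_Aa - G_aa                # effect of one copy of allele A
    delta = (G_AA + G_aa) - 2 * G_Aa   # interaction of the two A alleles
    return mu, alpha, delta


def predict(mu, alpha, delta, n_A):
    """Genotypic value for a genotype carrying n_A copies of allele A."""
    w = n_A                            # w' = z_M + z_F
    v = 1 if n_A == 2 else 0           # v' = z_M * z_F
    return mu + alpha * w + delta * v


mu, alpha, delta = allelic_params(2.0, 1.5, 0.0)
fitted = [predict(mu, alpha, delta, n) for n in (2, 1, 0)]  # AA, Aa, aa
```

The three fitted values reproduce the three genotypic values exactly, which is what a re-parameterization should do.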

The genetic additive variance *V*_{A} is defined as the variation contributed by allelic effects alone, and the genetic dominance variance *V*_{D} is the variation contributed by interaction of the paternal and maternal alleles. Under the assumption of Hardy-Weinberg equilibrium (HWE), it is well known that the genotypic variance has an orthogonal partition *V*_{G} = *V*_{A} + *V*_{D}, in which the genetic dominance variance *V*_{D} becomes the deviation of the genetic variance attributable to the locus from the additive variance [4, 20]. A first look at model (2) might lead us to believe that under HWE we would have an orthogonal partition of the genotypic variance *V*_{G} = *V*_{A} + *V*_{D} with *V*_{A} = *V*(*α'w'*(*g*)) and *V*_{D} = *V*(*δ'v'*(*g*)). However, this is not true because the interaction term *δ'v'*(*g*) in model (2) is correlated with the additive term *α'w'*(*g*) due to a positive correlation between *z*_{M} (or *z*_{F}) and *v'* = *z*_{M}*z*_{F}. In fact, although the two indicator variables *z*_{M} and *z*_{F} are assumed to be independent under HWE, we have covariances Cov(*z*_{M}, *z*_{M}*z*_{F}) = Cov(*z*_{F}, *z*_{M}*z*_{F}) = *V*(*z*_{F})*E*(*z*_{M}) = *p*^{2}(1 - *p*), where *p* = *p*_{A} is the frequency of allele *A*. Therefore, the covariance between the two coding variables *w'* and *v'* is Cov(*w'*, *v'*) = Cov(*z*_{M} + *z*_{F}, *z*_{M}*z*_{F}) = 2*p*^{2}(1 - *p*), which means that *w'* and *v'* are positively correlated whenever 0 < *p* < 1. More generally, from the definition of *w'* and *v'* above, we can show that Cov(*w'*, *v'*) = 2(1 - *p*)*P*_{AA}, regardless of whether there is HWE or not. Thus, model (2) provides a partition of the genotypic variance as *V*_{G} = *V*(*α'w'*) + *V*(*δ'v'*) + 2Cov(*α'w'*, *δ'v'*), which includes a covariance term involving both *α'* and *δ'*. This problem, caused by using two correlated explanatory variables *w'*, *v'* in a multiple regression model, is often referred to as a confounding problem or, statistically, a multicollinearity problem. It tends to make the partition of variance components and the interpretation of the regression coefficients intricate, and in extreme cases leads to large standard errors for the least squares estimates. To overcome this multicollinearity problem in the partition of genetic variances, one strategy is to make mean corrections on the genotype coding variables [7, 14]. If we introduce two mean-corrected index variables defined by *x*_{M} = *z*_{M} - *p* and *x*_{F} = *z*_{F} - *p*, then we can build a modified version of model (2) as follows:

*G*(*g*) = *μ* + *αw*(*g*) + *δv*(*g*),   (3)

where *w* = *x*_{M} + *x*_{F} and *v* = -2*x*_{M}*x*_{F}.

It should be pointed out that the index variable *v* as defined above differs by a factor of -2 from the one we defined in [14]; this keeps the definition of *δ* consistent with the G2A model introduced in Zeng *et al*. [7], of which the standard F_{2} model is a special case.
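The confounding between the coding variables of model (2) can be checked numerically. The sketch below assumes *w'* is the number of *A* alleles and *v'* is 1 only for genotype *AA*, and compares the covariance computed from its definition with the closed forms 2(1 - *p*)*P*_{AA} (any genotype frequencies) and 2*p*^{2}(1 - *p*) (HWE) quoted in the text. The genotype frequencies are arbitrary illustrative values.

```python
def cov_w_v(P_AA, P_Aa, P_aa):
    """Cov(w', v') for w' = number of A alleles and v' = 1 only for AA."""
    probs = {2: P_AA, 1: P_Aa, 0: P_aa}
    E_w = sum(n * pr for n, pr in probs.items())
    E_v = P_AA          # v' is the indicator of genotype AA
    E_wv = 2 * P_AA     # w' * v' is nonzero only for AA, where it equals 2
    return E_wv - E_w * E_v


# Arbitrary genotype frequencies, not in HWE (illustrative values).
P_AA, P_Aa, P_aa = 0.30, 0.30, 0.40
p = P_AA + P_Aa / 2
general = cov_w_v(P_AA, P_Aa, P_aa)                   # compare with 2(1 - p) P_AA

# HWE genotype frequencies for the same allele frequency.
hwe = cov_w_v(p * p, 2 * p * (1 - p), (1 - p) ** 2)   # compare with 2 p^2 (1 - p)
```

Both covariances are strictly positive here, which is the source of the non-orthogonal partition discussed for model (2).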

Under HWE, the two indicator variables *z*_{M} and *z*_{F} are independent, as are the index variables *x*_{M} and *x*_{F}. Thus we now have Cov(*w*, *v*) = 0, which leads to our familiar orthogonal partition of the genotypic variance *V*_{G} = *V*_{A} + *V*_{D} with *V*_{A} = *α*^{2}*V*(*w*) = 2*α*^{2}*pq* and *V*_{D} = *δ*^{2}*V*(*v*) = 4*δ*^{2}*p*^{2}*q*^{2}, where *q* = 1 - *p*. Under Hardy-Weinberg disequilibrium, we can represent genotype frequencies as *P*_{AA} = *p*^{2} + *pqf*, *P*_{Aa} = 2*pq* - 2*pqf* and *P*_{aa} = *q*^{2} + *pqf*, where *f* is a measure of departure from HWE. Then the genotypic variance is *V*_{G} = *V*_{A} + *V*_{D} + 2Cov(*A*, *D*) with *V*_{A} = *α*^{2}*V*(*w*), *V*_{D} = *δ*^{2}*V*(*v*) and Cov(*A*, *D*) = *αδ*Cov(*w*, *v*), all evaluated under these genotype frequencies.

The coding variables *w'*, *v'* in model (2) and the index variables *w*, *v* in model (3) have relationships *w'* = *w* + 2*p* and *v'* = *pw* - *v*/2 + *p*^{2}. Note that *w'* is still the one that specifies the additive effect except with a constant shift, whereas *v'* includes a portion of *w*, which is the reason why model (2) cannot provide an orthogonal partition of the genotypic variance under HWE. The positive correlation between the two coding variables *w'* and *v'* in model (2) can also complicate the interpretation of the regression parameters *α'*, *δ'*. Using the method proposed in the next section, we can show that the parameters in models (2) and (3) have relationships *α'* = *α* + 2*pδ* and *δ'* = -2*δ*. Thus, the additive effect *α'* in model (2) is actually a combination of the average allelic effect *α* and dominance effect *δ* in the Cockerham model (3). For the partition of genotypic variance by model (2), we have under HWE

*V*(*α'w'*) = (*α* + 2*pδ*)^{2}*V*(*w*) and *V*(*δ'v'*) = *V*_{D} + 4*p*^{2}*δ*^{2}*V*(*w*),

where *V*(*w*) = 2*pq*. Note that *V*_{A} = *α*^{2}*V*(*w*) = 2*pqα*^{2}. So, the positive correlation between the two coding variables *w'* and *v'* inflates *V*(*δ'v'*) beyond *V*_{D}, with the excess contributed by a portion of the additive variance. By using the mean-corrected index variables *w* and *v*, the Cockerham model allows us to separate the confounding effects of the two variables *w'* and *v'*, at least under HWE, in the partition of the genotypic variance *V*_{G}. As a result, the dominance variance *V*_{D} in the Cockerham model (3) is the *additional* variation contributed by interaction of the paternal and maternal alleles, in addition to the additive variance.
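The one-locus partition can be illustrated with a small numerical sketch. It assumes the mean-corrected coding *w* = *x*_{M} + *x*_{F} and *v* = -2*x*_{M}*x*_{F} (so that *V*(*v*) = 4*p*^{2}*q*^{2} under HWE), obtains *α* and *δ* from the stated relationships *α'* = *α* + 2*pδ* and *δ'* = -2*δ*, and uses the genotype frequencies *p*^{2} + *pqf*, 2*pq*(1 - *f*), *q*^{2} + *pqf*. Function names and numbers are hypothetical.

```python
def moments(values_w, values_v, freqs):
    """Return E(w), E(v), V(w), V(v), Cov(w, v) over genotypes AA, Aa, aa."""
    Ew = sum(f * w for f, w in zip(freqs, values_w))
    Ev = sum(f * v for f, v in zip(freqs, values_v))
    Vw = sum(f * (w - Ew) ** 2 for f, w in zip(freqs, values_w))
    Vv = sum(f * (v - Ev) ** 2 for f, v in zip(freqs, values_v))
    Cwv = sum(f * (w - Ew) * (v - Ev)
              for f, w, v in zip(freqs, values_w, values_v))
    return Ew, Ev, Vw, Vv, Cwv


def cockerham_partition(G, p, f=0.0):
    """V_G, V_A, V_D and Cov(A, D) for genotypic values G = (G_AA, G_Aa, G_aa)."""
    q = 1.0 - p
    freqs = (p * p + p * q * f, 2 * p * q * (1 - f), q * q + p * q * f)
    w = (2 * q, q - p, -2 * p)                 # w = x_M + x_F
    v = (-2 * q * q, 2 * p * q, -2 * p * p)    # v = -2 x_M x_F
    G_AA, G_Aa, G_aa = G
    delta = G_Aa - (G_AA + G_aa) / 2           # from delta' = -2 delta
    alpha = (G_Aa - G_aa) - 2 * p * delta      # from alpha' = alpha + 2 p delta
    _, _, Vw, Vv, Cwv = moments(w, v, freqs)
    mean = sum(fr * g for fr, g in zip(freqs, G))
    V_G = sum(fr * (g - mean) ** 2 for fr, g in zip(freqs, G))
    return V_G, alpha**2 * Vw, delta**2 * Vv, alpha * delta * Cwv


# Under HWE (f = 0) the partition is orthogonal.
V_G, V_A, V_D, C_AD = cockerham_partition((2.0, 1.5, 0.0), p=0.3)
# Under Hardy-Weinberg disequilibrium a covariance term appears.
V_G_f, V_A_f, V_D_f, C_AD_f = cockerham_partition((2.0, 1.5, 0.0), p=0.3, f=0.2)
```

With these inputs the HWE case satisfies *V*_{G} = *V*_{A} + *V*_{D}, while the disequilibrium case only balances once 2Cov(*A*, *D*) is included.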

Consider two loci **A** and **B** with alleles *A*, *a* and *B*, *b*, respectively. We can define indicator variables:

Letting *p*_{1} = *P*_{A}, *p*_{2} = *P*_{B}, and assuming that paternal and maternal gametes (alleles and haplotypes) have the same genetic frequencies and effects, we obtain the following two-locus (G2A) Cockerham model [14]

Because the *x* variables are centered to have mean zero in the population, it is easy to see that all the components in model (4) are independent of each other under Hardy-Weinberg and linkage equilibria, which leads to an orthogonal partition of variance components. In addition, the mean-corrected *x* variables defined above have some nice properties that can facilitate derivation of formulas for various variance and covariance components. For example, for two loci **A** and **B** under HWE but with LD between them, we can show through some derivation that for any integers *m*, *n* > 0

where *q*_{1} = 1 - *p*_{1}, *q*_{2} = 1 - *p*_{2} and *D* = *P*_{AB} - *p*_{1}*p*_{2}. These moment functions are quite useful in deriving formulas for partition of the genotypic variance into various allele-based variance components for the above G2A Cockerham model. Besides, under gametic equilibrium,
,
are independent of
,
. Hence,
for any *j*, *k* = 1, 2 and integers *m*, *n* > 0. Moreover,
and
, as we do not distinguish the paternal and maternal gametes.

The model parameters *E*_{G2A·AB} = (*μ*, *α*_{1}, *δ*_{1}, *α*_{2}, *δ*_{2}, (*αα*), (*αδ*), (*δα*), (*δδ*))^{T} simply provide a re-parameterization of the 9 genotypic values *G*_{AB} = (*G*_{22}, *G*_{21}, *G*_{20}, *G*_{12}, *G*_{11}, *G*_{10}, *G*_{02}, *G*_{01}, *G*_{00})^{T}, where *G*_{ij}, *i*, *j* = 0, 1, 2, denote genotypic values with *i*, *j* being the counts of *A*, *B* alleles in the corresponding genotypes. Using the same notation as in Zeng *et al*. [7], we have

where

As pointed out in [7], the above relationship holds regardless of whether there is a linkage equilibrium or disequilibrium in the study population.

We now turn to the F_{∞} models. Still, let us first consider the simple case of one locus with two alleles *A*, *a*. In this case, we have three possible genotypes *AA*, *Aa* and *aa*, and correspondingly three possible genotypic values *G*_{AA}, *G*_{Aa} and *G*_{aa}. The single-locus F_{∞} model is then given by [3, 4]

*G*(*g*) = *m* + *aw** (*g*) + *dv** (*g*),   (5)

where *w** and *v** code the genotypes *AA*, *Aa*, *aa* as '1 - 0 - (-1)' and '0 - 1 - 0', and *a*, *d* are often called the additive and dominance effects, respectively. In terms of the genotypic values, the reference point and the two effects are defined as *m* = (*G*_{AA} + *G*_{aa})/2, *a* = (*G*_{AA} - *G*_{aa})/2 and *d* = *G*_{Aa} - (*G*_{AA} + *G*_{aa})/2. This model is referred to as an F_{∞} model simply because the reference point *m* in the model is the mean of the two homozygote genotypic values, which corresponds to the mean in an F_{∞} population [1, 2].

Since *m*, *a* and *d* in this model simply provide a re-parameterization of the original three genotypic values *G*_{AA}, *G*_{Aa} and *G*_{aa}, we can refer to *a*, *d* as genotypic effects of the QTL with *m* as a reference baseline.

Statistically, in order to see whether the QTL genotypes are associated with a disease phenotype, we need to test whether *G*_{AA} = *G*_{Aa} = *G*_{aa} or, equivalently, the null hypothesis *H*_{0}: *a* = *d* = 0 versus its alternative *H*_{a}: *a* or *d* ≠ 0. The standard regression approach can usually provide unbiased estimates of the model parameters and an appropriate test for *H*_{0} regardless of possible correlation between *w** (*g*) and *v** (*g*), although it may give large standard errors for the least squares estimates of the parameters when this correlation is very strong.

Now, let us look at the performance of model (5) in the partition of genotypic variances. As *w** and *v** are two coding variables for the three genotypes at the same locus, they are inherently correlated. In fact, letting *P*_{AA}, *P*_{Aa}, *P*_{aa} be the genotype frequencies, we can show that Cov(*w**, *v**) = *P*_{Aa}(*P*_{aa} - *P*_{AA}) ≠ 0 as long as *P*_{aa} ≠ *P*_{AA}. They are also related to the coding variables *w'*, *v'* in model (2) and the index variables *w*, *v* in model (3) by *w** = *w'* - 1 = *w* + 2*p* - 1 and *v** = *w'* - 2*v'* = (1 - 2*p*)*w* + *v* + 2*p*(1 - *p*).

In terms of the model parameters, we can show that *a* = *α* - (1 - 2*p*)*δ* and *d* = *δ*. In summary, we have the following conclusions.

• Model (5) usually provides a different partition of the genotypic variance *V*_{G} than the one from the Cockerham model (3).

• When *P*_{aa} = *P*_{AA}, model (5) can give an orthogonal partition of the genotypic variance *V*_{G} = *V* (*aw**) + *V* (*dv**), which is different from *V*_{G} = *V*_{A} + *V*_{D} in the Cockerham model (3) under the assumption of HWE unless
.

• The potential correlation between *w** and *v** often inflates *V* (*dv**) beyond *V*_{D}, with the excess contributed by a portion of the additive variance.

• The dominance effect *d* is the same as the allelic interaction *δ* in the Cockerham model. As a result, *V*_{D} = 0 if *d* = 0.

• The additive effect *a* = 0 is equivalent to *α* = (1 - 2*p*)*δ* for the allelic effects in the Cockerham model. So, *a* = 0 does not necessarily imply *V*_{A} = 0.

Note also that making mean-corrections on the two coding variables *w** and *v** of genotypes does not help to separate their confounding in this case because *dv** in model (5) is not an interaction term.
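The conclusions above can be illustrated with a short sketch. It assumes the usual F_{∞} definitions (*m* is the mean of the two homozygote values, *a* is half their difference, *d* is the heterozygote deviation from *m*), the codings 1, 0, -1 and 0, 1, 0 for *w** and *v**, and the stated relations *a* = *α* - (1 - 2*p*)*δ* and *d* = *δ*. The overdominant example values are hypothetical.

```python
def f_infinity_params(G_AA, G_Aa, G_aa):
    """Reference point m, additive effect a and dominance effect d of model (5)."""
    m = (G_AA + G_aa) / 2
    a = (G_AA - G_aa) / 2
    d = G_Aa - m
    return m, a, d


def cov_wstar_vstar(P_AA, P_Aa, P_aa):
    """Cov(w*, v*) with w* coded 1, 0, -1 and v* coded 0, 1, 0 for AA, Aa, aa."""
    freqs = (P_AA, P_Aa, P_aa)
    w, v = (1, 0, -1), (0, 1, 0)
    Ew = sum(f * x for f, x in zip(freqs, w))
    Ev = sum(f * x for f, x in zip(freqs, v))
    return sum(f * (x - Ew) * (y - Ev) for f, x, y in zip(freqs, w, v))


# Overdominant locus: both homozygotes equal, so a = 0 (illustrative values).
p = 0.3
m, a, d = f_infinity_params(1.0, 2.0, 1.0)
delta = d                          # d = delta
alpha = a + (1 - 2 * p) * delta    # from a = alpha - (1 - 2p) delta
V_A = 2 * p * (1 - p) * alpha**2   # additive variance under HWE
cov = cov_wstar_vstar(p * p, 2 * p * (1 - p), (1 - p) ** 2)
```

Here *a* = 0 while the additive variance is still positive, and the covariance matches *P*_{Aa}(*P*_{aa} - *P*_{AA}).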

An extension of the F_{∞} model (5) to multiple QTL is straightforward. Still consider two loci **A** and **B** with alleles *A*, *a* and *B*, *b*, respectively. We can introduce an additive and a dominance coding variable of the genotype *g* at each locus *i*, *i* = 1, 2, using the same '1 - 0 - (-1)' and '0 - 1 - 0' coding for QTL genotypes at each locus. Then a two-locus F_{∞} model with epistasis included yields

This model re-parameterizes the nine genotypic values *G*_{AB}. As shown in Zeng *et al*. [7], this two-locus F_{∞} model can be written in a matrix form, in which the parameter vector is (*m*, *a*_{1}, *d*_{1}, *a*_{2}, *d*_{2}, *aa*, *ad*, *da*, *dd*)^{T}, and

where *W*_{AB} = *diag*(*P*_{22}, *P*_{21}, *P*_{20}, *P*_{12}, *P*_{11}, *P*_{10}, *P*_{02}, *P*_{01}, *P*_{00}) is of full rank with *P*_{ij} being the frequency of genotypes corresponding to *G*_{ij}, *i*, *j* = 0, 1, 2. So, the LSE provide unbiased estimates of the model parameters, regardless of whether there are Hardy-Weinberg or linkage disequilibria in the genotypic distribution *P*_{g}. However, as pointed out in Zeng *et al*. [7], in the presence of interaction effects the additive effect *a*_{1} can no longer be interpreted as half of the difference between the homozygote genotypic values *G*_{2} = *E*(*G*|*AA*) and *G*_{0} = *E*(*G*|*aa*) at locus **A**, nor can the dominance effect *d*_{1} be interpreted as the difference between the heterozygote genotypic value *G*_{1} = *E*(*G*|*Aa*) and the mean of the homozygote genotypic values *G*_{2}, *G*_{0}. In addition, its partition of the genotypic variance *V*_{G} is complex because not only are the within-locus additive and dominance terms correlated for *j* = 1, 2, but the within-locus terms and the locus-by-locus interactions could also be correlated. As a result, even when the genotypes at loci **A** and **B** are independent (i.e., the so-called zygotic equilibrium between loci **A** and **B** [18]), the variance of the within-locus terms at locus *j*, *j* = 1, 2, cannot simply be interpreted as the variation contributed by locus *j* in the presence of interactions.

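If we use the mean-corrected variables *ξ*_{j} and *η*_{j}, obtained by subtracting from the additive and dominance coding variables at locus *j* their population means, to replace those coding variables for *j* = 1, 2 in the F_{∞} model (6), this leads to the following model,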

The variables *ξ*_{j} and *η*_{j} are very likely correlated within each locus *j* = 1, 2. But the mean correction could help to reduce the complexity of variance partition in certain circumstances. For example, under zygotic equilibrium between loci **A** and **B**, {*ξ*_{1}, *η*_{1}} are independent of {*ξ*_{2}, *η*_{2}}, and {*ξ*_{j}, *η*_{j}, *j* = 1, 2} are uncorrelated with the interactions {*ξ*_{1}*ξ*_{2}, *ξ*_{1}*η*_{2}, *η*_{1}*ξ*_{2}, *η*_{1}*η*_{2}} as well. As a result, the within-locus effects, *j* = 1, 2, and the locus-by-locus interactions (*aa' ξ*_{1}*ξ*_{2} + *ad' ξ*_{1}*η*_{2} + *da' η*_{1}*ξ*_{2} + *dd' η*_{1}*η*_{2}) as a whole are orthogonal to each other, although the interaction terms {*aa' ξ*_{1}*ξ*_{2}, *ad' ξ*_{1}*η*_{2}, *da' η*_{1}*ξ*_{2}, *dd' η*_{1}*η*_{2}} among themselves may still be correlated. Thus,

In this case, the variance of the within-locus terms at locus *j* is the variation contributed by genotypes at locus *j*, while the variance of the interaction terms represents the variation contributed by genotypic interactions between loci *j* and *k*. We will refer to model (7) as a mean-corrected F_{∞} model. It is interesting to see that, in an F_{2} population, this mean-corrected F_{∞} model reduces to the classical F_{2} model as a special case. The same holds for the Cockerham model (4) as well.
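The orthogonality pattern described for the mean-corrected F_{∞} model can be checked numerically. The sketch below assumes that *ξ*_{j} and *η*_{j} are the '1 - 0 - (-1)' and '0 - 1 - 0' coding variables at locus *j* minus their population means, and that zygotic equilibrium means the two-locus genotype distribution is the product of the single-locus ones. The single-locus frequencies are arbitrary illustrative values (deliberately not in HWE).

```python
from itertools import product

CODES_W = (1, 0, -1)   # additive coding for genotypes with 2, 1, 0 copies
CODES_V = (0, 1, 0)    # dominance coding


def centered(codes, freqs):
    """Subtract the population mean from a single-locus coding variable."""
    mean = sum(f * c for f, c in zip(freqs, codes))
    return [c - mean for c in codes]


def cov(x, y, joint):
    """Covariance of two variables tabulated over the 9 two-locus genotypes."""
    Ex = sum(P * x[k] for k, P in joint.items())
    Ey = sum(P * y[k] for k, P in joint.items())
    return sum(P * (x[k] - Ex) * (y[k] - Ey) for k, P in joint.items())


# Zygotic equilibrium: the joint distribution is the product of the marginals.
P1, P2 = (0.5, 0.2, 0.3), (0.1, 0.6, 0.3)
joint = {(i, j): P1[i] * P2[j] for i, j in product(range(3), repeat=2)}

xi1, eta1 = centered(CODES_W, P1), centered(CODES_V, P1)
xi2, eta2 = centered(CODES_W, P2), centered(CODES_V, P2)

XI1 = {k: xi1[k[0]] for k in joint}
ETA1 = {k: eta1[k[0]] for k in joint}
XI2 = {k: xi2[k[1]] for k in joint}
XI1XI2 = {k: xi1[k[0]] * xi2[k[1]] for k in joint}
ETA1XI2 = {k: eta1[k[0]] * xi2[k[1]] for k in joint}

between = cov(XI1, XI2, joint)              # different loci: uncorrelated
main_vs_inter = cov(XI1, XI1XI2, joint)     # main effect vs interaction
within = cov(XI1, ETA1, joint)              # within-locus confounding remains
among_inter = cov(XI1XI2, ETA1XI2, joint)   # interaction terms may be correlated
```

Only the first two covariances vanish, which matches the partially orthogonal partition: between loci and between main effects and interactions, but not within a locus or among the interaction terms.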

This two-locus allele-based model, model (8), resembles the F_{∞} model (6) except that the coding variables of genotypes are defined in different ways. From the definition of these coding variables, the two sets of coding variables are easily related to each other. We can show that the parameters in models (8) and (6) have the following relationship

Without locus-by-locus allelic interactions, the genotypic effects *a*_{j} and *d*_{j} depend only on the within-locus allelic effects at locus *j*, for *j* = 1, 2. In the presence of locus-by-locus allelic interactions, *a*_{j} = *d*_{j} = 0 is not equivalent to the corresponding within-locus allelic effects being zero. As alleles represent more basic levels of genetic factors than genotypes, the allele-based models are inherently more general and can be utilized to examine specific allelic effects and their interactions. When phase information is available, we could also use separate indicator variables of alleles to specify the paternal and maternal origins of alleles, which could be very useful in situations where the paternal or maternal genes may have different allelic effects and their interactions are of interest (e.g., genetic imprinting). On the other hand, the coefficients in an F_{∞} model are more closely associated with homozygosity and heterozygosity at the loci [2].

In regard to the modeling schemes, we can see that a major difference between the F_{∞} and Cockerham models lies in whether we treat genotypes or alleles as levels of the locus factors. The traditional F_{∞} models treat genotypes as levels of the locus factors with genotypic effects at each locus and locus-by-locus genotypic interactions being of major interest. The Cockerham models are defined by treating alleles as levels of the locus factors with a focus on partition of genotypic variances into various genetic variance components, and by using a mean-correction on coding variables of alleles they can effectively reduce the confounding between allelic effects and their interactions in partition of the genotypic variance. Both types of models can actually have two different versions - one is defined directly on coding of genotypes (or allele types), and the other on using mean-corrected index variables to reduce confounding between the main effects and their interactions. The former ones, either genotype-based or allele-based, have their coding variables defined on genotypes or alleles directly regardless of the genotypic or allelic distributions. The latter ones are based on some mean-corrected index variables, which depend not only on the genotypes or allele types but also on frequencies of these genotypes or alleles. To distinguish model parameters in these different models and meanwhile stay consistent with current terminology, in the rest of this paper we will simply refer to the additive, dominance and epistatic effects in a traditional F_{∞} model as the *genotypic effects*; the parameters in a mean-corrected F_{∞} model as the *average genotypic effects* with their corresponding variance components as *genotypic variance components*; the parameters in an allele-based model (e.g., model (2) or (8)) which is defined based on the coding variables of allele types as the *allelic effects*; and parameters in the traditional (mean-corrected) Cockerham model as the *average allelic effects* with their corresponding variance components as *allelic variance components*.

Models that directly use coding variables of genotypes or allele types are appealing in practice due to their simplicity. However, statistical tests of the genotypic or allelic effects based on p-values are highly dependent on the regression model, the distributional assumptions and the available sample size. A statistically significant genetic effect with a small p-value does not necessarily imply a clinically important finding. Besides, there could be inconsistency in the definition of model parameters between a one-locus model and a two-locus model with epistasis [7]. That is, when a multi-locus model is applied with epistasis involved, the interpretation of the additive and dominance effects based on a one-QTL model may change. On the other hand, models with mean-corrected index variables allow us to assess how much variation is actually contributed by certain genetic effects or interactions, which could provide useful information for judging clinical importance. A drawback of these mean-corrected models is that they bring genotype or allele frequencies into the design matrix for regression, which contributes another source of variation in fitting the model because the genotype or allele frequencies need to be estimated in practice. This fact could make it difficult to evaluate the variance of the estimated variance components.

The traditional (mean-corrected) Cockerham model can provide an orthogonal partition of the genotypic variance into additive, dominance and epistatic variance components under HWE and linkage equilibrium, while under zygotic equilibrium the mean-corrected F_{∞} model can give an orthogonal partition of the genotypic variance between different loci and locus-by-locus interactions. Which of the two mean-corrected models provides the simpler structure in the partition of the genotypic variance depends on the equilibrium situation in the sample. It is easy to see that linkage equilibrium between alleles at two QTL under HWE guarantees zygotic equilibrium of genotypes at the two loci, but not vice versa. Thus, for multiple QTL under both linkage and Hardy-Weinberg equilibria, the Cockerham model is preferred. When there is zygotic equilibrium of genotypes between two loci but no linkage equilibrium, a mean-corrected F_{∞} model might be preferred. In general, neither model is always preferable to the other in the partition of genotypic variances. However, as HWE is expected to hold (at least approximately) in most human genomic regions, QTL with zygotic equilibrium but no linkage equilibrium are possible but rare. In addition, the allelic variance components are important quantities in assessing covariance between relatives and are more closely related to the inheritance properties of quantitative traits. As a result, the allelic variance components based on the Cockerham model would be expected to be of main research interest in most genetic variance components analyses.
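The statement that linkage equilibrium under HWE guarantees zygotic equilibrium can be checked numerically. The sketch below is an illustration only and is not taken from the paper: it assumes random union of gametes (so that HWE holds at each locus), and the function names and the parameter `ld` (the usual LD coefficient *D*) are hypothetical. It covers only the forward direction of the statement.

```python
from itertools import product


def two_locus_genotype_freqs(p_a, p_b, ld):
    """Two-locus genotype frequencies under random union of gametes.

    Keys are (number of A alleles, number of B alleles), each in {0, 1, 2}.
    """
    hap = {
        (1, 1): p_a * p_b + ld,              # haplotype AB
        (1, 0): p_a * (1 - p_b) - ld,        # haplotype Ab
        (0, 1): (1 - p_a) * p_b - ld,        # haplotype aB
        (0, 0): (1 - p_a) * (1 - p_b) + ld,  # haplotype ab
    }
    geno = {}
    for (h1, f1), (h2, f2) in product(hap.items(), repeat=2):
        key = (h1[0] + h2[0], h1[1] + h2[1])
        geno[key] = geno.get(key, 0.0) + f1 * f2
    return geno


def max_zygotic_deviation(geno):
    """Largest |P(g_A, g_B) - P(g_A) * P(g_B)| over the nine genotypes."""
    marg_a = {i: sum(f for (x, _), f in geno.items() if x == i) for i in range(3)}
    marg_b = {j: sum(f for (_, y), f in geno.items() if y == j) for j in range(3)}
    return max(
        abs(geno[(i, j)] - marg_a[i] * marg_b[j])
        for i in range(3)
        for j in range(3)
    )


dev_le = max_zygotic_deviation(two_locus_genotype_freqs(0.4, 0.3, 0.0))
dev_ld = max_zygotic_deviation(two_locus_genotype_freqs(0.4, 0.3, 0.05))
print(dev_le, dev_ld)
```

With `ld = 0` the deviation vanishes up to rounding error, whereas a nonzero `ld` produces a clear departure from zygotic equilibrium.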

### Genotypic effects and allelic variance components

In Zeng *et al*. [7], it was pointed out that the additive, dominance and epistatic effects in an F_{∞} model and the average allelic effects in a Cockerham model are simply two different re-parameterizations of the genotypic values. When fully parameterized models are applied, each set can be converted into the other through its relationship with the genotypic values. Since the partition of genetic variance components based on Cockerham models has been well established [14, 21, 22], a relationship between the genotypic effects in an F_{∞} model and the average allelic effects in its corresponding Cockerham model allows us to compute the various allelic variance components in terms of genotypic effects, by translating the variance partition formulas derived from the Cockerham models in terms of the average allelic effects. In this section, we present detailed formulas for computing the allelic variance components in terms of the genotypic effects for the one-locus F_{∞} model (5) under Hardy-Weinberg disequilibrium and the two-locus F_{∞} model (6) with both epistasis and LD between the two loci. We also propose an alternative way of linking these two sets of parameters through the relationship between the coding variables of genotypes used in F_{∞} models and the mean-corrected index variables used in the Cockerham models. Some practical issues related to the use of reduced models instead of fully parameterized models are also addressed.

Consider first the one-locus F_{∞} model (5) and its equivalent Cockerham model (3). As pointed out in [7], we can build the relationship between the two sets of model parameters through the genotypic values. Since both models give a full parameterization of the three genotypic values *G*_{AA}, *G*_{Aa} and *G*_{aa}, the coding functions for the three genotypes determine each set of parameters from the other. In this relationship, *α* is the same substitution effect of replacing allele *a* by *A* as presented in [4] (p.114). Replacing *α*, *δ* in formula (4) by *a*, *d*, we obtain the partition of *V*_{G} in terms of *a*, *d* in model (5).

Under HWE, we have *f* = 0. Then *V*_{A} = 2*pq*[*a* + *d*(*q* - *p*)]^{2} and *V*_{D} = (2*pqd*)^{2}. These are the same results as presented in [4, 20].
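The HWE special case can be verified directly from the genotypic values. The following minimal sketch assumes the usual F_{∞} coding in which the genotypic values of *AA*, *Aa* and *aa* are *m* + *a*, *m* + *d* and *m* - *a*; the function name is hypothetical. It compares the variance computed from the genotype distribution with the sum of the standard HWE expressions *V*_{A} = 2*pq*[*a* + *d*(*q* - *p*)]^{2} and *V*_{D} = (2*pqd*)^{2} from [4].

```python
def one_locus_variances(p, a, d, m=0.0):
    """Return (V_G, V_A, V_D) for a biallelic locus under HWE."""
    q = 1.0 - p
    freqs = [p * p, 2 * p * q, q * q]   # frequencies of AA, Aa, aa
    values = [m + a, m + d, m - a]      # assumed F-infinity genotypic values
    mean = sum(f * g for f, g in zip(freqs, values))
    v_g = sum(f * (g - mean) ** 2 for f, g in zip(freqs, values))
    alpha = a + d * (q - p)             # average effect of allele substitution
    v_a = 2 * p * q * alpha ** 2
    v_d = (2 * p * q * d) ** 2
    return v_g, v_a, v_d


v_g, v_a, v_d = one_locus_variances(p=0.3, a=1.0, d=0.5)
print(v_g, v_a, v_d)
```

For these inputs the directly computed genotypic variance agrees with *V*_{A} + *V*_{D}, as expected when there is no Hardy-Weinberg disequilibrium.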

For the two-locus F_{∞} model (6), the genotypic effects (*m*, *a*_{1}, *d*_{1}, *a*_{2}, *d*_{2}, *aa*, *ad*, *da*, *dd*) and the average allelic effects *E*_{G2A·AB} in its equivalent Cockerham model are likewise related through the genotypic values.

Assuming HWE at loci **A** and **B** but allowing LD between the two loci, by applying the properties of the moment functions we derived before, it can be shown that the variance and covariance components in terms of average allelic effects in the two-locus Cockerham model (4) are given below

where *A*_{1} = *α*_{1} *w*_{1}, *D*_{1} = *δ*_{1} *v*_{1}, *A*_{2} = *α*_{2} *w*_{2}, *D*_{2} = *δ*_{2} *v*_{2}, *A*_{1} *A*_{2} = (*αα*)*w*_{1} *w*_{2}, *A*_{1} *D*_{2} = (*αδ*)*w*_{1} *v*_{2}, *D*_{1} *A*_{2} = (*δα*)*v*_{1} *w*_{2} and *D*_{1} *D*_{2} = (*δδ*)*v*_{1} *v*_{2}. Note that the covariance components are caused by correlation between various allelic effects and interactions, while the interactions contribute their own variances regardless of whether the alleles are in HWE and LD or not. The above results are similar to what we presented in [14] for a general G2A model, except that a more detailed partition of the variance components and their covariance structures is shown here. Note also that the scales used to define the index variables *v*_{1}, *v*_{2} here differ by a factor of (-2) from the ones used in [14], to keep them consistent with the ones used in Zeng *et al*. [7]. Correspondingly, the coefficients related to the *v*'s in model (4) differ from the ones in [14] by a factor of (-2) or 4, depending on how many *v*'s are involved. Replacing the allelic effects in the above formulas by genotypic effects using their relationship (9), we can then obtain formulas for the variance and covariance components in terms of the genotypic effects for the partition of the genotypic variance. When there is linkage equilibrium between loci **A** and **B**, then *D* = 0 and we have exactly the same result as presented in Tiwari and Elston [19].
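To see how LD induces a covariance between the additive terms, the simplest moment to examine is Cov(*w*_{1}, *w*_{2}). The sketch below is an illustration under stated assumptions, not the paper's derivation: it assumes random union of gametes and takes the additive index variable at each locus to be the sum over the two gametes of the mean-corrected allele indicator (so that *w* = 2*q*, *q* - *p*, -2*p* for the three genotypes). The function name and the parameter `ld` (the LD coefficient *D*) are hypothetical. Under these assumptions the covariance equals 2*D*, so that Cov(*A*_{1}, *A*_{2}) = 2*D* *α*_{1} *α*_{2}.

```python
from itertools import product


def additive_index_covariance(p1, p2, ld):
    """Cov(w1, w2) under random union of gametes with LD coefficient ld."""
    q1, q2 = 1 - p1, 1 - p2
    hap = {
        (1, 1): p1 * p2 + ld,
        (1, 0): p1 * q2 - ld,
        (0, 1): q1 * p2 - ld,
        (0, 0): q1 * q2 + ld,
    }
    cov = 0.0
    for (h, f_h), (k, f_k) in product(hap.items(), repeat=2):
        w1 = (h[0] - p1) + (k[0] - p1)  # mean-corrected allele count, locus A
        w2 = (h[1] - p2) + (k[1] - p2)  # mean-corrected allele count, locus B
        cov += f_h * f_k * w1 * w2      # E[w1] = E[w2] = 0 by construction
    return cov


cov_ld = additive_index_covariance(0.4, 0.3, 0.05)
cov_le = additive_index_covariance(0.4, 0.3, 0.0)
print(cov_ld, cov_le)
```

The covariance vanishes when `ld = 0`, in line with the remark that the results reduce to the orthogonal partition under linkage equilibrium.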

In genetic applications, using fully parameterized models may not always be practical due to limited sample sizes, multiple QTL, or a large number of alleles or genotypes at certain QTL. Including all possible genotypic or allelic interactions could make the genetic model overparameterized and hard to fit. Collapsing a certain number of alleles or genotypes may simplify the model structure, but doing so could also increase the risk of missing informative signals, as the effects of truly functional alleles can be attenuated by other non-functional alleles. Alternatively, a simplified genetic model could be used that includes only lower-order terms such as additive, dominance and additive by additive interactions.

As an example, consider a reduced two-locus F_{∞} model with only the additive effects at the two loci and the additive by additive interaction involved, so that only *m*, *a*_{1}, *a*_{2} and *aa* are retained from model (6). In its equivalent Cockerham model, we then have *δ*_{1} = *δ*_{2} = (*α*_{1}*δ*_{2}) = (*δ*_{1}*α*_{2}) = (*δ*_{1}*δ*_{2}) = 0. Thus, when there is HWE at loci **A**, **B** and linkage equilibrium between loci **A** and **B**, the genotypic variance is partitioned into the variance components of the remaining additive and additive by additive terms only. When there is HWE at loci **A**, **B** but LD between the two loci, some of these variance components keep the same form, and additional terms arise from the LD.

The relationship between *E*_{G2A·AB} and the genotypic effects used above is built through the genotypic values. Alternatively, we can establish the relationship between *E*_{G2A·AB} and the genotypic effects through the coding variables used in the F_{∞} models and the index variables used in the Cockerham models. The coding variables in the F_{∞} model (6) and the index variables *w*_{1}, *w*_{2} in the Cockerham model (4) are directly related, which gives the relationship (11) between the Cockerham model parameters *β* and the genotypic effects *b*.

Therefore, with the relationships (11), we can easily transform an F_{∞} model to its equivalent Cockerham model, or vice versa.
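The idea of linking the two parameterizations through their coding variables can be sketched for a single locus. The example below is a hedged illustration rather than a reproduction of relationship (11): it assumes the F_{∞} coding *x* = (1, 0, -1), *z* = (0, 1, 0) and the Cockerham index variables *w* = (2*q*, *q* - *p*, -2*p*), *v* = (-2*q*^{2}, 2*pq*, -2*p*^{2}) for genotypes *AA*, *Aa*, *aa*, and the function names are hypothetical. If the two design matrices satisfy *X*_{C} = *X*_{F}*T*, then the genotypic effects *b* and the Cockerham parameters *β* satisfy *b* = *Tβ*.

```python
import numpy as np


def coding_matrices(p):
    """Design matrices (rows: AA, Aa, aa) under the assumed codings."""
    q = 1.0 - p
    ones = np.ones(3)
    x = np.array([1.0, 0.0, -1.0])       # F-infinity additive coding
    z = np.array([0.0, 1.0, 0.0])        # F-infinity dominance coding
    w = np.array([2 * q, q - p, -2 * p])  # mean-corrected additive index
    v = np.array([-2 * q * q, 2 * p * q, -2 * p * p])  # dominance index
    return np.column_stack([ones, x, z]), np.column_stack([ones, w, v])


def f_inf_to_cockerham(b, p):
    """Map (m, a, d) to (mu, alpha, delta) via the coding variables."""
    x_f, x_c = coding_matrices(p)
    t = np.linalg.solve(x_f, x_c)  # T such that X_C = X_F T
    return np.linalg.solve(t, b)   # beta = T^{-1} b


p, m, a, d = 0.3, 10.0, 1.0, 0.5
mu, alpha, delta = f_inf_to_cockerham(np.array([m, a, d]), p)
print(mu, alpha, delta)
```

Under these assumed codings the transformation recovers *α* = *a* + *d*(*q* - *p*) and *δ* = *d*, and no genotypic values need to be computed explicitly, which is what makes this route convenient for reduced models.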

The discussion above assumes that the reduced F_{∞} models specify the genotypic values correctly. In practice, the true genotypic values are unknown and a reduced model can only provide an approximation of them. In this case, the least squares estimates from fitting a reduced model simply give an unbiased estimator of the partial regression coefficients, whose expected values are a weighted projection of the genotypic values onto the reduced model, where the weight matrix *W*_{AB} = *diag*(*P*_{22}, *P*_{21}, *P*_{20}, *P*_{12}, *P*_{11}, *P*_{10}, *P*_{02}, *P*_{01}, *P*_{00}) is the same as defined before and (·)^{g} denotes a generalized inverse of the matrix in parentheses. The true parameters of the reduced model may therefore depend not only on the genotypic values but also on the genotypic frequencies *P*_{g}, with possible allelic association such as LD involved, which is a fundamental difference between the statistical models and functional models as claimed in [17]. Furthermore, in general only certain linear combinations of *E*_{G2A·AB} can be estimated from the reduced-model parameters, because the matrix linking the two parameter sets may no longer be a non-singular square matrix. Thus, in this situation, some allelic variance components may not be directly estimable in terms of the genotypic effects in a reduced F_{∞} model. Alternatively, we can start from a reduced Cockerham model and derive its corresponding reduced F_{∞} model using the relationship (11) when some allelic variance components can be reasonably ignored.
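The dependence of reduced-model parameters on genotype frequencies can be illustrated with a small numerical sketch. It is not the paper's formula: it assumes that the target of the least squares fit is the frequency-weighted projection of the nine genotypic values onto a reduced design with columns (1, *x*_{1}, *x*_{2}, *x*_{1}*x*_{2}), with *x* coded 1, 0, -1, and it uses the Moore-Penrose pseudoinverse as one choice of generalized inverse. All function names are hypothetical.

```python
from itertools import product

import numpy as np

KEYS = list(product([1, 0, -1], repeat=2))  # order P22, P21, P20, P12, ...


def hwe(p):
    """Single-locus HWE genotype frequencies keyed by additive code x."""
    q = 1.0 - p
    return {1: p * p, 0: 2 * p * q, -1: q * q}


def independent_freqs(p1, p2):
    """Two-locus genotype frequencies under HWE and linkage equilibrium."""
    f1, f2 = hwe(p1), hwe(p2)
    return {(x1, x2): f1[x1] * f2[x2] for x1, x2 in KEYS}


def reduced_model_target(geno_values, geno_freqs):
    """Weighted least squares coefficients (m, a1, a2, aa)."""
    design = np.array([[1.0, x1, x2, x1 * x2] for x1, x2 in KEYS])
    g = np.array([geno_values[k] for k in KEYS])
    w = np.diag([geno_freqs[k] for k in KEYS])
    return np.linalg.pinv(design.T @ w @ design) @ design.T @ w @ g


# Hypothetical true values: additive effect 1.0 and dominance effect 0.5
# at locus A only, so the reduced model is misspecified.
true_values = {(x1, x2): 1.0 * x1 + 0.5 * (x1 == 0) for x1, x2 in KEYS}

coef_a = reduced_model_target(true_values, independent_freqs(0.3, 0.4))
coef_b = reduced_model_target(true_values, independent_freqs(0.5, 0.4))
print(coef_a[1], coef_b[1])
```

The same genotypic values give different additive coefficients at the two allele frequencies, because the omitted dominance effect is partly absorbed into the additive term.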

## Discussion

The F_{∞} models are now widely used in genetic association studies to test for association of genotypes and their interactions with quantitative traits. Most current association studies, however, focus on reporting p-values from statistical association tests on the additive, dominance and epistatic effects of QTL. As we have pointed out, an assessment of how much of the overall genotypic variance is contributed by these genetic effects is another important piece of information, which could be consequential for judging clinical significance. Unfortunately, the calculation of either the genotypic or allelic variance components for F_{∞} models is not trivial, especially when QTL interactions and LD are involved.

In this paper, we first explored the modeling schemes for the F_{∞} and Cockerham models. We showed that the F_{∞} models are basically genotype-based models that treat genotypes as different factor levels, while the Cockerham models are allele-based models with a special modification to correct a confounding problem. These two models usually provide different partitions of genotypic variances. Due to an inherent correlation between the additive and dominance effects within a locus in F_{∞} models, the variances contributed by the within-locus additive and dominance effects are quite often confounded with each other. Therefore, separate assessment of the variation contributed by the additive and dominance effects within a locus is not very meaningful. In order to fully capture the genotypic contribution at a locus, the variation contributed by both additive and dominance effects should be assessed jointly.

In this paper, we also pointed out that either the Cockerham or the F_{∞} model can have two different versions: one is defined based on coding variables for allele types or QTL genotypes, and the other uses mean-corrected index variables. Using these mean-corrected index variables can help to reduce the complexity of the partition of genotypic variances under either linkage or zygotic equilibrium. For example, the traditional (mean-corrected) Cockerham model can provide an orthogonal partition of the genotypic variance into additive, dominance and epistatic variance components under HWE and linkage equilibrium, while under zygotic equilibrium a mean-corrected F_{∞} model can provide an orthogonal partition of the genotypic variance between different loci and locus-by-locus interactions. By introducing the mean-corrected index variables, we can easily fit a Cockerham model or a mean-corrected F_{∞} model and compute various allelic or genotypic variance and covariance components using the standard regression approach. We also noted that the classical F_{2} model used in experimentally designed populations is actually a special case of both the traditional (mean-corrected) Cockerham model and the mean-corrected F_{∞} model.

Using the mean-correction to dissect the confounding of main effects and their interactions in the partition of variances is a useful strategy that can also be applied to evaluate gene-by-environment interactions. Returning to the original model (1), similar to the allelic effects and their interactions in model (2), the genetic main effect *G* and the gene-by-environment interaction *G* × *E* could be correlated as well. This correlation usually does not affect the association test of the gene-by-environment interaction. But it can lead to a covariance between the genetic main effect *G* and the interaction *G* × *E* in the partition of the phenotypic variance even when the main effects of *G* and *E* are uncorrelated, which complicates evaluation of the variation contributed by *G* × *E*. If we make mean-corrections on both *G* and *E*, then we can obtain an orthogonal partition of the phenotypic variance *V*_{Y} contributed by *G*, *E* and their interaction *G* × *E* as long as *G* and *E* are uncorrelated. Without these mean-corrections, the variance *V*(*G* × *E*) itself could be an incorrect estimate of the actual variation contributed by *G* × *E* in addition to the genetic and environmental variances *V*(*G*) and *V*(*E*).
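The effect of mean-correction on the covariance between a main effect and its interaction term can be shown with a small exact calculation. The sketch below is illustrative only: it assumes a genotype score *G* in {0, 1, 2} under HWE and a binary environmental factor *E* that is independent of *G*, and the function and variable names are hypothetical.

```python
from itertools import product


def covariance(joint, f, g):
    """Exact covariance of f(G, E) and g(G, E) under a joint distribution."""
    mean_f = sum(pr * f(x, e) for (x, e), pr in joint.items())
    mean_g = sum(pr * g(x, e) for (x, e), pr in joint.items())
    return sum(
        pr * (f(x, e) - mean_f) * (g(x, e) - mean_g)
        for (x, e), pr in joint.items()
    )


p = 0.3
geno = {2: p * p, 1: 2 * p * (1 - p), 0: (1 - p) ** 2}  # HWE frequencies
env = {0: 0.6, 1: 0.4}                                   # assumed exposure rates
joint = {(x, e): geno[x] * env[e] for x, e in product(geno, env)}

mean_g = sum(x * pr for x, pr in geno.items())
mean_e = sum(e * pr for e, pr in env.items())

# Covariance between G and the raw product G * E.
raw = covariance(joint, lambda x, e: x, lambda x, e: x * e)
# Covariance between mean-corrected G and the product of mean-corrected terms.
centered = covariance(
    joint,
    lambda x, e: x - mean_g,
    lambda x, e: (x - mean_g) * (e - mean_e),
)
print(raw, centered)
```

Even though *G* and *E* are independent here, the raw product term covaries with *G*, while the mean-corrected product does not.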

For three loci **A**, **B** and **C**, for example, we can show a corresponding moment function result for any integers *n*_{1}, *n*_{2}, *n*_{3} > 0. For more than three loci with linkage disequilibria, the moment functions will become more complex, but their calculation is still computationally feasible as long as we have information about the haplotype distribution in the sampled population.

When there is no need to distinguish the paternal and maternal gametes, the F_{∞} and Cockerham models can be transformed into each other. There are two different ways of linking the genotypic effect parameters in an F_{∞} model with the allelic effect parameters in its corresponding Cockerham model: through their relationship with the genotypic values, or through the relationship between the coding variables of genotypes used in the two types of models. By establishing the relationship between the genotypic effects and allelic effects, we can then calculate the allelic variance components for an F_{∞} model using the partition formulas derived for its Cockerham model. Using this approach, for a one-locus F_{∞} model under Hardy-Weinberg disequilibrium and a two-locus G2A model with epistasis and LD, we presented detailed formulas for the partition of genetic variances in terms of their genotypic effects. Moreover, some practical issues related to the use of reduced instead of fully parameterized F_{∞} models were also addressed.

Both the F_{∞} and the Cockerham models are statistical models, as their model parameters depend not only on genotypic values but also on the genotypic distribution in the sampled population, especially when reduced forms of the models are used. On the other hand, several attempts have been made to model QTL effects and epistasis based on their biological functions. Cheverud and Routman [5] and Cheverud [6] introduced an unweighted F_{∞} model and defined several specific epistases termed "physiological epistases". Hansen and Wagner [17] further inspected genotype-based interactions and termed them "functional epistases". In our opinion, these two kinds of models are different ways of modeling genetic effects and their interactions, and they serve different research interests. The function-based models could be very useful in the analysis of the molecular functions of genes and their pathways. In genetic mapping studies, however, they are hindered by the lack of appropriate model building tools and by the fact that the genotypic values are statistically defined as expected means over the genotypic distribution in the sampled population. With a large body of available regression tools, the statistical models provide a powerful means of detecting at least relatively common genes with genetic effects of a certain magnitude, given the limited sample sizes.

## Conclusion

The Cockerham type models are allele-based models whereas F_{∞} regression models are genotype-based models. When allelic effects and their interactions are of main research interest, the Cockerham type models are recommended. As genotype-based models, the F_{∞} models are most suitable for examining genotypic effects and their interactions. Since the allelic variance components are important quantities in assessing covariance between relatives, the calculation and statistical tests of the allelic variance components would be helpful for assessing how much variation is actually contributed by the allelic effects at each locus and by the locus-by-locus allelic interactions, which could become a crucial piece of information for assessing clinical importance. For a Cockerham model with the mean-corrected index variables, the allelic variance components can be estimated directly using the standard regression approach. For an F_{∞} model, when there is no need to distinguish the paternal and maternal allelic effects, we can transform it into its corresponding Cockerham model through the relationship between their coding variables of genotypes. Allelic variance components for the F_{∞} model can then be calculated either by fitting its equivalent Cockerham model or by translating formulas derived from the Cockerham model in terms of the additive, dominance and epistatic genetic effects in the F_{∞} model. Both the F_{∞} and Cockerham models provide a basis for QTL analysis. We believe that a better understanding of the differences and links between these two types of models will be helpful for genetic association mapping studies, variance components analysis and dissection of the genetic architecture of quantitative traits.

## Declarations

### Acknowledgements

The authors would like to thank two anonymous reviewers for their constructive comments, and Dr. Lizzy De Lobel in the Department of Applied Mathematics & Computer Science at Ghent University, Belgium, for the questions that motivated this study.

## References

- Hayman BI, Mather KM: The description of genetic interactions in continuous variation. Biometrics. 1955, 11: 69-82. 10.2307/3001481.
- Van der Veen JH: Tests of non-allelic interaction and linkage for quantitative characters in generations derived from two diploid pure lines. Genetica. 1959, 30: 201-232.
- Mather K, Jinks JL: Biometrical genetics. 1982, London: Chapman & Hall
- Falconer DS, Mackay TFC: Introduction to Quantitative Genetics. 1996, Harlow, UK: Longman
- Cheverud JM, Routman EJ: Epistasis and its contribution to genetic variance components. Genetics. 1995, 139: 1455-1461.
- Cheverud JM: Detecting epistasis among quantitative trait loci. Epistasis and the Evolutionary Process. Edited by: Wolf JB, Brodie ED III, Wade MJ. 2000, Oxford Univ. Press, UK, 58-81.
- Zeng ZB, Wang T, Zou W: Modeling quantitative trait loci and interpretation of models. Genetics. 2005, 169: 1711-1725. 10.1534/genetics.104.035857.
- Fisher RA: The correlation between relatives on the supposition of Mendelian inheritance. Trans Roy Soc. 1918, 52: 399-433.
- Cockerham CC: An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present. Genetics. 1954, 39: 859-882.
- Cockerham CC: Effects of linkage on the covariances between relatives. Genetics. 1956, 41: 138-141.
- Kempthorne O: The correlation between relatives in a random mating population. Proc Royal Soc B. 1954, 143: 103-113. 10.1098/rspb.1954.0056.
- Kempthorne O: An Introduction to Genetic Statistics. 1957, New York: John Wiley & Sons
- Mao YC, London NR, Ma L, Dvorkin D, Da Y: Detection of SNP epistasis effects of quantitative traits using an extended Kempthorne model. Physiol Genomics. 2006, 28: 46-52. 10.1152/physiolgenomics.00096.2006.
- Wang T, Zeng ZB: Models and partition of variance for quantitative trait loci with epistasis and linkage disequilibrium. BMC Genetics. 2006, 7: 9.
- Li W, Reich J: A complete enumeration and classification of two-locus disease models. Human Heredity. 2000, 50: 334-349. 10.1159/000022939.
- Álvarez-Castro JM, Carlborg Ö: A unified model for functional and statistical epistasis and its application in quantitative trait loci analysis. Genetics. 2007, 176: 1151-1167. 10.1534/genetics.106.067348.
- Hansen TF, Wagner GP: Modeling genetic architecture: a multilinear theory of gene interaction. Theoretical Population Biology. 2001, 59: 61-86. 10.1006/tpbi.2000.1508.
- Yang RC: Epistasis of Quantitative Trait Loci Under Different Gene Action Models. Genetics. 2004, 167: 1493-1505. 10.1534/genetics.103.020016.
- Tiwari HK, Elston RC: Deriving components of genetic variance for multilocus models. Genetic Epidemiology. 1997, 14: 1131-1136. 10.1002/(SICI)1098-2272(1997)14:6<1131::AID-GEPI95>3.0.CO;2-H.
- Lynch M, Walsh B: Genetics and Analysis of Quantitative Traits. 1998, Sunderland, MA: Sinauer Associates
- Weir BS, Cockerham CC: Two-locus theory in quantitative genetics. Proceedings of the international conference on quantitative genetics. Edited by: Pollak E, Kempthorne O, Bailey TB. 1977, Iowa State Univ. Press, Ames, 247-269.
- Weir BS, Cockerham CC: Complete characterization of linkage disequilibrium at two loci. Mathematical evolutionary theory. Edited by: Feldman MW. 1989, Princeton, NJ: Princeton University Press, 86-110.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.