Genetic models
In the analysis of a quantitative trait, the observed phenotypes can usually be expressed through the following model
Y = G + E + G × E,
where Y is the phenotypic value, G is the genotypic value, E is the environmental deviation, and G × E is the genetic by environmental interaction. Adjustment for environmental deviation and genetic by environmental interaction can usually be achieved by incorporating suitable environmental covariates into the model. Therefore, in the rest of the paper, we omit E and G × E from the model and focus on modeling and analysis of the genotypic values.
Quantitative trait loci (QTL) refer to genes that contribute to variation of a quantitative trait. In a study population, given specific genotypes g at the QTL under consideration, the genotypic value G(g) = E(G | g) is defined as the mean value of individuals with genotypes g in the study population. In practice, the genotypic value G of an individual is unknown and needs to be estimated. Letting P_{g} be the genotypic distribution of the QTL in the study population, a regression model can be expressed as
where the genotypic value G(g) is fixed given a specific genotype g. Since the QTL usually has a finite number of genotypes, G(g) itself can be treated as a discrete random variable that takes certain quantitative values with its distribution specified by P_{g}. Therefore,
With a large enough random sample from a study population, the genotype data from the sample would follow approximately the same genotypic distribution as P_{g}. The classical analysis of variance (ANOVA) or regression analysis is a typical tool for analyzing V_{G} and testing for possible association of genotypes at the QTL with the phenotypic trait. Now, a fundamental question is how to model the genotypic values G(g) given the QTL genotypes.
In the human genome, an individual always carries two alleles at a QTL, one from the father and the other from the mother. It is possible that a disease is caused by a mutant allele inherited from one of the parents. To understand such inheritance properties from parents to their offspring, a natural way is to treat paternal and maternal alleles as two different factors and assess their allelic effects. Given that, let us first consider a single QTL case with two alleles A, a at the locus. For each individual, we can define the following indicator variables to describe the transmission of alleles from parents to the individual: z_{M} = 1 if the maternal allele is A and 0 otherwise, and z_{F} = 1 if the paternal allele is A and 0 otherwise.
Then we can write down a simple regression model as
where g = (a, a') with a, a' being the paternal and maternal allele, respectively. In practice, however, this model is not very useful because we usually cannot distinguish the paternal and maternal alleles from the observed genotype data; i.e., the so-called phase problem. But suppose that the paternal and maternal alleles have the same effects, which is a reasonable assumption in most genetic studies; then the above model can be simplified as
G(g) = μ' + α'w'(g) + δ'v'(g), (2)
where w'(g), v'(g) are defined as
w'(g) = z_{M} + z_{F}, v'(g) = z_{M}z_{F}.
In this model, based on the genotypic values, we have α' = G_{Aa} − G_{aa}, δ' = (G_{AA} + G_{aa}) − 2G_{Aa}, and the reference point (or baseline) μ' = G_{aa} is the genotypic value of genotype aa.
Typically, the genetic additive variance V_{A} is defined as the variation contributed by allelic effects alone, and the genetic dominance variance V_{D} is the variation contributed by interaction of the paternal and maternal alleles. Under the assumption of Hardy-Weinberg equilibrium (HWE), it is well known that the genotypic variance has an orthogonal partition V_{G} = V_{A} + V_{D} in which the genetic dominance variance V_{D} becomes the deviation of the genetic variance attributable to the locus from the additive variance [4, 20]. A first look at model (2) might lead us to believe that under HWE we would have an orthogonal partition of the genotypic variance V_{G} = V_{A} + V_{D} with V_{A} = V(α'w'(g)) and V_{D} = V(δ'v'(g)). However, this is not true because the interaction term δ'v'(g) in model (2) is correlated with the additive term α'w'(g) due to a positive correlation between z_{M} (or z_{F}) and v' = z_{M}z_{F}. In fact, although the two indicator variables z_{M} and z_{F} are assumed to be independent under HWE, we have covariances Cov(z_{M}, z_{M}z_{F}) = Cov(z_{F}, z_{M}z_{F}) = V(z_{F})E(z_{M}) = p^{2}(1 − p), where p = p_{A} is the frequency of allele A. Therefore, the covariance between the two coding variables w' and v' is Cov(w', v') = Cov(z_{M} + z_{F}, z_{M}z_{F}) = 2p^{2}(1 − p), which means w' and v' are positively correlated whenever the frequency of allele A is neither zero nor one. More generally, from the definition of w' and v' above, we can show that Cov(w', v') = 2(1 − p)P_{AA}, regardless of whether there is HWE or not. Thus, model (2) provides a partition of the genotypic variance as
V_{G} = α'^{2}V(w') + δ'^{2}V(v') + 2α'δ'Cov(w', v'),
with a portion of it contributed by correlation between the effects α' and δ'. This problem, caused by using two correlated explanatory variables w', v' in a multiple regression model, is often referred to as a confounding problem or, statistically, a multicollinearity problem. It tends to make the partition of variance components and the interpretation of the regression coefficients intricate, and in extreme cases leads to large standard errors for the least squares estimates. To overcome this multicollinearity problem in the partition of genetic variances, one strategy is to make mean corrections on those genotype coding variables [7, 14]. If we introduce two mean-corrected index variables defined by x_{M} = z_{M} − p and x_{F} = z_{F} − p, then we can build a modified version of model (2) as follows
G(g) = μ + αw(g) + δv(g), (3)
where w(g), v(g) are defined by
w(g) = x_{M} + x_{F}, v(g) = −2x_{M}x_{F}.
It should be pointed out that the index variable v as defined above differs by a factor of −2 from the one we defined in [14]; this keeps the definition of δ consistent with the G2A model introduced in Zeng et al. [7], of which the standard F_{2} model is a special case.
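The covariance identities quoted above for the coding variables can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, the mean-corrected coding is assumed to be w = x_{M} + x_{F} and v = −2x_{M}x_{F}, and the genotype frequencies are parameterized by the allele frequency p and a departure f from HWE (P_{AA} = p^{2} + pqf, P_{Aa} = 2pq(1 − f), P_{aa} = q^{2} + pqf). Exact rational arithmetic is used so that the identities hold without rounding.

```python
from fractions import Fraction


def genotype_freqs(p, f=0):
    """Genotype frequencies (P_AA, P_Aa, P_aa); f measures departure from HWE."""
    q = 1 - p
    return p * p + p * q * f, 2 * p * q * (1 - f), q * q + p * q * f


def coding_covariances(P_AA, P_Aa, P_aa):
    """Return (Cov(w', v'), Cov(w, v)) for one biallelic locus."""
    p = P_AA + P_Aa / 2  # frequency of allele A
    q = 1 - p
    # One row per genotype AA, Aa, aa: (frequency, w', v', w, v).
    # Assumed codings: w' = count of A, v' = z_M*z_F,
    # w = x_M + x_F, v = -2*x_M*x_F with x = z - p.
    rows = [
        (P_AA, 2, 1, 2 * q, -2 * q * q),
        (P_Aa, 1, 0, q - p, 2 * p * q),
        (P_aa, 0, 0, -2 * p, -2 * p * p),
    ]

    def cov(i, j):
        mean_i = sum(r[0] * r[i] for r in rows)
        mean_j = sum(r[0] * r[j] for r in rows)
        return sum(r[0] * (r[i] - mean_i) * (r[j] - mean_j) for r in rows)

    return cov(1, 2), cov(3, 4)


if __name__ == "__main__":
    p = Fraction(3, 10)
    print(coding_covariances(*genotype_freqs(p)))                  # under HWE
    print(coding_covariances(*genotype_freqs(p, Fraction(1, 5))))  # under HWD
```

Under HWE the raw codings w', v' remain correlated while the mean-corrected pair is uncorrelated; with f ≠ 0 the mean-corrected pair is correlated as well, which is why an extra covariance term appears in the partition under Hardy-Weinberg disequilibrium.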
Model (3) is actually a regression form of the Cockerham model in the one-QTL case [7]. Under HWE, the indicator variables z_{M} and z_{F} are independent, and so are the index variables x_{M} and x_{F}. Thus we now have Cov(w, v) = 0, which leads to our familiar orthogonal partition of the genotypic variance V_{G} = V_{A} + V_{D} with V_{A} = α^{2}V(w) = 2α^{2}pq and V_{D} = δ^{2}V(v) = 4δ^{2}p^{2}q^{2}, where q = 1 − p. Under Hardy-Weinberg disequilibrium, we can represent genotype frequencies as P_{AA} = p^{2} + pqf, P_{Aa} = 2pq − 2pqf and P_{aa} = q^{2} + pqf, where f is a measure of departure from HWE. Then the genotypic variance V_{G} = V_{A} + V_{D} + 2Cov(A, D) with
Back to the previous model (2), it is easy to see that the coding variables w', v' in model (2) and the index variables w, v in model (3) have relationships w' = w + 2p and v' = −v/2 + pw + p^{2}. Note that w' is still the one that specifies the additive effect except with a constant shift, whereas v' includes a portion of w, which is the reason why model (2) cannot provide an orthogonal partition of genotypic variance under HWE. The positive correlation between the two coding variables w' and v' in model (2) can also complicate the interpretation of the regression parameters α', δ'. Using the method proposed in the next section, we can show that the parameters in models (2) and (3) have relationships α' = α + 2pδ and δ' = −2δ. Thus, the additive effect α' in model (2) is actually a combination of the average allelic effect α and dominance effect δ in the Cockerham model (3). For the partition of genotypic variance under model (2), we have under HWE
where V(w) = 2pq. Note that V_{A} = α^{2}V(w) = 2pqα^{2}. So, the positive correlation between the two coding variables w' and v' leads to a share V(δ'v') larger than V_{D}, with the excess contributed by a portion of the additive variance. By using the mean-corrected index variables w and v, the Cockerham model allows us to separate the confounding effects of the two variables w' and v', at least under HWE, in the partition of genotypic variance V_{G}. As a result, the dominance variance V_{D} in the Cockerham model (3) is the additional variation contributed by interaction of the paternal and maternal alleles, in addition to the additive variance.
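As a worked illustration of the one-locus relationships discussed above, the sketch below converts three genotypic values into the parameters of model (2) and of the Cockerham model (3), and compares the resulting variance shares under HWE. It is only a sketch under stated assumptions: the function name and the dictionary keys are hypothetical, HWE is assumed throughout, and the conversions used are α' = α + 2pδ and δ' = −2δ.

```python
from fractions import Fraction


def partition_under_hwe(G_AA, G_Aa, G_aa, p):
    """Compare variance shares of model (2) and model (3) under HWE."""
    q = 1 - p
    # Model (2) parameters from the genotypic values.
    alpha2 = G_Aa - G_aa
    delta2 = (G_AA + G_aa) - 2 * G_Aa
    # Cockerham parameters via alpha' = alpha + 2*p*delta, delta' = -2*delta.
    delta = -Fraction(delta2) / 2
    alpha = alpha2 - 2 * p * delta
    # Genotypic variance computed directly from the HWE genotype frequencies.
    freqs = (p * p, 2 * p * q, q * q)
    values = (G_AA, G_Aa, G_aa)
    mean = sum(f * g for f, g in zip(freqs, values))
    V_G = sum(f * (g - mean) ** 2 for f, g in zip(freqs, values))
    return {
        "alpha": alpha,
        "delta": delta,
        "V_G": V_G,
        "V_A": 2 * p * q * alpha ** 2,
        "V_D": 4 * p ** 2 * q ** 2 * delta ** 2,
        # Shares attached to the raw codings of model (2):
        # V(w') = 2pq, and v' is a 0/1 variable with P(v' = 1) = p^2.
        "V_alpha2_w2": alpha2 ** 2 * 2 * p * q,
        "V_delta2_v2": delta2 ** 2 * p ** 2 * (1 - p ** 2),
    }


if __name__ == "__main__":
    for key, value in partition_under_hwe(10, 8, 2, Fraction(1, 4)).items():
        print(key, value)
```

For the genotypic values (10, 8, 2) with p = 1/4, V_A and V_D add up to V_G, while the share V(δ'v') of model (2) exceeds V_D, in line with the discussion above.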
The Cockerham model (3) can easily be extended to multiple loci. For example, consider two loci A and B with alleles A, a and B, b, respectively. We can define indicator variables:
for the two loci separately. By further introducing
and
, where p_{1} = P_{A}, p_{2} = P_{B}, and assuming that paternal and maternal gametes (alleles and haplotypes) have the same genetic frequencies and effects, we obtain the following two-locus (G2A) Cockerham model [14]
G(g) = μ + α_{1}w_{1} + δ_{1}v_{1} + α_{2}w_{2} + δ_{2}v_{2} + (αα)w_{1}w_{2} + (αδ)w_{1}v_{2} + (δα)v_{1}w_{2} + (δδ)v_{1}v_{2}. (4)
Based on these mean-corrected index variables, this Cockerham model allows us to easily incorporate some allele-related properties, such as HWE or linkage equilibrium information, into the variance partition analysis [14]. For instance, since the means of the x variables are scaled to zero in the population, it is easy to see that all the components in model (4) are uncorrelated with each other under Hardy-Weinberg and linkage equilibria, which leads to an orthogonal partition of variance components. In addition, the mean-corrected variables x defined above have some nice properties that can facilitate derivation of formulas for various variance and covariance components. For example, for two loci A and B under HWE but with LD between them, we can show through some derivation that for any integers m, n > 0
where q_{1} = 1 − p_{1}, q_{2} = 1 − p_{2} and D = P_{AB} − p_{1}p_{2}. These moment functions are quite useful in deriving formulas for partition of the genotypic variance into various allele-based variance components for the above G2A Cockerham model. Besides, under gametic equilibrium,
,
are independent of
,
. Hence,
for any j, k = 1, 2 and integers m, n > 0. Moreover,
and
, as we do not distinguish the paternal and maternal gametes.
Note that the above model (4) uses 9 parameters to model G(g), which takes 9 values. So this is also a fully parameterized model. In other words, the model parameters E_{G2A·AB} = (μ, α_{1}, δ_{1}, α_{2}, δ_{2}, (αα), (αδ), (δα), (δδ))^{T} simply provide a reparameterization of the 9 genotypic values G_{AB} = (G_{22}, G_{21}, G_{20}, G_{12}, G_{11}, G_{10}, G_{02}, G_{01}, G_{00})^{T}, where G_{ij}, i, j = 0, 1, 2, denote genotypic values with i, j being the counts of A, B alleles in the corresponding genotypes. Using the same notation as in Zeng et al. [7], we have
where
As pointed out in [7], the above relationship holds regardless of whether there is a linkage equilibrium or disequilibrium in the study population.
In genetic association studies, we are often interested in examining association of genotypes at certain genetic markers or QTLs with a disease phenotype. In this case, a standard approach is to fit a regression model with genotypes at each locus being treated as different levels of the locus factor. This leads to another popular type of model that has been widely used in genetic association studies; i.e., the so-called F_{∞} models. Again, let us first consider the simple case of one locus with two alleles A, a. In this case, we have three possible genotypes AA, Aa and aa, and correspondingly three possible genotypic values G_{AA}, G_{Aa} and G_{aa}. The single-locus F_{∞} model is then given by [3, 4]
G_{AA} = m + a, G_{Aa} = m + d, G_{aa} = m − a,
where a, d are often called the additive and dominance effects of alleles A, a, respectively. In terms of the genotypic values, the reference point and the additive and dominance effects are defined as m = (G_{AA} + G_{aa})/2, a = (G_{AA} − G_{aa})/2, d = G_{Aa} − (G_{AA} + G_{aa})/2. This model is referred to as an F_{∞} model simply because the reference point m in the model is the mean of the two homozygote genotypic values, which corresponds to the mean in an F_{∞} population [1, 2].
The above model can also be written in a regression model form as
G(g) = m + aw*(g) + dv*(g), (5)
where w*(g), v*(g) are two coding functions of genotypes g which are defined as
w*(g) = 1, 0, −1 and v*(g) = 0, 1, 0 for g = AA, Aa, aa, respectively.
Since m, a and d in this model simply provide a reparameterization of the original three genotypic values G_{AA}, G_{Aa} and G_{aa}, we can refer to a, d as genotypic effects of the QTL with m as a reference baseline.
Statistically, in order to see whether the QTL genotypes are associated with a disease phenotype, we need to test whether G_{AA} = G_{Aa} = G_{aa} or, equivalently, a null hypothesis H_{0}: a = d = 0 versus its alternative H_{a}: a ≠ 0 or d ≠ 0. The standard regression approach can usually provide unbiased estimates of the model parameters and an appropriate test for H_{0} regardless of possible correlation between w*(g), v*(g), although it may give large standard errors for the least squares estimates of parameters when this correlation is very strong.
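Because the one-locus F_{∞} model has three parameters for three genotypes, it is saturated, so its least squares fit reproduces the three genotype group means. The sketch below uses that fact to obtain estimates of m, a and d from individual-level data without any matrix algebra. The function name and the genotype labels "AA", "Aa", "aa" are assumptions made for illustration, every genotype is assumed to be observed at least once, and the sketch does not carry out the test of H_{0}.

```python
from collections import defaultdict


def f_infinity_effects(genotypes, phenotypes):
    """Estimate (m, a, d) of the one-locus F-infinity model from group means."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for g, y in zip(genotypes, phenotypes):
        sums[g] += y
        counts[g] += 1
    # Fitted values of a saturated model are the genotype group means.
    mean = {g: sums[g] / counts[g] for g in ("AA", "Aa", "aa")}
    m = (mean["AA"] + mean["aa"]) / 2  # midpoint of the two homozygotes
    a = (mean["AA"] - mean["aa"]) / 2  # additive effect
    d = mean["Aa"] - m                 # dominance effect
    return m, a, d


if __name__ == "__main__":
    genotypes = ["AA", "AA", "Aa", "Aa", "Aa", "aa"]
    phenotypes = [9.0, 11.0, 7.0, 8.0, 9.0, 2.0]
    print(f_infinity_effects(genotypes, phenotypes))
```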
Now, let us look at the performance of model (5) on the partition of genotypic variances. As w* and v* are two coding variables for the three genotypes at the same locus, they are inherently correlated. In fact, letting P_{AA}, P_{Aa}, P_{aa} be the genotype frequencies, we can show that Cov(w*, v*) = P_{Aa}(P_{aa} − P_{AA}) ≠ 0 as long as P_{aa} ≠ P_{AA}. They are also related to the coding variables w', v' in model (2) and the index variables w, v in model (3) through w* = w' − 1 = w + 2p − 1 and v* = w' − 2v' = (1 − 2p)w + v + 2p(1 − p).
Therefore, we have under HWE
In terms of the model parameters, we can show that a = α − (1 − 2p)δ and d = δ. In summary, we have the following conclusions.

Model (5) usually provides a different partition of the genotypic variance V_{G} than the one from the Cockerham model (3).

When P_{aa} = P_{AA}, model (5) can give an orthogonal partition of the genotypic variance V_{G} = V(aw*) + V(dv*), which is different from V_{G} = V_{A} + V_{D} in the Cockerham model (3) under the assumption of HWE unless
.

The potential correlation between w* and v* often leads to a share V(dv*) larger than V_{D}, with the excess contributed by a portion of the additive variance.

The dominance effect d is the same as the allelic interaction δ in the Cockerham model. As a result, V_{D} = 0 if d = 0.

The additive effect a = 0 is equivalent to α = (1 − 2p)δ for the allelic effects in the Cockerham model. So, a = 0 does not necessarily imply V_{A} = 0.
Note also that making mean corrections on the two coding variables w* and v* of genotypes does not help to separate their confounding in this case because dv* in model (5) is not an interaction term.
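The covariance of the two F_{∞} coding variables and the conversion between genotypic and average allelic effects can be sketched as follows. The function names are hypothetical; the codings are assumed to be w* = (1, 0, −1) and v* = (0, 1, 0) for genotypes (AA, Aa, aa), and the conversion is the relationship a = α − (1 − 2p)δ, d = δ stated above, solved for α and δ.

```python
from fractions import Fraction


def cov_w_v_star(P_AA, P_Aa, P_aa):
    """Cov(w*, v*) computed directly from the genotype frequencies."""
    rows = [(P_AA, 1, 0), (P_Aa, 0, 1), (P_aa, -1, 0)]  # (freq, w*, v*)
    mean_w = sum(f * w for f, w, _ in rows)
    mean_v = sum(f * v for f, _, v in rows)
    return sum(f * (w - mean_w) * (v - mean_v) for f, w, v in rows)


def to_average_allelic_effects(a, d, p):
    """Map genotypic effects (a, d) to average allelic effects (alpha, delta)."""
    return a + (1 - 2 * p) * d, d


if __name__ == "__main__":
    # HWE frequencies for p = 1/4.
    print(cov_w_v_star(Fraction(1, 16), Fraction(6, 16), Fraction(9, 16)))
    print(to_average_allelic_effects(4, 2, Fraction(1, 4)))
```

When the two homozygotes are equally frequent the covariance vanishes, and when p = 1/2 the additive effect a coincides with α.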
Extension of the F_{∞} model (5) to multiple QTL is straightforward. Again consider two loci A and B with alleles A, a and B, b, respectively. We can introduce variables w*_{i}(g), v*_{i}(g), i = 1, 2, using the same (1, 0, −1) and (0, 1, 0) coding for QTL genotypes at each locus. Then a two-locus F_{∞} model with epistasis included yields
G(g) = m + a_{1}w*_{1} + d_{1}v*_{1} + a_{2}w*_{2} + d_{2}v*_{2} + (aa)w*_{1}w*_{2} + (ad)w*_{1}v*_{2} + (da)v*_{1}w*_{2} + (dd)v*_{1}v*_{2}. (6)
Model (6) is also a fully parameterized model for the 9 genotypic values G_{AB}. As shown in Zeng et al. [7], this two-locus F_{∞} model can be written in a matrix form as
, where
= (m, a_{1}, d_{1}, a_{2}, d_{2}, aa, ad, da, dd)^{T}, and
When we fit the above model under a regression model framework, the expected mean of the least squares estimates (LSE) of the genotypic effects will be given by
where W_{AB} = diag(P_{22}, P_{21}, P_{20}, P_{12}, P_{11}, P_{10}, P_{02}, P_{01}, P_{00}) is of full rank with P_{ij} being the frequency of genotypes corresponding to G_{ij}, i, j = 0, 1, 2. So, the LSE provide unbiased estimates of the genotypic effects, regardless of whether there are Hardy-Weinberg or linkage disequilibria in the genotypic distribution P_{g}. However, as pointed out in Zeng et al. [7], in the presence of interaction effects the additive effect a_{1} can no longer be interpreted as half of the difference between the homozygote genotypic values G_{2} = E(G_{AA}) and G_{0} = E(G_{aa}) at locus A, nor can the dominance effect d_{1} be interpreted as the difference between the heterozygote genotypic value G_{1} = E(G_{Aa}) and the mean of the homozygote genotypic values G_{2}, G_{0}. In addition, its partition of genotypic variance V_{G} is complex because not only are the within-locus terms a_{j}w*_{j} and d_{j}v*_{j} correlated for j = 1, 2, but the within-locus terms {a_{j}w*_{j}, d_{j}v*_{j}} and the locus-by-locus interactions could also be correlated. As a result, even when the genotypes at loci A and B are independent (i.e., the so-called zygotic equilibrium between loci A and B [18]), the variance component V(a_{j}w*_{j} + d_{j}v*_{j}), j = 1, 2, cannot simply be interpreted as the variation contributed by locus j in the presence of interactions.
If we use the mean-corrected variables ξ_{j} = w*_{j} − E(w*_{j}) and η_{j} = v*_{j} − E(v*_{j}) to replace w*_{j} and v*_{j} for j = 1, 2 in the F_{∞} model (6), this leads to the following model,
As in the one-locus case, the mean-corrected variables ξ_{j} and η_{j} are very likely correlated within each locus j = 1, 2. But mean correction could help to reduce the complexity of variance partition in certain circumstances. For example, under zygotic equilibrium between loci A and B, {ξ_{1}, η_{1}} are independent of {ξ_{2}, η_{2}}, and {ξ_{j}, η_{j}, j = 1, 2} are uncorrelated with the interactions {ξ_{1}ξ_{2}, ξ_{1}η_{2}, η_{1}ξ_{2}, η_{1}η_{2}} as well. As a result, the within-locus effects at each locus j = 1, 2 and the locus-by-locus interactions (aa' ξ_{1}ξ_{2} + ad' ξ_{1}η_{2} + da' η_{1}ξ_{2} + dd' η_{1}η_{2}) as a whole are orthogonal to each other, although the interaction terms {aa' ξ_{1}ξ_{2}, ad' ξ_{1}η_{2}, da' η_{1}ξ_{2}, dd' η_{1}η_{2}} among themselves may still be correlated. Thus,
In general, for more than two loci under zygotic equilibria, we will have
In this case, the variance of the within-locus term for locus j is the variation contributed by genotypes at locus j, while the variance of the interaction term for loci j and k represents the variation contributed by genotypic interactions between loci j and k. We will refer to model (7) as a mean-corrected F_{∞} model. It is interesting to see that, in an F_{2} population, this mean-corrected F_{∞} model is reduced to the classical F_{2} model as its special case. The same situation happens for the Cockerham model (4) as well.
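The orthogonality claims for the mean-corrected F_{∞} model under zygotic equilibrium can be checked numerically. The sketch below is illustrative only: the function names and the string keys are hypothetical, the two loci are given arbitrary genotype frequencies (not necessarily in HWE), and zygotic equilibrium is imposed by taking the joint genotype distribution to be the product of the two marginal distributions.

```python
from fractions import Fraction
from itertools import product


def centered_codings(freqs):
    """freqs = (P_2, P_1, P_0); return the centered (xi, eta) for each genotype."""
    w_star, v_star = (1, 0, -1), (0, 1, 0)
    mean_w = sum(f * w for f, w in zip(freqs, w_star))
    mean_v = sum(f * v for f, v in zip(freqs, v_star))
    return [(w - mean_w, v - mean_v) for w, v in zip(w_star, v_star)]


def joint_table(freqs1, freqs2):
    """Joint distribution of main and interaction terms under zygotic equilibrium."""
    c1, c2 = centered_codings(freqs1), centered_codings(freqs2)
    table = []
    for i, j in product(range(3), range(3)):
        xi1, eta1 = c1[i]
        xi2, eta2 = c2[j]
        table.append((freqs1[i] * freqs2[j], {
            "xi1": xi1, "eta1": eta1, "xi2": xi2, "eta2": eta2,
            "xi1*xi2": xi1 * xi2, "xi1*eta2": xi1 * eta2,
            "eta1*xi2": eta1 * xi2, "eta1*eta2": eta1 * eta2,
        }))
    return table


def cov(table, x, y):
    """Covariance of two named terms over the joint genotype distribution."""
    mean_x = sum(pr * t[x] for pr, t in table)
    mean_y = sum(pr * t[y] for pr, t in table)
    return sum(pr * (t[x] - mean_x) * (t[y] - mean_y) for pr, t in table)


if __name__ == "__main__":
    f1 = (Fraction(1, 5), Fraction(1, 2), Fraction(3, 10))
    f2 = (Fraction(1, 10), Fraction(3, 10), Fraction(3, 5))
    table = joint_table(f1, f2)
    print(cov(table, "xi1", "xi1*xi2"))       # main effect vs interaction
    print(cov(table, "xi1", "eta1"))          # within-locus terms
    print(cov(table, "xi1*xi2", "xi1*eta2"))  # two interaction terms
```

Main-effect terms are uncorrelated with every interaction term and with the terms of the other locus, whereas the within-locus pair and pairs of interaction terms generally remain correlated.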
We can also model multiple QTL by extending model (2) to multiple loci. For example, an allele-based two-locus biallelic model is given by
where
,
are coding variables defined in the same way as the ones in model (2) for the two loci separately. It is a model similar to the F_{∞} model (6) except that the coding variables of genotypes are defined in different ways. From the definition of these coding variables, it is also easy to see that
and
. We can show that the parameters in models (8) and (6) have the following relationship
Without locus-by-locus allelic interactions, we have a_{j} =
and
for j = 1, 2. In the presence of locus-by-locus allelic interactions, a_{j} = d_{j} = 0 is not equivalent to
. As alleles represent more basic levels of genetic factors than genotypes, the allele-based models are inherently more general and can be utilized to examine specific allelic effects and their interactions. When phase information is available, we could also use separate indicator variables of alleles to specify the paternal and maternal origins of alleles, which could be very useful in situations where the paternal or maternal genes may have different allelic effects and their interactions are of interest (e.g., genetic imprinting). On the other hand, the coefficients in an F_{∞} model are more closely associated with homozygosity and heterozygosity at the loci [2].
In regard to the modeling schemes, we can see that a major difference between the F_{∞} and Cockerham models lies in whether we treat genotypes or alleles as levels of the locus factors. The traditional F_{∞} models treat genotypes as levels of the locus factors, with genotypic effects at each locus and locus-by-locus genotypic interactions being of major interest. The Cockerham models are defined by treating alleles as levels of the locus factors, with a focus on partition of genotypic variances into various genetic variance components; by using a mean correction on coding variables of alleles, they can effectively reduce the confounding between allelic effects and their interactions in partition of the genotypic variance. Both types of models can actually have two different versions: one is defined directly on coding of genotypes (or allele types), and the other uses mean-corrected index variables to reduce confounding between the main effects and their interactions. The former ones, either genotype-based or allele-based, have their coding variables defined on genotypes or alleles directly regardless of the genotypic or allelic distributions. The latter ones are based on mean-corrected index variables, which depend not only on the genotypes or allele types but also on frequencies of these genotypes or alleles. To distinguish model parameters in these different models and meanwhile stay consistent with current terminology, in the rest of this paper we will simply refer to the additive, dominance and epistatic effects in a traditional F_{∞} model as the genotypic effects; the parameters in a mean-corrected F_{∞} model as the average genotypic effects with their corresponding variance components as genotypic variance components; the parameters in an allele-based model (e.g., model (2) or (8)) which is defined based on the coding variables of allele types as the allelic effects; and parameters in the traditional (mean-corrected) Cockerham model as the average allelic effects with their corresponding variance components as allelic variance components.
Models directly using coding variables of genotypes or allele types are appealing in practice due to their simplicity. However, statistical tests of the genotypic or allelic effects based on p-values are highly dependent on the regression model, the distribution assumptions and the available sample size. A statistically significant genetic effect with a small p-value does not necessarily imply a clinically important finding. Besides, there could be inconsistency in the definition of model parameters between a one-locus model and a two-locus model with epistasis [7]. That is, when a multilocus model is applied with epistasis involved, the interpretation of the additive and dominance effects based on a one-QTL model may change. On the other hand, using models with the mean-corrected index variables allows us to assess how much variation is actually contributed by certain genetic effects or interactions, which could provide useful information for assessing clinical importance. A drawback in using these mean-corrected models is that they bring genotype or allele frequencies into the design matrix for regression, which contributes another source of variation in fitting the model because the genotype or allele frequencies need to be estimated in practice. This fact could make it difficult to evaluate the variance of the estimated variance components.
The traditional (mean-corrected) Cockerham model can provide an orthogonal partition of genotypic variance into additive, dominance and epistatic variance components under HWE and linkage equilibrium, while under zygotic equilibrium the mean-corrected F_{∞} model can give an orthogonal partition of genotypic variances between different loci and locus-by-locus interactions. Which of the two mean-corrected models provides the simpler structure in partition of the genotypic variance really depends on the equilibrium situation in our sample. It is easy to see that linkage equilibrium between alleles at two QTL under HWE guarantees zygotic equilibrium of genotypes at the two loci, but not vice versa. Thus, for multiple QTL under both linkage and Hardy-Weinberg equilibria, the Cockerham model is preferred. When there is zygotic equilibrium of genotypes between two loci but no linkage equilibrium, a mean-corrected F_{∞} model might be preferred. In general, no one model is always preferable to the other in partition of genotypic variances. However, as HWE is expected to hold (at least approximately) in most human genomic regions, QTL with zygotic equilibrium but no linkage equilibrium are possible but rare. In addition, the allelic variance components are important quantities in assessing covariance between relatives and are more closely related to the inheritance properties of quantitative traits. As a result, the allelic variance components based on the Cockerham model would be expected to be of the main research interest in most cases of genetic variance components analysis.
Genotypic effects and allelic variance components
In Zeng et al. [7], it was pointed out that the additive, dominance and epistatic effects in an F_{∞} model and the average allelic effects in a Cockerham model are simply two different reparameterizations of the genotypic values. They can be converted into each other through their relationship with the genotypic values when fully parameterized models are applied. Since partition of genetic variance components based on Cockerham models has been well established [14, 21, 22], a relationship between the genotypic effects in an F_{∞} model and the average allelic effects in its corresponding Cockerham model would allow us to compute various allelic variance components in terms of genotypic effects by translating the formulas for partition of genotypic variance derived from the Cockerham models based on the average allelic effects. In this section, we present detailed formulas for computing the allelic variance components in terms of the genotypic effects for the one-locus F_{∞} model (5) under Hardy-Weinberg disequilibrium and the two-locus F_{∞} model (6) with both epistasis and LD between the two loci. We also propose an alternative way of linking these two sets of parameters through the relationship between the coding variables of genotypes used in F_{∞} models and the mean-corrected index variables used in the Cockerham models. Some practical issues relating to the use of reduced models instead of the fully parameterized models are also addressed.
Let us start from the simple case of the single-locus F_{∞} model (5) and its equivalent Cockerham model (3). As pointed out in [7], we can build the relationship between the two sets of model parameters through the genotypic values. Since both models give a full parameterization of the three genotypic values G_{AA}, G_{Aa} and G_{aa}, based on the coding functions for the three genotypes, we have
With some simple algebra, we can show that the genotypic effects and the average allelic effects have the following relationship
where α is the same substitution effect of replacing allele a by A as presented in [4] (p.114). Replacing α, δ in the formula (4) by a, d, we obtain the following partition of V_{G} in terms of a, d in model (5)
Under HWE, we have f = 0. Then V_{A} = 2pq[a + d(q − p)]^{2} and V_{D} = (2pqd)^{2}. These are the same results as presented in [4, 20].
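The partition in terms of a and d can be verified numerically, including the case of Hardy-Weinberg disequilibrium. The sketch below does not use closed-form expressions for the disequilibrium case; it computes the moments of the index variables directly from the genotype frequencies. The function name is hypothetical, and it assumes the codings w = x_{M} + x_{F}, v = −2x_{M}x_{F} together with α = a + (q − p)d and δ = d.

```python
from fractions import Fraction


def allelic_variance_components(a, d, p, f=0):
    """Return (V_A, V_D, Cov(A, D), V_G) for genotypic effects a, d."""
    q = 1 - p
    alpha, delta = a + (q - p) * d, d
    # Genotype frequencies for AA, Aa, aa with departure f from HWE.
    freqs = (p * p + p * q * f, 2 * p * q * (1 - f), q * q + p * q * f)
    w = (2 * q, q - p, -2 * p)              # w = x_M + x_F
    v = (-2 * q * q, 2 * p * q, -2 * p * p)  # v = -2 * x_M * x_F

    def cov(x, y):
        mean_x = sum(fr * xi for fr, xi in zip(freqs, x))
        mean_y = sum(fr * yi for fr, yi in zip(freqs, y))
        return sum(fr * (xi - mean_x) * (yi - mean_y)
                   for fr, xi, yi in zip(freqs, x, y))

    V_A = alpha ** 2 * cov(w, w)
    V_D = delta ** 2 * cov(v, v)
    cov_AD = alpha * delta * cov(w, v)
    # Direct genotypic variance; the baseline m does not affect it.
    G = (a, d, -a)
    return V_A, V_D, cov_AD, cov(G, G)


if __name__ == "__main__":
    print(allelic_variance_components(4, 2, Fraction(1, 4)))
    print(allelic_variance_components(4, 2, Fraction(3, 10), Fraction(1, 5)))
```

With f = 0 the covariance term vanishes and the two components reduce to the HWE expressions above; with f ≠ 0 the identity V_{G} = V_{A} + V_{D} + 2Cov(A, D) still holds, but the covariance term is generally nonzero.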
Similarly, for the two-QTL model (6), its genotypic effects
= (m, a_{1}, d_{1}, a_{2}, d_{2}, aa, ad, da, dd) and the average allelic effects E_{G2A·AB} in its equivalent Cockerham model have the relationship
, which yields
Assuming HWE at loci A and B but allowing LD between the two loci, by applying the properties of moment functions we derived before, it can be shown that the variance and covariance components in terms of average allelic effects in the two-locus Cockerham model (4) are given below
where A_{1} = α_{1}w_{1}, D_{1} = δ_{1}v_{1}, A_{2} = α_{2}w_{2}, D_{2} = δ_{2}v_{2}, A_{1}A_{2} = (αα)w_{1}w_{2}, A_{1}D_{2} = (αδ)w_{1}v_{2}, D_{1}A_{2} = (δα)v_{1}w_{2} and D_{1}D_{2} = (δδ)v_{1}v_{2}. Note that the covariance components are caused by correlation between various allelic effects and interactions, while the interactions contribute their own variances regardless of whether the alleles are in HWE and LD or not. The above results are similar to what we presented in [14] for a general G2A model except that a more detailed partition of variance components and their covariance structures is shown here. Note also that the scales for defining the index variables v_{1}, v_{2} here differ by a factor of −2 from the ones used in [14], to stay consistent with the ones used in Zeng et al. [7]. Correspondingly, the coefficients related to the v variables in model (4) differ from the ones in [14] by a factor of −2 or 4, depending on how many v variables are involved. Replacing the allelic effects in the above formulas by genotypic effects using their relationship (9), we can then obtain formulas of the variance and covariance components in terms of the genotypic effects for partition of the genotypic variance. When there is linkage equilibrium between loci A and B, then D = 0 and we have exactly the same result as presented in Tiwari and Elston [19].
In genetic applications, using fully parameterized models may not always be practical due to limited sample sizes, multiple QTL, or a large number of alleles or genotypes showing up at certain QTL. Including all possible genotypic or allelic interactions could make the genetic model overparameterized and hard to fit. Collapsing a certain number of alleles or genotypes may simplify the model structure, but doing so could increase the risk of missing certain informative signals, as effects of true functional alleles can be attenuated by other nonfunctional alleles. By contrast, a simplified genetic model could be used to include only lower-order terms such as additive, dominance and additive-by-additive interactions.
Consider a simplified model from the previous two-locus F_{∞} model with only additive effects at the two loci and the additive-by-additive interaction being involved. Then, the reduced model is given by
A reduced model can be thought of as adding constraints on the genotypic values. From
, we now have
and
δ_{1} = δ_{2} = (α_{1}δ_{2}) = (δ_{1}α_{2}) = (δ_{1}δ_{2}) = 0. Thus, when there is HWE at loci A, B and linkage equilibrium between loci A and B, the partition of genotypic variance is given by
, with
and
.
If there is HWE at loci
A,
B but LD between the two loci, we will still have the same
,
and
. Besides,
So far, we have relied on the equation
to establish the relationship between the average allelic effects
E
_{G2A·AB}and the genotypic effects
. Alternatively, we can establish the relationship between
E
_{G2A·AB}and
through the coding variables used in the F
_{∞} models and the index variables used in the Cockerham models. It is easy to see that the index variables
,
in the F_{∞} model (6) and w_{1}, w_{2} in the Cockerham model (4) have the following relationship
for i = 1, 2. So, replacing w*, v* in model (10) by w, v gives
which leads to the same results as we showed before. If there are dominance effects involved in the reduced model, then
It is easy to show that the relationship between the allelic effects β and the genotypic effects b is given by
Therefore, with the relationships (11), we can easily transform an F_{∞} model to its equivalent Cockerham model, or vice versa.
It must be pointed out that the above relationship between the genotypic effects and the average allelic effects holds only when the reduced F_{∞} models specify the genotypic values correctly. In practice, the true genotypic values are unknown and a reduced model can only provide an approximation of the true genotypic values. In this case, the least squares estimates from fitting a reduced model simply give unbiased estimators of the partial regression coefficients with expected mean
where W_{AB} = diag(P_{22}, P_{21}, P_{20}, P_{12}, P_{11}, P_{10}, P_{02}, P_{01}, P_{00}) is the same as defined before, and the superscript g denotes a generalized inverse of the matrix in parentheses. In this case, the true parameters may depend not only on the genotypic values but also on the genotypic frequencies P_{g}, with possible allelic association such as LD involved, which is a fundamental difference between the statistical models and functional models as claimed in [17]. Furthermore, from the relationship
, we can see that in general only certain linear combinations of E
_{G2A·AB}can be estimated from
because
may no longer be a nonsingular square matrix. Thus, in this situation, some allelic variance components may not be directly estimable in terms of the genotypic effects in a reduced F_{∞} model. Alternatively, we can start from a reduced Cockerham model and derive its corresponding reduced F_{∞} model through using the relationship (11) when some allelic variance components can be reasonably ignored.
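A one-locus analogue may help illustrate how the parameters of a reduced model come to depend on genotype frequencies. The sketch below is an illustration under stated assumptions rather than the two-locus calculation discussed above: the function name is hypothetical, HWE is assumed, and the reduced model regresses the genotypic values on w* alone, omitting the dominance term. The expected least squares slope is then the frequency-weighted ratio Cov(w*, G)/V(w*).

```python
from fractions import Fraction


def additive_only_slope(a, d, p):
    """Expected LSE slope when G is regressed on w* alone, under HWE.

    Requires 0 < p < 1 so that w* has nonzero variance.
    """
    q = 1 - p
    freqs = (p * p, 2 * p * q, q * q)  # AA, Aa, aa
    w_star = (1, 0, -1)
    G = (a, d, -a)  # genotypic values with the baseline m set to 0
    mean_w = sum(f * w for f, w in zip(freqs, w_star))
    mean_G = sum(f * g for f, g in zip(freqs, G))
    cov_wG = sum(f * (w - mean_w) * (g - mean_G)
                 for f, w, g in zip(freqs, w_star, G))
    var_w = sum(f * (w - mean_w) ** 2 for f, w in zip(freqs, w_star))
    return cov_wG / var_w


if __name__ == "__main__":
    for p in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
        print(p, additive_only_slope(4, 2, p))
```

For fixed genotypic effects a = 4 and d = 2 the fitted additive coefficient changes with the allele frequency and equals a + (q − p)d, that is, the average allelic effect α rather than the genotypic effect a, except when p = 1/2.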