- Methodology article
- Open Access

# On coding genotypes for genetic markers with multiple alleles in genetic association study of quantitative traits

- Tao Wang

**12**:82

https://doi.org/10.1186/1471-2156-12-82

© Wang; licensee BioMed Central Ltd. 2011

**Received:** 31 May 2011 **Accepted:** 21 September 2011 **Published:** 21 September 2011

## Abstract

### Background

In genetic association study of quantitative traits using F_{∞} models, how to code the marker genotypes and interpret the model parameters appropriately is important for constructing hypothesis tests and making statistical inferences. Currently, the coding of marker genotypes in building F_{∞} models has mainly focused on the biallelic case. A thorough work on the coding of marker genotypes and interpretation of model parameters for F_{∞} models is needed especially for genetic markers with multiple alleles.

### Results

In this study, we will formulate F_{∞} genetic models under various regression model frameworks and introduce three genotype coding schemes for genetic markers with multiple alleles. Starting from an allele-based modeling strategy, we first describe a regression framework to model the expected genotypic values at given markers. Then, as extension from the biallelic case, we introduce three coding schemes for constructing fully parameterized one-locus F_{∞} models and discuss the relationships between the model parameters and the expected genotypic values. Next, under a simplified modeling framework for the expected genotypic values, we consider several reduced one-locus F_{∞} models from the three coding schemes on the estimability and interpretation of their model parameters. Finally, we explore some extensions of the one-locus F_{∞} models to two loci. Several fully parameterized as well as reduced two-locus F_{∞} models are addressed.

### Conclusions

The genotype coding schemes provide different ways to construct F_{∞} models for association testing of multi-allele genetic markers with quantitative traits. The choice of coding scheme depends on how conveniently it supports statistical inference on the parameters of research interest. Based on these F_{∞} models, standard regression model fitting tools can be used to estimate and test for various genetic effects through statistical contrasts, with adjustment for environmental factors.

## Background

Genetic markers with multiple alleles are common in genetic studies. It is well known that the ABO blood types in humans are determined by three alleles at a genetic locus on chromosome 9. Molecular markers such as microsatellites often have multiple alleles. The major histocompatibility complex (MHC), a highly polymorphic genome region on human chromosome 6, encompasses multiple genes that encode many human leukocyte antigens (HLA) and play an important role in the regulation of immune responses. Depending on the resolution level of allele typing, each of the HLA-A, B, C, DR, DQ and DP gene loci could contain tens to hundreds of allele types. In addition, in the haplotype analysis of single-nucleotide polymorphisms (SNPs), the various haplotypes from a set of SNPs can also be treated as different alleles of a 'super' marker locus that consists of the set of SNPs.

Presently, three types of genetic models are commonly used in the genetic analysis of quantitative traits. The first is Fisher's analysis of variance (ANOVA) models, which focus on a decomposition of the genotypic variance into genetic variance components contributed by various genetic effects at quantitative trait loci (QTL) [1–6]. The second is the F_{∞} models, which concentrate on direct statistical modeling of the expected genotypic values at target genetic markers or QTL and on association testing of various genetic effects. The third is the so-called functional genetic models, which emphasize modeling the functional effects of genes [7]. Both Fisher's and F_{∞} models can be referred to as statistical models, while the functional genetic models have fundamentally different objectives and estimation methods from the statistical models. The distinction between these different types of genetic models has been discussed extensively [8–11].

The F_{∞} models have been widely used in genetic association studies of quantitative traits. In building F_{∞} models, how to code genotypes at a marker (or QTL) and interpret the model parameters are fundamental issues for constructing appropriate testing hypotheses and making correct statistical inferences. While Fisher's ANOVA models are directly applicable to genetic markers with multiple alleles, the F_{∞} models have by contrast been discussed mainly in the biallelic case [1, 9, 12]. For haplotype analysis, Zaykin *et al*. [13] proposed a simple coding that included only the additive effects of haplotypes and ignored their interactions. More recently, Yang *et al*. [11] explored an extension of the biallelic F_{∞} models to multi-allele models, with a focus on the definition of various genetic effects and their relationships with the average genetic effects defined in Fisher's models. A thorough treatment of the coding of marker genotypes and the interpretation of model parameters for F_{∞} models has not yet been given, especially for genetic markers with multiple alleles.

In general, there are two different strategies in coding the marker or QTL genotypes. One is to treat each marker or QTL as a potential risk factor with its genotypes as the risk units. Then, similar to the strategy for handling categorical covariates in classical regression models, at each locus we can create one dummy variable per genotype and include all but one (the reference) of these dummy variables in a model. But this genotype coding is often limited by the available sample size, especially when the number of alleles at the marker locus is large. Alternatively, as alleles are often supposed to be the basic genetic risk units that may contribute to disease phenotypes in genetic studies, we may want to treat the alleles at each marker or QTL as the risk units and examine the effects of alleles. However, genetic data have special features that need to be taken into account in order to build allele-based models. In the genome of diploid species such as humans, alleles normally appear in pairs to form a genotype at each marker locus or QTL, with one allele from the father and one from the mother, except for the sex chromosomes in males. That is, at each locus we have two within-locus risk factors that reside on a homologous pair of chromosomes. Unlike the classical two-way ANOVA model, in which the two risk factors have different risk units, the paternal and maternal risk factors at a locus often share the same set of alleles. Besides, the parental origins (i.e., the phase) of the two alleles at each locus are quite often unknown. These features can complicate the allele-based coding of marker genotypes and generate confusion in the interpretation of the model parameters.

In this study, we introduce three allele-based coding schemes for building F_{∞} models, namely allele, F_{∞} and allele-count codings. First, we formulate F_{∞} models under a general regression framework to model the expected genotypic values at given markers or QTL. Then, under a standard ANOVA model setting, we present several fully parameterized one-locus models using the three allele-based coding schemes. Some potential collinearity relationships among the coding variables of the marker genotypes are clarified. Strategies to avoid the redundant model parameters are also proposed. After that, we examine the definition of model parameters under a reduced one-locus model framework. The impact of a linear relationship among the coding variables of marker genotypes on the estimability of the model parameters is fully explored based on the linear model theory. Finally, we consider extension of the one-locus models to two-locus situation. Several fully parameterized as well as reduced two-locus models are addressed. A focus of this study is to establish the relationships between the model parameters and the expected genotypic values at given marker loci or QTL for various F_{∞} models from these three coding schemes under various different model frameworks, and explain how to estimate and test for various genetic effects through statistical contrasts. Relationships among different coding schemes and models are also illustrated through simulation.

## Results

### Fully parameterized one-locus models

A quantitative trait *Y* is typically considered as a combination of a genetic component *G* and an environmental component *E*, with perhaps the genetic by environmental interactions *G* × *E*, where *G* is the true genotypic value from a joint (unobservable) contribution of all the genetic factors to the quantitative trait *Y*. In practice, given a random sample of *N* individuals from a study population, let *g*_{i} be the observed genotypes at certain target marker loci or QTL and *z*_{i} be a vector of environmental covariates that may contribute to the variation of the quantitative trait for individuals *i* = 1, ..., *N*. By ignoring the genetic by environmental interactions and assuming that the genotypic value *G* and the environmental component *E* do not depend on the environmental covariates *z*_{i} and the genotypes *g*_{i}, respectively, the observed quantitative trait *y*_{i} of an individual *i* can be expressed through a regression model as

$$y_i = G(g_i) + z_i\beta + e_i, \qquad (1)$$

where *G*(*g*_{i}) = E(*G*|*g*_{i}) is the expected genotypic value of *G* given the marker (or QTL) genotypes *g*_{i}, *β* denotes the effects of the environmental covariates, and *e*_{i} is the residual error of the model with E(*e*_{i}) = 0. Similar to introducing dummy variables for the covariates *z*_{i}, which allow us to assess various environmental effects *β* in the model, it is convenient to further represent *G*(*g*_{i}) as *G*(*g*_{i}) = *x*(*g*_{i})*α* so that we can fit the regression model and assess the genetic effects *α* of the markers or QTL, where *x*(*g*_{i}) is a coding function of the marker genotypes. When the marker locus is not associated with the phenotype, *G*(*g*_{i}) = E(*G*) is a constant that does not depend on *g*_{i}. In the rest of the paper, we will focus on the interpretation of the marker effects *α* in terms of the expected genotypic values *G*(*g*) = E(*G*|*g*) according to different coding schemes. When certain genetic by environmental interactions are included in the model, the interpretation of *α* could be modified accordingly. It should be pointed out that QTL are generally assumed to be unknown genomic regions that may contribute to the variation of the quantitative traits, with their genotypes unobserved. The results (i.e., the coding schemes and the relationships between the model parameters and the expected genotypic values) nevertheless hold for QTL as well, although the expected genotypic values at a target QTL can no longer be directly estimated by fitting the regression models.

Consider a marker locus with *m* alleles *A*_{1}, ..., *A*_{m}, *m* ≥ 2. In general, there are *m* possible homozygous genotypes *A*_{j}*A*_{j}, *j* = 1, ..., *m*, and *m*(*m* - 1)/2 possible heterozygous genotypes *A*_{j}*A*_{k}, *j* ≠ *k*. Let *G*_{jk} = E(*G*|*g* = *A*_{j}*A*_{k}) be the expected genotypic values, given the marker genotypes *A*_{j}*A*_{k} in a study population. Without knowing the parental origins of the alleles, we assume as usual that the parental origin of the alleles does not make a difference (i.e., no imprinting). We then have *G*_{jk} = *G*_{kj} for *j*, *k* = 1, ..., *m*, and there are in total *m*(*m* + 1)/2 possibly distinct expected genotypic values *G*_{jk}, *j*, *k* = 1, ..., *m*, which could be estimated through the means in the genotypic subgroups after adjustment for the environmental covariates. Here we assume that there are no missing genotypes for the sampled individuals and that every possible genotype is carried by some individual in the random sample. How to handle missing genotypes will be discussed in the Discussion. To fully re-parameterize these expected genotypic values through a linear model, we then need in total *m*(*m* + 1)/2 parameters, including the intercept. By treating the paternal and maternal alleles as two independent risk factors and following the classical two-way ANOVA notation, we can represent the genotypic values *G*_{jk} as

$$G_{jk} = \mu^{*} + \alpha_j^{*} + \alpha_k^{*} + \delta_{jk}^{*}, \quad j, k = 1, \dots, m, \qquad (2)$$

where $\alpha_j^{*}$ and $\delta_{jk}^{*}$ are the realized (but unobservable) additive effects of allele *A*_{j} and the allelic interaction between the two alleles *A*_{j} and *A*_{k}, respectively. The above model is different from the classical two-way ANOVA model in that here both the paternal and the maternal risk factors share the same set of alleles *A*_{1}, ..., *A*_{m}. As usual, with the unknown parental origins of alleles at the locus, we assume the paternal and maternal alleles have the same genetic effect. More precisely, the paternal allele *A*_{j} and the maternal allele *A*_{j} have the same additive allelic effect $\alpha_j^{*}$ for *j* = 1, ..., *m*. Besides, the allelic interaction between a paternal allele *A*_{j} and a maternal allele *A*_{k} is the same as that between the paternal allele *A*_{k} and the maternal allele *A*_{j}; i.e., $\delta_{jk}^{*} = \delta_{kj}^{*}$ for *j*, *k* = 1, ..., *m*. Still, with *m* additive allelic effects and *m*(*m* + 1)/2 allelic interactions plus the intercept, it is clear that model (2) is over-parameterized for modeling the *m*(*m* + 1)/2 expected genotypic values *G*_{jk}, *j*, *k* = 1, ..., *m*. As a result, the parameters *μ**, $\alpha_j^{*}$ and $\delta_{jk}^{*}$ in model (2) are not all estimable in terms of the expected genotypic values *G*_{jk} (see [14, 15]).
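
The size of the over-parameterization can be made concrete with a small worked count. The case below takes *m* = 3 purely as an illustrative value (it is not singled out at this point in the text) and only applies the counts stated above.

```latex
% Illustrative count for m = 3 (an assumed example value)
\begin{align*}
\text{distinct expected genotypic values:}\quad
  & \frac{m(m+1)}{2} = \frac{3 \cdot 4}{2} = 6, \\
\text{parameters in model (2):}\quad
  & 1 + m + \frac{m(m+1)}{2} = 1 + 3 + 6 = 10, \\
\text{excess parameters:}\quad
  & 10 - 6 = 4 = m + 1 .
\end{align*}
```

In general the excess is 1 + *m*, so at least *m* + 1 linear constraints (or dropped coding variables) are needed before the remaining parameters can be expressed through the *G*_{jk}.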

*A*_{j}, *j* = 1, ..., *m*. Then we define the following coding variables of the marker genotypes

for *j*, *k* = 1, ..., *m*, where $A_j^{c}$ denotes any other allele type except *A*_{j}. Note that *z*_{1j}, *z*_{2j} are not observable because, without parental information, we do not know exactly which allele is inherited from the paternal or the maternal gamete for the sampled individuals. But this unknown phase problem does not affect the definitions of *w*_{j} and *v*_{jk}, since *w*_{j} only counts the number of copies of allele *A*_{j} in the genotype, and the value of *v*_{jk} is 1 when the genotype is *A*_{j}*A*_{k} and 0 otherwise, regardless of where the two alleles come from. We refer to the above coding of marker genotypes as an allele coding scheme. Model (2) can then be re-written in a linear model form as

for *i* = 1, ..., *N*. As each individual always carries two alleles at a marker locus, one from the father and the other from the mother, we have $\sum_{j=1}^{m} z_{1j}(g_i) = \sum_{k=1}^{m} z_{2k}(g_i) = 1$ for any *i* = 1, ..., *N*. Therefore, given a particular *j*, $w_j = 2 - \sum_{k \ne j} w_k$, which is a linear combination of the rest of {*w*_{k}, *k* ≠ *j*}. For *v*_{jk}, we also have $\sum_{j=1}^{m} v_{jk} = z_{2k}$, or $v_{jk} = w_k/2 - \sum_{l \ne j} v_{lk}$. Hence, each of the *v*_{jk}, *k* = 1, ..., *m*, is also a linear combination of the coding variables {*w*_{k}, *k* ≠ *j*} and {*v*_{lk}, *l*, *k* ≠ *j*}. To avoid the redundancy of parameters due to these collinearity relationships among the coding variables in model (3), without loss of generality, we consider dropping *w*_{m} and {*v*_{km}, *k* = 1, ..., *m*} in (3). Then

$$G(g_i) = \mu + \sum_{j=1}^{m-1} \alpha_j w_j(g_i) + \sum_{j=1}^{m-1} \sum_{k=j}^{m-1} \delta_{jk} v_{jk}(g_i), \qquad (4)$$

for *i* = 1, ..., *N*. Model (4) now provides a full re-parameterization of the *m*(*m* + 1)/2 expected genotypic values *G*_{jk} for *j*, *k* = 1, ..., *m*, in which the parameters *α*_{j} can be referred to as the additive allelic effects and *δ*_{jk} as the allelic interactions with respect to the reference allele *A*_{m}. Given a random sample, we can then incorporate model (4) into (1) and fit the regression model (1) using the standard least-squares approach. In terms of the expected genotypic values, it is easy to show that *μ* = *G*_{mm}, *α*_{j} = *G*_{jm} - *G*_{mm} and *δ*_{jk} = (*G*_{jk} - *G*_{km}) - (*G*_{jm} - *G*_{mm}), for *j* = 1, ..., *m* - 1 and *k* = *j*, ..., *m* - 1. Therefore, the additive allelic effect *α*_{j} can be interpreted as the substitution effect of replacing allele *A*_{m} by *A*_{j} when paired with another allele *A*_{m} to form the genotypes. Meanwhile, the allelic interaction *δ*_{jk} is the difference between the substitution effect of replacing allele *A*_{m} by *A*_{j} (or *A*_{k}) when paired with allele *A*_{k} (or *A*_{j}) and that when paired with allele *A*_{m}. Note that dropping *w*_{j} and {*v*_{kj}, *k* = 1, ..., *m*} for a particular *j* ≠ *m*, instead of *w*_{m} and {*v*_{km}, *k* = 1, ..., *m*}, leads to similar interpretations of the model parameters with *A*_{j} being the reference allele.

Using model (4), we can also estimate and test for various other genetic effects. For example, the so-called functional 'additive effects' $a_{jk}^{*} = (G_{jj} - G_{kk})/2$ and 'dominance effects' $d_{jk}^{*} = G_{jk} - (G_{jj} + G_{kk})/2$, *j* ≠ *k*, defined in [11] can be expressed as $a_{jk}^{*} = (\alpha_j - \alpha_k) + (\delta_{jj} - \delta_{kk})/2$ and $d_{jk}^{*} = \delta_{jk} - (\delta_{jj} + \delta_{kk})/2$, *j* ≠ *k*, respectively, in terms of the above model parameters. These expressions follow from $G_{jj} = \mu + 2\alpha_j + \delta_{jj}$ and $G_{jk} = \mu + \alpha_j + \alpha_k + \delta_{jk}$ (with the parameters of the reference allele *A*_{m} set to zero), so that the intercept *μ* cancels in both contrasts. So we can estimate $a_{jk}^{*}$ and $d_{jk}^{*}$ using the fitted model parameters, or test the hypothesis $H_0: a_{jk}^{*} = 0$ or $H_0: d_{jk}^{*} = 0$ through general linear contrasts [15] using standard software such as SAS. To test whether a particular allele *A*_{j} has an overall effect, the null hypothesis is *H*_{0}: *α*_{j} = *δ*_{jk} = 0 for *k* = 1, ..., *m* - 1, which can be tested through a general linear contrast (or a likelihood ratio test) with *m* degrees of freedom. The association test for the overall effects of the locus corresponds to the null hypothesis *H*_{0}: *α*_{j} = *δ*_{jk} = 0 for all *j*, *k* = 1, ..., *m* - 1, with *m*(*m* + 1)/2 - 1 degrees of freedom.

Currently, the so-called F_{∞} model has been widely used in genetic association studies. In the simple biallelic case with two alleles *A* and *a*, an F_{∞} model gives [16–19]

$$G_{AA} = \tau + a, \qquad G_{Aa} = \tau + d, \qquad G_{aa} = \tau - a,$$

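where *G*_{AA} = E(*G*|*AA*), *G*_{Aa} = E(*G*|*Aa*) and *G*_{aa} = E(*G*|*aa*) are the three possible expected genotypic values at the marker. The parameters *a*, *d* are often referred to as the additive and dominance effects of the allele *A* over *a*, and in terms of the expected genotypic values we have *a* = (*G*_{AA} - *G*_{aa})/2 and *d* = *G*_{Aa} - (*G*_{AA} + *G*_{aa})/2. This F_{∞} model can also be written in a linear model form as [10]

$$G(g) = \tau + a\,f(g) + d\,h(g),$$

where *f*, *h* are two coding variables of the marker genotypes that are defined as

$$f(g) = \begin{cases} 1 & \text{if } g = AA \\ 0 & \text{if } g = Aa \\ -1 & \text{if } g = aa \end{cases}, \qquad h(g) = \begin{cases} 1 & \text{if } g = Aa \\ 0 & \text{otherwise.} \end{cases}$$

We refer to this coding as the F_{∞} coding. As a straightforward extension of the F_{∞} coding scheme to multiple alleles, we can define the following coding variables

$$f_j(g) = \begin{cases} 1 & \text{if } g = A_j A_j \\ 0 & \text{if } g = A_j A_j^{c} \\ -1 & \text{if } g = A_j^{c} A_j^{c} \end{cases}, \qquad h_j(g) = \begin{cases} 1 & \text{if } g = A_j A_j^{c} \\ 0 & \text{otherwise,} \end{cases}$$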

for *j* = 1, ..., *m*. It is easy to see that *f*_{j}, *h*_{j} and the previous *w*_{j}, *v*_{jk}, *j*, *k* = 1, ..., *m*, have the relationships *f*_{j}(*g*) = *w*_{j}(*g*) - 1, *h*_{j}(*g*) = *w*_{j}(*g*) - 2*v*_{jj}(*g*), and *v*_{jk}(*g*) = *h*_{j}(*g*)*h*_{k}(*g*) for *j* ≠ *k*. Thus, for the same reason of avoiding collinearity, we can exclude some redundant coding variables and write a fully parameterized one-locus model using the F_{∞} coding as

$$G(g_i) = \tau + \sum_{j=1}^{m-1} a_j f_j(g_i) + \sum_{j=1}^{m-1} d_{jj} h_j(g_i) + \sum_{j=1}^{m-2} \sum_{k=j+1}^{m-1} d_{jk} h_j(g_i) h_k(g_i), \qquad (5)$$

for *i* = 1, ..., *N*. By setting model (5) equivalent to (4), we can first build the relationships between the two sets of model parameters and then establish the relationships between the parameters of model (5) and the expected genotypic values as follows:

$$\tau = G_{mm} + \frac{1}{2} \sum_{j=1}^{m-1} (G_{jj} - G_{mm}), \quad a_j = \frac{G_{jj} - G_{mm}}{2}, \quad d_{jj} = G_{jm} - \frac{G_{jj} + G_{mm}}{2}, \quad d_{jk} = (G_{jk} - G_{jm}) - (G_{km} - G_{mm}) \;\; (j < k).$$

Therefore, *a*_{j} can be interpreted as half of the difference between the two expected homozygous genotypic values *G*_{jj} and *G*_{mm}, which is the same as the additive effect $a_{jm}^{*}$ defined in [11]. Besides, *d*_{jj} is the difference between the expected heterozygous genotypic value *G*_{jm} and the averaged expected homozygous genotypic value (*G*_{jj} + *G*_{mm})/2, which is the same as the dominance effect $d_{jm}^{*}$ defined in [11]. It is interesting to see that *d*_{jk}, *j* ≠ *k*, has the same interpretation as *δ*_{jk} in model (4), which is the difference between the substitution effects of replacing allele *A*_{m} by *A*_{j} when paired with alleles *A*_{k} and *A*_{m}. Note that *d*_{jj} can also be interpreted as an allelic interaction: it is half of the difference between the substitution effects of replacing allele *A*_{j} by *A*_{m} when paired with another *A*_{j} and when paired with *A*_{m}. In addition, based on model (5), the additive effects $a_{jk}^{*}$ and the dominance effects $d_{jk}^{*}$ proposed in [11] are related to the model parameters by $a_{jk}^{*} = a_j - a_k$ and $d_{jk}^{*} = d_{jk} + (d_{jj} + d_{kk})$, *j* ≠ *k*. The overall effect of a particular allele *A*_{j} can be tested through the composite hypothesis *H*_{0}: *a*_{j} = *d*_{jk} = 0 for *k* = 1, ..., *m* - 1, and the overall effects of the locus can be tested via the null hypothesis *H*_{0}: *a*_{j} = *d*_{jk} = 0 for all *j*, *k* = 1, ..., *m* - 1.
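
The stated relationships between the allele coding and the F_{∞} coding can be checked by brute force over all unphased genotypes. The sketch below is only an illustration: the function names are hypothetical, genotypes are represented as pairs of allele indices 1..*m*, and *v*_{jk} is taken in the sense described in the text (1 when the genotype is *A*_{j}*A*_{k} regardless of parental origin).

```python
from itertools import combinations_with_replacement


def allele_coding(genotype, m):
    """Return (w, v) for an unphased genotype given as a pair of allele indices.

    w[j] counts copies of allele A_j; v[(j, k)] with j <= k is 1 when the
    genotype is A_j A_k (in either order) and 0 otherwise.
    """
    a, b = sorted(genotype)
    w = {j: (a == j) + (b == j) for j in range(1, m + 1)}
    v = {(j, k): int((a, b) == (j, k))
         for j, k in combinations_with_replacement(range(1, m + 1), 2)}
    return w, v


def f_inf_coding(genotype, m):
    """Return (f, h): f[j] is 1, 0, -1 for two, one, zero copies of A_j;
    h[j] is 1 only when the genotype carries exactly one copy of A_j."""
    a, b = genotype
    f, h = {}, {}
    for j in range(1, m + 1):
        copies = (a == j) + (b == j)
        f[j] = {2: 1, 1: 0, 0: -1}[copies]
        h[j] = int(copies == 1)
    return f, h


def check_relations(m):
    """Check f_j = w_j - 1, h_j = w_j - 2 v_jj and v_jk = h_j h_k (j != k)."""
    for g in combinations_with_replacement(range(1, m + 1), 2):
        w, v = allele_coding(g, m)
        f, h = f_inf_coding(g, m)
        for j in range(1, m + 1):
            assert f[j] == w[j] - 1
            assert h[j] == w[j] - 2 * v[(j, j)]
        for (j, k), value in v.items():
            if j != k:
                assert value == h[j] * h[k]
    return True
```

Calling `check_relations(m)` for a small `m` such as 3 or 4 runs the three identities on every genotype; it raises an `AssertionError` if any of them fails.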

Besides the allele and F_{∞} codings, another way of coding the marker genotypes, which occasionally appears in practice, is to count the number of copies of each specific allele *A*_{j} in the marker genotypes. As each individual can have 0, 1 or 2 copies of an allele *A*_{j}, by taking the genotypic group with 0 copies of allele *A*_{j} as the baseline, we can introduce the following two indicator (or dummy) variables for the genotypic groups with 1 and 2 copies of the allele *A*_{j}, respectively:

$$h_{1j}(g) = \begin{cases} 1 & \text{if } g = A_j A_j^{c} \\ 0 & \text{otherwise} \end{cases}, \qquad h_{2j}(g) = \begin{cases} 1 & \text{if } g = A_j A_j \\ 0 & \text{otherwise,} \end{cases}$$

for *j* = 1, ..., *m* - 1. These coding variables are related to the previous ones by *h*_{1j}(*g*) = *h*_{j}(*g*) = *w*_{j}(*g*) - 2*v*_{jj}(*g*) and *h*_{2j}(*g*) = *v*_{jj}(*g*). We refer to this coding of marker genotypes as the allele-count coding. Similar to models (4) and (5), by excluding some redundant coding variables, the allele-count coding leads to another fully parameterized one-locus model as

$$G(g_i) = \pi_0 + \sum_{j=1}^{m-1} \pi_j h_{1j}(g_i) + \sum_{j=1}^{m-1} \eta_{jj} h_{2j}(g_i) + \sum_{j=1}^{m-2} \sum_{k=j+1}^{m-1} \eta_{jk} h_{1j}(g_i) h_{1k}(g_i), \qquad (6)$$

for *i* = 1, ..., *N*. Similarly, by setting model (6) equivalent to (4), we can establish the following relationships:

$$\pi_0 = G_{mm}, \quad \pi_j = G_{jm} - G_{mm}, \quad \eta_{jj} = G_{jj} - G_{mm}, \quad \eta_{jk} = (G_{jk} - G_{jm}) - (G_{km} - G_{mm}) \;\; (j < k).$$

Therefore, *π*_{j} in model (6) can still be interpreted as the substitution effect of replacing allele *A*_{m} by *A*_{j} when paired with allele *A*_{m}, or the difference between the genotypic values of the genotype group *A*_{j}*A*_{m} with one copy of *A*_{j} and the genotype group *A*_{m}*A*_{m} (baseline). *η*_{jj} is the difference between the expected genotypic value *G*_{jj} in the homozygous genotypic group *A*_{j}*A*_{j} with two copies of *A*_{j} and *G*_{mm} in the baseline group *A*_{m}*A*_{m}. Besides, *η*_{jk}, *j* ≠ *k*, in model (6) has the same interpretation as *δ*_{jk} (or *d*_{jk}) before. From model (6), the general additive effects are $a_{jk}^{*} = (\eta_{jj} - \eta_{kk})/2$ and the dominance effects are $d_{jk}^{*} = \eta_{jk} + (\pi_j + \pi_k) - (\eta_{jj} + \eta_{kk})/2$, *j* ≠ *k*, because the relationships in Table 1 give $G_{jj} = \pi_0 + \eta_{jj}$ and $G_{jk} = \pi_0 + \pi_j + \pi_k + \eta_{jk}$; these effects can be tested either separately or jointly. The overall effect of a particular allele *A*_{j} can be tested through the composite hypothesis *H*_{0}: *π*_{j} = *η*_{jk} = 0 for *k* = 1, ..., *m* - 1. The overall effects of the locus can also be tested via the null hypothesis *H*_{0}: *π*_{j} = *η*_{jk} = 0 for all *j*, *k* = 1, ..., *m* - 1.
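
A minimal sketch of the allele-count coding follows. It assumes that the fully parameterized allele-count model has the form *G*(*g*) = *π*_{0} + Σ *π*_{j}*h*_{1j} + Σ *η*_{jj}*h*_{2j} + Σ_{j<k} *η*_{jk}*h*_{1j}*h*_{1k}, which is the form implied by the relationships in Table 1; the function names and the dictionary layout of the parameters are hypothetical.

```python
def allele_count_coding(genotype, m):
    """Return (h1, h2) for an unphased genotype (pair of allele indices 1..m).

    h1[j] = 1 when the genotype carries exactly one copy of A_j,
    h2[j] = 1 when it carries two copies, for j = 1, ..., m - 1.
    """
    a, b = genotype
    h1, h2 = {}, {}
    for j in range(1, m):              # A_m is the baseline and gets no dummies
        copies = (a == j) + (b == j)
        h1[j] = int(copies == 1)
        h2[j] = int(copies == 2)
    return h1, h2


def expected_value(genotype, m, pi0, pi, eta):
    """Evaluate the assumed model form for one genotype.

    pi maps j -> pi_j and eta maps (j, k), j <= k, -> eta_jk.
    """
    h1, h2 = allele_count_coding(genotype, m)
    total = pi0
    for j in range(1, m):
        total += pi[j] * h1[j] + eta[(j, j)] * h2[j]
        for k in range(j + 1, m):
            total += eta[(j, k)] * h1[j] * h1[k]
    return total
```

Feeding `expected_value` with parameters computed from the Table 1 relationships should return the original *G*_{jk} for every genotype, which is a quick way to confirm that the parameter set is a full re-parameterization.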

The three models (4), (5) and (6) provide different re-parameterizations of the *m*(*m* + 1)/2 expected genotypic values under the same model framework (3). The relationships between their model parameters and the expected genotypic values are summarized in Table 1. It is interesting to see from Table 1 that the null hypothesis *α*_{j} = *δ*_{jj} = 0 is equivalent to either *a*_{j} = *d*_{jj} = 0 or *π*_{j} = *η*_{jj} = 0, each of which implies *G*_{jj} = *G*_{jm} = *G*_{mm}. So the three models above should provide the same test statistics for testing *α*_{j} = *δ*_{jj} = 0, *a*_{j} = *d*_{jj} = 0 or *π*_{j} = *η*_{jj} = 0.

Table 1. Parameterization of the fully parameterized one-locus models (4), (5) and (6).

Codings | Relationships |
---|---|
Allele | $\begin{array}{c}\mu ={G}_{mm},{\alpha}_{j}={G}_{jm}-{G}_{mm}\\ {\delta}_{jj}={G}_{jj}+{G}_{mm}-2{G}_{jm},j=1,\dots ,m-1\\ {\delta}_{jk}=\left({G}_{jk}-{G}_{jm}\right)-\left({G}_{km}-{G}_{mm}\right),j,k=1,\dots ,m-1;j<k\end{array}$ |
F_{∞} | $\begin{array}{c}\tau ={G}_{mm}+\frac{1}{2}{\sum}_{j=1}^{m-1}\left({G}_{jj}-{G}_{mm}\right)\\ {a}_{j}=\frac{{G}_{jj}-{G}_{mm}}{2},{d}_{jj}={G}_{jm}-\frac{{G}_{jj}+{G}_{mm}}{2},j=1,\dots ,m-1\\ {d}_{jk}=\left({G}_{jk}-{G}_{jm}\right)-\left({G}_{km}-{G}_{mm}\right),j,k=1,\dots ,m-1;j<k\end{array}$ |
Allele-count | $\begin{array}{c}{\pi}_{0}={G}_{mm},{\pi}_{j}={G}_{jm}-{G}_{mm}\\ {\eta}_{jj}={G}_{jj}-{G}_{mm},j=1,\dots ,m-1\\ {\eta}_{jk}=\left({G}_{jk}-{G}_{jm}\right)-\left({G}_{km}-{G}_{mm}\right),j,k=1,\dots ,m-1;j<k\end{array}$ |
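
The allele-coding row of Table 1 can be turned into a small round-trip check for any number of alleles. This is a sketch under stated assumptions: expected genotypic values are stored in a dictionary keyed by allele-index pairs (*j*, *k*) with *j* ≤ *k*, the last allele *A*_{m} is the reference, and the function names are hypothetical.

```python
def allele_params(G, m):
    """Map expected genotypic values to (mu, alpha, delta) using the
    allele-coding relationships of Table 1, with A_m as the reference."""
    def val(j, k):
        return G[(min(j, k), max(j, k))]

    mu = val(m, m)
    alpha = {j: val(j, m) - mu for j in range(1, m)}
    delta = {(j, k): (val(j, k) - val(j, m)) - (val(k, m) - mu)
             for j in range(1, m) for k in range(j, m)}
    return mu, alpha, delta


def genotypic_value(j, k, mu, alpha, delta):
    """Rebuild G_jk = mu + alpha_j + alpha_k + delta_jk; every term that
    involves the reference allele is absent from the dictionaries, so it is 0."""
    j, k = min(j, k), max(j, k)
    return mu + alpha.get(j, 0) + alpha.get(k, 0) + delta.get((j, k), 0)
```

The number of returned parameters is 1 + (*m* - 1) + *m*(*m* - 1)/2 = *m*(*m* + 1)/2, matching the number of distinct genotypic values, and `genotypic_value` recovers each input value exactly.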

In the biallelic case with two alleles *A* (or *A*_{1}) and *a* (or *A*_{2}), we have *m* = 2 with three possible genotypic values *G*_{AA} = E(*G*|*AA*), *G*_{Aa} = E(*G*|*Aa*) and *G*_{aa} = E(*G*|*aa*). If we adopt the allele coding, then *w*_{2}(*g*) = 2 - *w*_{1}(*g*), *v*_{12}(*g*) = *w*_{1}(*g*) - 2*v*_{11}(*g*), and *v*_{22}(*g*) = 1 - *w*_{1}(*g*) + *v*_{11}(*g*). For the F_{∞} coding, we have *f*_{2}(*g*) = -*f*_{1}(*g*) and *h*_{2}(*g*) = *h*_{1}(*g*). So we can further drop *d*_{2} in model (5). For the allele-count coding, we have *h*_{12}(*g*) = *h*_{11}(*g*) and *h*_{22}(*g*) = 1 - *h*_{11}(*g*) - *h*_{21}(*g*). The interpretation of the model parameters for these three biallelic QTL models is summarized in Table 2, which is a special case of Table 1.

Table 2. Parameterization of one-locus models (4), (5), (6) when *m* = 2.

Codings | Models | Relationships |
---|---|---|
Allele | $\begin{array}{c}{G}_{AA}=\mu +2{\alpha}_{1}+{\delta}_{11}\\ {G}_{Aa}=\mu +{\alpha}_{1}\\ {G}_{aa}=\mu \end{array}$ | $\begin{array}{c}\mu ={G}_{aa}\\ {\alpha}_{1}={G}_{Aa}-{G}_{aa}\\ {\delta}_{11}={G}_{AA}+{G}_{aa}-2{G}_{Aa}\end{array}$ |
F_{∞} | $\begin{array}{c}{G}_{AA}=\tau +{a}_{1}\\ {G}_{Aa}=\tau +{d}_{11}\\ {G}_{aa}=\tau -{a}_{1}\end{array}$ | $\begin{array}{c}\tau =\frac{{G}_{AA}+{G}_{aa}}{2}\\ {a}_{1}=\frac{{G}_{AA}-{G}_{aa}}{2}\\ {d}_{11}={G}_{Aa}-\frac{{G}_{AA}+{G}_{aa}}{2}\end{array}$ |
Allele-count | $\begin{array}{c}{G}_{AA}={\pi}_{0}+{\eta}_{11}\\ {G}_{Aa}={\pi}_{0}+{\pi}_{1}\\ {G}_{aa}={\pi}_{0}\end{array}$ | $\begin{array}{c}{\pi}_{0}={G}_{aa}\\ {\pi}_{1}={G}_{Aa}-{G}_{aa}\\ {\eta}_{11}={G}_{AA}-{G}_{aa}\end{array}$ |
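
The biallelic relationships in Table 2 are simple enough to evaluate directly. The sketch below computes the three parameter sets from given expected genotypic values and rebuilds the genotypic values from each set; the function names and the example numbers used with them are illustrative assumptions, not values from the study.

```python
def biallelic_parameters(g_AA, g_Aa, g_aa):
    """Parameters of the three codings for m = 2 ('Relationships' column of Table 2)."""
    mid = (g_AA + g_aa) / 2            # midpoint of the two homozygotes
    return {
        "allele": (g_aa, g_Aa - g_aa, g_AA + g_aa - 2 * g_Aa),  # mu, alpha_1, delta_11
        "f_inf": (mid, (g_AA - g_aa) / 2, g_Aa - mid),          # tau, a_1, d_11
        "allele_count": (g_aa, g_Aa - g_aa, g_AA - g_aa),       # pi_0, pi_1, eta_11
    }


def rebuild(params):
    """Recover (G_AA, G_Aa, G_aa) from each parameter set ('Models' column of Table 2)."""
    mu, alpha, delta = params["allele"]
    tau, a, d = params["f_inf"]
    pi0, pi1, eta = params["allele_count"]
    return {
        "allele": (mu + 2 * alpha + delta, mu + alpha, mu),
        "f_inf": (tau + a, tau + d, tau - a),
        "allele_count": (pi0 + eta, pi0 + pi1, pi0),
    }
```

For instance, with hypothetical values *G*_{AA} = 10, *G*_{Aa} = 8 and *G*_{aa} = 4, the three codings give different parameter values but `rebuild` returns the same three genotypic values for each of them, which is what "full re-parameterization" means here.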

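For a marker with three alleles *A*_{1}, *A*_{2} and *A*_{3} (i.e., *m* = 3), we have six possibly distinct expected genotypic values *G*_{11}, *G*_{22}, *G*_{33}, *G*_{12}, *G*_{13} and *G*_{23}. Each of the three fully parameterized models (4), (5) and (6) can provide a full re-parameterization of the six expected genotypic values. In a matrix form, from the allele coding model (4), we have

$$\begin{pmatrix} G_{11} \\ G_{22} \\ G_{33} \\ G_{12} \\ G_{13} \\ G_{23} \end{pmatrix} = \begin{pmatrix} 1 & 2 & 0 & 1 & 0 & 0 \\ 1 & 0 & 2 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \delta_{11} \\ \delta_{22} \\ \delta_{12} \end{pmatrix}.$$

From the F_{∞} coding model (5), we have

$$\begin{pmatrix} G_{11} \\ G_{22} \\ G_{33} \\ G_{12} \\ G_{13} \\ G_{23} \end{pmatrix} = \begin{pmatrix} 1 & 1 & -1 & 0 & 0 & 0 \\ 1 & -1 & 1 & 0 & 0 & 0 \\ 1 & -1 & -1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & -1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} \tau \\ a_1 \\ a_2 \\ d_{11} \\ d_{22} \\ d_{12} \end{pmatrix}.$$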

Table 3. Parameterization of one-locus models (4), (5), (6) when *m* = 3.

Codings | Relationships |
---|---|
Allele | $\begin{array}{c}\mu ={G}_{33}\\ {\alpha}_{1}={G}_{13}-{G}_{33},{\alpha}_{2}={G}_{23}-{G}_{33}\\ {\delta}_{11}={G}_{11}+{G}_{33}-2{G}_{13}\\ {\delta}_{22}={G}_{22}+{G}_{33}-2{G}_{23}\\ {\delta}_{12}={G}_{12}+{G}_{33}-{G}_{13}-{G}_{23}\end{array}$ |
F_{∞} | $\begin{array}{c}\tau =\frac{{G}_{11}+{G}_{22}}{2}\\ {a}_{1}=\frac{{G}_{11}-{G}_{33}}{2},{a}_{2}=\frac{{G}_{22}-{G}_{33}}{2}\\ {d}_{11}={G}_{13}-\frac{{G}_{11}+{G}_{33}}{2}\\ {d}_{22}={G}_{23}-\frac{{G}_{22}+{G}_{33}}{2}\\ {d}_{12}={G}_{12}+{G}_{33}-{G}_{13}-{G}_{23}\end{array}$ |
Allele-count | $\begin{array}{c}{\pi}_{0}={G}_{33}\\ {\pi}_{1}={G}_{13}-{G}_{33},{\pi}_{2}={G}_{23}-{G}_{33}\\ {\eta}_{11}={G}_{11}-{G}_{33}\\ {\eta}_{22}={G}_{22}-{G}_{33}\\ {\eta}_{12}={G}_{12}+{G}_{33}-{G}_{13}-{G}_{23}\end{array}$ |
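
The three-allele relationships in Table 3 can be reproduced numerically by building a saturated 6 × 6 design matrix for each coding and solving it exactly. The sketch below is an illustration under assumptions: the column layouts are the ones implied by Table 3 (intercept, two main-effect columns per non-reference allele, one *A*_{1} × *A*_{2} interaction column), the solver is a plain Gauss-Jordan elimination over exact fractions, and all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

M = 3  # three alleles; A_3 is the reference allele
GENOTYPES = list(combinations_with_replacement(range(1, M + 1), 2))


def copies(g, j):
    """Number of copies of allele A_j in the unphased genotype g."""
    return (g[0] == j) + (g[1] == j)


def design_row(g, coding):
    """One row of the saturated design matrix for genotype g."""
    w = [copies(g, j) for j in (1, 2)]
    het = [int(c == 1) for c in w]   # h_j, equivalently h_1j
    hom = [int(c == 2) for c in w]   # v_jj, equivalently h_2j
    inter = het[0] * het[1]          # 1 only for genotype A_1 A_2
    if coding == "allele":           # mu, alpha_1, alpha_2, delta_11, delta_22, delta_12
        return [1, w[0], w[1], hom[0], hom[1], inter]
    if coding == "f_inf":            # tau, a_1, a_2, d_11, d_22, d_12
        return [1, w[0] - 1, w[1] - 1, het[0], het[1], inter]
    if coding == "allele_count":     # pi_0, pi_1, pi_2, eta_11, eta_22, eta_12
        return [1, het[0], het[1], hom[0], hom[1], inter]
    raise ValueError(coding)


def solve(A, b):
    """Solve the square, non-singular system A x = b exactly."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def fit(G, coding):
    """Parameters that reproduce the six expected genotypic values in G."""
    A = [design_row(g, coding) for g in GENOTYPES]
    return solve(A, [G[g] for g in GENOTYPES])
```

Given any dictionary `G` keyed by `(1, 1)`, `(1, 2)`, ..., `(3, 3)`, the output of `fit(G, coding)` can be compared term by term with the corresponding row of Table 3; the interaction parameter (last entry) is the same under all three codings.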

### Reduced one-locus models

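Suppose that we only consider the interactions between allele *A*_{j} and itself for the homozygous genotypes *A*_{j}*A*_{j}, *j* = 1, ..., *m*, but ignore the other interactions between different alleles *A*_{j} and *A*_{k} (*j* ≠ *k*). Then we obtain a reduced case of model (2) as below

$$G_{jk} = \mu^{*} + \alpha_j^{*} + \alpha_k^{*} + \delta_j^{*}\, 1_{\{j = k\}}, \qquad (7)$$

for *j*, *k* = 1, ..., *m*. Similarly, using the allele coding, we can present this model in a linear model form as

$$G(g_i) = \mu^{*} + \sum_{j=1}^{m} \alpha_j^{*} w_j(g_i) + \sum_{j=1}^{m} \delta_j^{*} v_j(g_i), \qquad (8)$$

for *i* = 1, ..., *N*, where *v*_{j}(*g*) = *v*_{jj}(*g*) for *j* = 1, ..., *m*, with *v*_{jj}(*g*) defined as before.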

Model (8) still contains a redundant parameter among the *α**'s due to the fact that $\sum_{j=1}^{m} w_j(g_i) = 2$ for *i* = 1, ..., *N*. In this case, as shown in Appendix A, the parameters $\delta_1^{*}, \dots, \delta_m^{*}$ in model (8) are estimable but the parameters *μ** and $\alpha_1^{*}, \dots, \alpha_m^{*}$ are not estimable. To overcome the redundant parameter problem, we can drop *w*_{m} from model (8) and consider

$$G(g_i) = \mu + \sum_{j=1}^{m-1} \alpha_j w_j(g_i) + \sum_{j=1}^{m} \delta_j v_j(g_i), \qquad (9)$$

for *i* = 1, ..., *N*. Note that $v_m = z_{1m} z_{2m} = 1 - \sum_{j=1}^{m-1} w_j + \sum_{j=1}^{m-1} \sum_{k=1}^{m-1} v_{jk}$, which cannot be completely determined by {*w*_{j}, *v*_{j}, *j* = 1, ..., *m* - 1}. Therefore, dropping {*δ*_{jk}, *j*, *k* = 1, ..., *m* - 1, *j* < *k*} from model (4) does not directly lead to a model equivalent to (9), as the latter contains *v*_{m}. In fact, further dropping *v*_{m} from (9) leads to a more restricted model structure for the expected genotypic values, with interpretations of its model parameters similar to those presented for model (4). It is also interesting to see that the haplotype coding proposed in [13] is a special case of model (9) in which we further ignore all the allelic interactions and drop all the {*v*_{j}, *j* = 1, ..., *m*} from the model.
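
The effect of the constraint Σ *w*_{j} = 2 on the reduced model can be seen by computing the column rank of the design matrix over all unphased genotypes. The sketch below is an illustration with hypothetical function names; it uses *m* = 4 as an assumed example, builds the columns 1, *w*_{1}, ..., *w*_{m}, *v*_{1}, ..., *v*_{m} of model (8), and compares the rank with and without *w*_{m}.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    mat = [[Fraction(x) for x in row] for row in rows]
    rank_so_far = 0
    for col in range(len(mat[0])):
        pivot = next((r for r in range(rank_so_far, len(mat))
                      if mat[r][col] != 0), None)
        if pivot is None:
            continue                      # no pivot: this column is dependent
        mat[rank_so_far], mat[pivot] = mat[pivot], mat[rank_so_far]
        for r in range(rank_so_far + 1, len(mat)):
            factor = mat[r][col] / mat[rank_so_far][col]
            mat[r] = [x - factor * y for x, y in zip(mat[r], mat[rank_so_far])]
        rank_so_far += 1
    return rank_so_far


def reduced_design(m, drop_w_m):
    """Rows over all unphased genotypes with columns 1, w_1..w_m, v_1..v_m;
    the w_m column is omitted when drop_w_m is True."""
    rows = []
    for a, b in combinations_with_replacement(range(1, m + 1), 2):
        w = [(a == j) + (b == j) for j in range(1, m + 1)]
        v = [int(a == j and b == j) for j in range(1, m + 1)]
        if drop_w_m:
            w = w[:-1]
        rows.append([1] + w + v)
    return rows
```

With all of *w*_{1}, ..., *w*_{m} included, the matrix has one more column than its rank, reflecting the single linear dependency; after dropping *w*_{m} the remaining columns have full column rank, so the parameters of the resulting model are determined by the expected genotypic values.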

Model (9) can be treated as model (8) under the restriction $\alpha_m^{*} = 0$, with *μ* = *μ**, $\alpha_j = \alpha_j^{*}$ for *j* = 1, ..., *m* - 1, and $\delta_j = \delta_j^{*}$ for *j* = 1, ..., *m*. Note that adding the restriction $\alpha_m^{*} = 0$ on (8) does not change the modeling structure of the expected genotypic values because $\alpha_m^{*}$ is a redundant parameter given the others. Therefore,

Compared with the parameters in model (4), the interpretation of the parameters in model (9) has changed slightly. The intercept *μ* now becomes $G_{mm} - \delta_m^{*}$ instead of *G*_{mm}; *α*_{j} is the substitution effect of replacing allele *A*_{m} by *A*_{j} when paired with any allele *A*_{k} (*k* ≠ *j*, *m*) instead of just *A*_{m}; and *δ*_{j} is the difference between the substitution effect of replacing any allele *A*_{k} by *A*_{j} when paired with *A*_{j} itself and that when paired with another allele *A*_{l} (*l* ≠ *j*, *k*). If both *α*_{j} and *δ*_{j} are zero for a particular *j* < *m*, then *G*_{jj} = *G*_{jm} = *μ* and *G*_{jk} = *G*_{km} for any *k* ≠ *j*, *m*.

The F_{∞} coding leads to the following model

for *i* = 1, ..., *N*. By applying the relationships *f*_{j}(*g*) = *w*_{j}(*g*) - 1 and *h*_{j}(*g*) = *w*_{j}(*g*) - 2*v*_{j}(*g*) for *j* = 1, ..., *m*, we can show that for models (10) and (8) to be equivalent their model parameters have the relationship

*j* = 1, ..., *m* - 1. Thus,

Now *d*_{j} becomes half of the difference between the substitution effect of replacing any allele *A*_{k} by *A*_{j} when paired with another *A*_{j} and that when paired with an allele *A*_{l} (*l* ≠ *j*, *k*), which can no longer be referred to as a dominance effect.
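The two relationships between the F_{∞} and allele coding variables used in this derivation, *f*_{j} = *w*_{j} - 1 and *h*_{j} = *w*_{j} - 2*v*_{j}, can be verified by enumeration. This is only a sketch: it assumes the usual F_{∞} definitions (*f*_{j} is 1, 0 or -1 for two, one or no copies of *A*_{j}, and *h*_{j} is 1 for a heterozygote carrying *A*_{j}), which are not restated in this section, and all function names are hypothetical.

```python
from itertools import combinations_with_replacement


def w(g, j):
    """Allele coding: number of copies of A_j in genotype g."""
    return (g[0] == j) + (g[1] == j)


def v(g, j):
    """Allele coding: 1 if g is the homozygote A_j A_j, else 0."""
    return int(g == (j, j))


def f(g, j):
    """Assumed F-infinity additive code: 1 for A_j A_j, 0 for one copy, -1 for none."""
    if g == (j, j):
        return 1
    return 0 if j in g else -1


def h(g, j):
    """Assumed F-infinity dominance code: 1 for a heterozygote carrying A_j, else 0."""
    return int(j in g and g[0] != g[1])


def relationships_hold(m):
    """Check f_j = w_j - 1 and h_j = w_j - 2 v_j for all genotypes and alleles."""
    for g in combinations_with_replacement(range(1, m + 1), 2):
        for j in range(1, m + 1):
            if f(g, j) != w(g, j) - 1 or h(g, j) != w(g, j) - 2 * v(g, j):
                return False
    return True


if __name__ == "__main__":
    print(all(relationships_hold(m) for m in (2, 3, 4, 5)))
```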

for *i* = 1, ..., *N*. Similarly, we can show that model (11) can be treated as a reduced model obtained by adding the restriction ${\alpha}_{m}^{*}=0$ to the parameters in model (8), with the following relationships

While the effect *η*_{jj} in model (6) is the difference between the two expected homozygous genotypic values *G*_{jj} and *G*_{mm}, the effect *η*_{j} in model (11) becomes the sum of the substitution effects of replacing allele *A*_{m} by *A*_{j} when paired with *A*_{j} itself and when paired with another allele *A*_{k} (*k* ≠ *j*, *m*). It is also interesting to see that the definitions of the parameters in models (11) and (12) are quite different. A null hypothesis of ${H}_{0}:{\pi}_{j}^{\prime}={\eta}_{j}^{\prime}=0$ for a particular *j* < *m* in model (12) implies that *G*_{jj} = *G*_{mm} and *G*_{jm} - *G*_{mm} = *G*_{km} - *G*_{jk} for any *k* ≠ *j*, *m*, while the null hypothesis of *H*_{0} : *π*_{j} = *η*_{j} = 0 for a *j* < *m* in model (11) implies that *G*_{jj} = *G*_{jm} and *G*_{jk} = *G*_{km} for any *k* ≠ *j*, *m*, which has nothing to do with *G*_{mm}.

Each of the four models (9), (10), (11) and (12) uses 2*m* non-redundant parameters (including the intercept) to model the *m*(*m* + 1)/2 expected genotypic values. When *m* > 3, we have *m*(*m* + 1)/2 > 2*m*. Therefore, the model framework (7) enforces certain constraints on the *m*(*m* + 1)/2 genotypic values. If *m* = 3, then each of the four models actually provides a full re-parameterization of the six expected genotypic values *G*_{11}, *G*_{22}, *G*_{33}, *G*_{12}, *G*_{13} and *G*_{23}. The relationships between the parameters of the four models and the expected genotypic values are summarized in Table 4.
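As a quick check of this counting argument (a worked restatement of the inequality above, not an additional result), the gap between the number of expected genotypic values and the number 2*m* of model parameters is

$$\frac{m(m+1)}{2}-2m=\frac{m(m-3)}{2},$$

which is zero at *m* = 3 (six genotypic values and six parameters) and positive for every *m* > 3. For instance, *m* = 4 gives 10 expected genotypic values but only 8 parameters, so the 10 values cannot vary freely under the model framework.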

Table 4. Parameterization of one-locus models (9), (10), (11), (12) when *m* ≥ 3.

Codings | Restrictions | Relationships |
---|---|---|
Allele | ${\alpha}_{m}^{*}=0$ | $\begin{array}{c}\mu ={\mu}^{*}=\left({G}_{jm}+{G}_{km}\right)-{G}_{jk},j\ne k\ne m\\ {\alpha}_{j}={\alpha}_{j}^{*}={G}_{jk}-{G}_{km},j=1,\dots ,m-1;j\ne k,m\\ {\delta}_{j}={\delta}_{j}^{*}=\left({G}_{jj}-{G}_{jk}\right)-\left({G}_{jl}-{G}_{kl}\right),j=1,\dots ,m;k\ne j\ne l\end{array}$ |
F_{∞} | $2{\alpha}_{m}^{*}+{\delta}_{m}^{*}=0$ | $\begin{array}{c}\tau ={\mu}^{*}+\frac{1}{2}\sum _{j=1}^{m-1}\left(2{\alpha}_{j}^{*}+{\delta}_{j}^{*}\right)={G}_{mm}+\frac{1}{2}\sum _{j=1}^{m-1}\left({G}_{jj}-{G}_{mm}\right)\\ {a}_{j}=\frac{1}{2}\left(2{\alpha}_{j}^{*}+{\delta}_{j}^{*}\right)=\frac{{G}_{jj}-{G}_{mm}}{2},j=1,\dots ,m-1\\ {d}_{j}=-\frac{{\delta}_{j}^{*}}{2}=-\frac{\left({G}_{jj}-{G}_{jk}\right)-\left({G}_{jl}-{G}_{kl}\right)}{2},j=1,\dots ,m;j\ne k\ne l\end{array}$ |
Allele-count | ${\alpha}_{m}^{*}=0$ | $\begin{array}{c}{\pi}_{0}={\mu}^{*}=\left({G}_{jm}+{G}_{km}\right)-{G}_{jk},j\ne k\ne m\\ {\pi}_{j}={\alpha}_{j}^{*}={G}_{jk}-{G}_{km},j=1,\dots ,m-1;k\ne j,m\\ {\eta}_{j}=2{\alpha}_{j}^{*}+{\delta}_{j}^{*}=\left({G}_{jj}-{G}_{jm}\right)+\left({G}_{jk}-{G}_{km}\right),j=1,\dots ,m-1;k\ne j,m\\ {\eta}_{m}={\delta}_{m}^{*}=\left({G}_{mm}-{G}_{jm}\right)-\left({G}_{km}-{G}_{jk}\right),j\ne k\ne m\end{array}$ |
Allele-count | $2{\alpha}_{m}^{*}+{\delta}_{m}^{*}=0$ | $\begin{array}{c}{{\pi}^{\prime}}_{0}={\mu}^{*}={G}_{mm}\\ {{\pi}^{\prime}}_{j}={\alpha}_{j}^{*}=\frac{\left({G}_{jm}-{G}_{mm}\right)+\left({G}_{jk}-{G}_{km}\right)}{2},j=1,\dots ,m-1;k\ne j,m\\ {{\pi}^{\prime}}_{m}=-\frac{{\delta}_{m}^{*}}{2}=-\frac{\left({G}_{mm}-{G}_{jm}\right)-\left({G}_{km}-{G}_{jk}\right)}{2},j\ne k\ne m\\ {{\eta}^{\prime}}_{j}=2{\alpha}_{j}^{*}+{\delta}_{j}^{*}={G}_{jj}-{G}_{mm},j=1,\dots ,m-1\end{array}$ |

The null hypothesis *H*_{0}: *α*_{j} = *δ*_{j} = 0 (*j* < *m*) in model (9) is equivalent to *π*_{j} = *η*_{j} = 0 in model (11), which implies ${\alpha}_{j}^{*}={\delta}_{j}^{*}=0$ in model (8) with the restriction ${\alpha}_{m}^{*}=0$, or *G*_{jk} = *G*_{km} for any *k* = 1, ..., *m* - 1. On the other hand, the null hypothesis *H*_{0}: *a*_{j} = *d*_{j} = 0 (*j* < *m*) in model (10) is equivalent to ${\pi}_{j}^{\prime}={\eta}_{j}^{\prime}=0$ in model (12), which implies ${\alpha}_{j}^{*}={\delta}_{j}^{*}=0$ in model (8) with the restriction $2{\alpha}_{m}^{*}+{\delta}_{m}^{*}=0$, or *G*_{jj} = *G*_{mm} and *G*_{jj} - *G*_{jm} = *G*_{jk} - *G*_{km} for any *k* ≠ *m*. In general, the two null hypotheses *α*_{j} = *δ*_{j} = 0 and *a*_{j} = *d*_{j} = 0 may not always be equivalent. For example, when *m* = 3, similar to the three-allele models discussed in the previous section, we can show that the parameters of the four models and the expected genotypic values have the relationships shown in Table 5, which is a special case of Table 4. We can see from Table 5 that *α*_{1} = *δ*_{1} = 0 is equivalent to *π*_{1} = *η*_{1} = 0, which implies *G*_{12} = *G*_{23} and *G*_{11} = *G*_{13}; while *a*_{1} = *d*_{1} = 0 is equivalent to ${\pi}_{1}^{\prime}={\eta}_{1}^{\prime}=0$, which implies *G*_{11} = *G*_{33} and *G*_{12} + *G*_{13} = *G*_{11} + *G*_{23}. So, depending on the underlying true setting of the expected genotypic values, the null hypothesis *α*_{1} = *δ*_{1} = 0 in model (9) could be different from that of *a*_{1} = *d*_{1} = 0 in model (10).
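To make the difference between the two null hypotheses concrete, the following sketch evaluates the *m* = 3 expressions from Table 5 for *α*_{1}, *δ*_{1}, *a*_{1} and *d*_{1} on two sets of expected genotypic values. The numerical values are invented purely for illustration, and the function and variable names are hypothetical.

```python
def allele_params(G):
    """alpha_1 and delta_1 of model (9) for m = 3, using the Table 5 expressions."""
    alpha1 = G["12"] - G["23"]
    delta1 = (G["11"] - G["13"]) - (G["12"] - G["23"])
    return alpha1, delta1


def f_inf_params(G):
    """a_1 and d_1 of model (10) for m = 3, using the Table 5 expressions."""
    a1 = (G["11"] - G["33"]) / 2
    d1 = ((G["12"] + G["13"]) - (G["23"] + G["11"])) / 2
    return a1, d1


# Illustrative values only: G12 = G23 and G11 = G13, but G11 != G33.
G_A = {"11": 1.0, "22": 0.0, "33": 5.0, "12": 2.0, "13": 1.0, "23": 2.0}

# Illustrative values only: G11 = G33 and G12 + G13 = G11 + G23, but G12 != G23.
G_B = {"11": 3.0, "22": 0.0, "33": 3.0, "12": 4.0, "13": 1.0, "23": 2.0}

if __name__ == "__main__":
    for name, G in (("A", G_A), ("B", G_B)):
        print(name, "allele coding:", allele_params(G), "F-infinity coding:", f_inf_params(G))
```

In setting A the allele coding hypothesis *α*_{1} = *δ*_{1} = 0 holds while *a*_{1} is nonzero; in setting B the F_{∞} hypothesis *a*_{1} = *d*_{1} = 0 holds while *α*_{1} is nonzero, so neither hypothesis implies the other.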

Table 5. Parameterization of one-locus models (9), (10), (11), (12) when *m* = 3.

Codings | Restrictions | Relationships |
---|---|---|
Allele | ${\alpha}_{3}^{*}=0$ | $\begin{array}{c}\mu ={G}_{13}+{G}_{23}-{G}_{12}\\ {\alpha}_{1}={G}_{12}-{G}_{23},{\alpha}_{2}={G}_{12}-{G}_{13}\\ {\delta}_{1}=\left({G}_{11}-{G}_{13}\right)-\left({G}_{12}-{G}_{23}\right)\\ {\delta}_{2}=\left({G}_{22}-{G}_{23}\right)-\left({G}_{12}-{G}_{13}\right)\\ {\delta}_{3}={G}_{33}+{G}_{12}-{G}_{13}-{G}_{23}\end{array}$ |
F_{∞} | $2{\alpha}_{3}^{*}+{\delta}_{3}^{*}=0$ | $\begin{array}{c}\tau =\frac{{G}_{11}+{G}_{22}}{2}\\ {a}_{1}=\frac{{G}_{11}-{G}_{33}}{2},{a}_{2}=\frac{{G}_{22}-{G}_{33}}{2}\\ {d}_{1}=\frac{\left({G}_{12}+{G}_{13}\right)-\left({G}_{23}+{G}_{11}\right)}{2}\\ {d}_{2}=\frac{\left({G}_{12}+{G}_{23}\right)-\left({G}_{13}+{G}_{22}\right)}{2}\\ {d}_{3}=\frac{\left({G}_{13}+{G}_{23}\right)-\left({G}_{12}+{G}_{33}\right)}{2}\end{array}$ |
Allele-count | ${\alpha}_{3}^{*}=0$ | $\begin{array}{c}{\pi}_{0}={G}_{13}+{G}_{23}-{G}_{12}\\ {\pi}_{1}={G}_{12}-{G}_{23},{\pi}_{2}={G}_{12}-{G}_{13}\\ {\eta}_{1}=\left({G}_{11}-{G}_{13}\right)+\left({G}_{12}-{G}_{23}\right)\\ {\eta}_{2}=\left({G}_{22}-{G}_{23}\right)+\left({G}_{12}-{G}_{13}\right)\\ {\eta}_{3}={G}_{33}+{G}_{12}-{G}_{13}-{G}_{23}\end{array}$ |
Allele-count | $2{\alpha}_{3}^{*}+{\delta}_{3}^{*}=0$ | $\begin{array}{c}{\pi}_{0}^{\prime}={G}_{33}\\ {\pi}_{1}^{\prime}=\frac{\left({G}_{12}+{G}_{13}\right)-\left({G}_{23}+{G}_{33}\right)}{2}\\ {\pi}_{2}^{\prime}=\frac{\left({G}_{12}+{G}_{23}\right)-\left({G}_{13}+{G}_{33}\right)}{2}\\ {\pi}_{3}^{\prime}=\frac{\left({G}_{13}+{G}_{23}\right)-\left({G}_{12}+{G}_{33}\right)}{2}\\ {\eta}_{1}^{\prime}={G}_{11}-{G}_{33},\phantom{\rule{2.77695pt}{0ex}}{\eta}_{2}^{\prime}={G}_{22}-{G}_{33}\end{array}$ |
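
The statement that each model is a full re-parameterization when *m* = 3 can be checked numerically: compute the parameters from Table 5 and rebuild the six expected genotypic values. The sketch below does this for the allele coding model (9) and the F_{∞} coding model (10). It assumes the linear forms *μ* + Σ *α*_{j}*w*_{j} + Σ *δ*_{j}*v*_{j} and *τ* + Σ *a*_{j}*f*_{j} + Σ *d*_{j}*h*_{j}, which are reconstructed from the surrounding text because the displayed model equations are not reproduced here; the genotypic values and all names are illustrative.

```python
from itertools import combinations_with_replacement

GENOTYPES = list(combinations_with_replacement((1, 2, 3), 2))


def allele_fit(G):
    """Rebuild G from the model (9) parameters of Table 5 (assumed linear form)."""
    mu = G[1, 3] + G[2, 3] - G[1, 2]
    alpha = {1: G[1, 2] - G[2, 3], 2: G[1, 2] - G[1, 3], 3: 0.0}  # alpha_3 = 0
    delta = {1: (G[1, 1] - G[1, 3]) - (G[1, 2] - G[2, 3]),
             2: (G[2, 2] - G[2, 3]) - (G[1, 2] - G[1, 3]),
             3: G[3, 3] + G[1, 2] - G[1, 3] - G[2, 3]}
    fitted = {}
    for j, k in GENOTYPES:
        # Each allele copy contributes its alpha; a homozygote adds delta_j.
        fitted[j, k] = mu + alpha[j] + alpha[k] + (delta[j] if j == k else 0.0)
    return fitted


def f_inf_fit(G):
    """Rebuild G from the model (10) parameters of Table 5 (assumed linear form)."""
    tau = (G[1, 1] + G[2, 2]) / 2
    a = {1: (G[1, 1] - G[3, 3]) / 2, 2: (G[2, 2] - G[3, 3]) / 2}
    d = {1: ((G[1, 2] + G[1, 3]) - (G[2, 3] + G[1, 1])) / 2,
         2: ((G[1, 2] + G[2, 3]) - (G[1, 3] + G[2, 2])) / 2,
         3: ((G[1, 3] + G[2, 3]) - (G[1, 2] + G[3, 3])) / 2}
    fitted = {}
    for j, k in GENOTYPES:
        value = tau
        for allele in (1, 2):  # a_3 is not a parameter of the model
            copies = (j == allele) + (k == allele)
            value += a[allele] * (copies - 1)  # f = w - 1
        if j != k:
            value += d[j] + d[k]  # h = 1 for both alleles of a heterozygote
        fitted[j, k] = value
    return fitted


# Arbitrary illustrative genotypic values.
G = {(1, 1): 3.0, (1, 2): 1.5, (1, 3): 4.0, (2, 2): -2.0, (2, 3): 0.5, (3, 3): 7.0}

if __name__ == "__main__":
    for fit in (allele_fit, f_inf_fit):
        print(fit.__name__, all(abs(fit(G)[g] - G[g]) < 1e-9 for g in GENOTYPES))
```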

### Extension to two-locus models

For two marker loci with *m*_{1} and *m*_{2} alleles, respectively, there are *m*_{1}*m*_{2}(*m*_{1} + 1)(*m*_{2} + 1)/4 possible distinct expected genotypic values: *G*_{jkrs} = E(*G* | *A*_{1j}*A*_{1k}*A*_{2r}*A*_{2s}) for *j*, *k* = 1, ..., *m*_{1}, *j* ≤ *k*; and *r*, *s* = 1, ..., *m*_{2}, *r* ≤ *s*. Using the allele coding, we introduce the following coding variables

*j*, *k* = 1, ..., *m*_{1}, for marker genotypes at locus 1 and *r*, *s* = 1, ..., *m*_{2}, for marker genotypes at locus 2, where ${A}_{1j}^{c}$ (or ${A}_{2r}^{c}$) denotes any other allele type except *A*_{1j} (or *A*_{2r}) at locus 1 (or 2). A fully parameterized two-locus model for *G*_{jkrs} can then be presented as

for *i* = 1, ..., *N*. Similar to the one-locus models, we can establish the relationship between the model parameters and the expected genotypic values as shown in (C.1) of Appendix C. A nice property of this allele coding model is that a higher-order effect is simply the deviation of its corresponding expected genotypic value from the approximation given by the lower-order effects. Here the corresponding expected genotypic value of a marker effect is determined by the position of the alleles that differ from the two reference alleles ${A}_{1{m}_{1}}$ and ${A}_{2{m}_{2}}$. So it is straightforward to build the relationships between the model parameters and the expected genotypic values step by step, from the lowest-order parameter *μ* up to the high-order effect parameters.
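The number of distinct two-locus genotypes quoted at the start of this subsection, *m*_{1}*m*_{2}(*m*_{1} + 1)(*m*_{2} + 1)/4, can be confirmed by enumerating unordered genotypes at each locus and taking their product. This is a small illustrative sketch with hypothetical function names.

```python
from itertools import combinations_with_replacement, product


def two_locus_genotypes(m1, m2):
    """All (j, k, r, s) with j <= k at locus 1 and r <= s at locus 2."""
    locus1 = list(combinations_with_replacement(range(1, m1 + 1), 2))
    locus2 = list(combinations_with_replacement(range(1, m2 + 1), 2))
    return [(j, k, r, s) for (j, k), (r, s) in product(locus1, locus2)]


def expected_count(m1, m2):
    """Closed-form count m1 * m2 * (m1 + 1) * (m2 + 1) / 4."""
    return m1 * m2 * (m1 + 1) * (m2 + 1) // 4


if __name__ == "__main__":
    for m1, m2 in ((2, 2), (3, 3), (3, 4)):
        print(m1, m2, len(two_locus_genotypes(m1, m2)), expected_count(m1, m2))
```

The count grows quickly (9, 36 and 60 genotypes for the three cases printed), which is why the number of parameters in a fully parameterized two-locus model also grows quickly with the number of alleles.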

Using the F_{∞} coding, we can define the following coding variables for the genotypes at the two marker loci separately.

*j* = 1, ..., *m*_{1}, and *r* = 1, ..., *m*_{2}. A fully parameterized two-locus model using this F_{∞} coding is then

for *i* = 1, ..., *N*. Still, using the relationships *w*_{1j} = 1 + *f*_{1j}, *w*_{2r} = 1 + *f*_{2r}, *v*_{1jj} = (1 + *f*_{1j} - *h*_{1j})/2, *v*_{2rr} = (1 + *f*_{2r} - *h*_{2r})/2, *v*_{1jk} = *h*_{1j}*h*_{1k} for *j* ≠ *k*, and *v*_{2rs} = *h*_{2r}*h*_{2s} for *r* ≠ *s* between the F_{∞} coding variables and the allele coding variables, we can establish the relationships between the model parameters and the expected genotypic values as shown in (C.2) of Appendix C. We can easily verify that the biallelic two-locus effects ${E}_{{F}_{\infty}\cdot AB}$ in [9] are a special case of our results with *m*_{1} = *m*_{2} = 2. It is also interesting to see that the interpretation of the model parameters in terms of the expected genotypic values becomes much more complicated than that in the previous allele coding model. When *m*_{1}, *m*_{2} > 2, the low-order within-locus main effect *a*_{1j} is a weighted combination of the differences $\left({G}_{jjrr}-{G}_{{m}_{1}{m}_{1}rr}\right)$, where *r* = 1, ..., *m*_{2} refers to the various homozygous genotypes *A*_{2r}*A*_{2r} at locus 2. The within-locus effect *d*_{1jj} is a weighted combination of the allelic interactions $\left({G}_{jjrr}-2{G}_{j{m}_{1}rr}+{G}_{{m}_{1}{m}_{1}rr}\right)$, *r* = 1, ..., *m*_{2}, at locus 1 with reference *A*_{2r}*A*_{2r} at locus 2. Even the intercept *τ* of the model becomes a complex function of various homozygous genotypic values.

*j* = 1, ..., *m*_{1}, and *r* = 1, ..., *m*_{2}. Another fully parameterized two-locus model for *G*_{jkrs} can be written as

for *i* = 1, ..., *N*. In this case, the allele-count coding variables and the allele coding variables have the relationships ${w}_{1j}={h}_{1j}^{\left(1\right)}+2{h}_{2j}^{\left(1\right)}$, ${w}_{2r}={h}_{1r}^{\left(2\right)}+2{h}_{2r}^{\left(2\right)}$, ${v}_{1jj}={h}_{2j}^{\left(1\right)}$, ${v}_{2rr}={h}_{2r}^{\left(2\right)}$, ${v}_{1jk}={h}_{1j}^{\left(1\right)}{h}_{1k}^{\left(1\right)}$ for *j* ≠ *k*, and ${v}_{2rs}={h}_{1r}^{\left(2\right)}{h}_{1s}^{\left(2\right)}$ for *r* ≠ *s*. Through the equivalence of the two models (13) and (15), we can also construct relationships between the parameters in model (15) and the expected genotypic values as shown in (C.3) of Appendix C. We can see that the interpretation of the parameters in the allele-count coding model (15) is as simple as that in the allele coding model (13), with the same intercept ${G}_{{m}_{1}{m}_{1}{m}_{2}{m}_{2}}$. Besides, it seems that some parameters such as (*η*_{1jj}*η*_{2rr}), (*η*_{1jk}*η*_{2rs}) and (*η*_{1jk}*η*_{2rr}) have simpler relationships than the corresponding ones in the allele coding model (13).
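The relationships between the allele-count and allele coding variables quoted above hold locus by locus, so they can be checked at a single locus. The sketch below assumes that the first allele-count variable flags exactly one copy of an allele and the second flags exactly two copies, and that *v*_{jk} is the indicator of the genotype *A*_{j}*A*_{k}; these definitions and the function names are assumptions for illustration, since the displayed definitions are not reproduced in this section.

```python
from itertools import combinations_with_replacement


def copies(g, j):
    """w_j(g): number of copies of allele A_j in genotype g."""
    return (g[0] == j) + (g[1] == j)


def one_copy(g, j):
    """Assumed allele-count code: 1 if g carries exactly one copy of A_j."""
    return int(copies(g, j) == 1)


def two_copies(g, j):
    """Assumed allele-count code: 1 if g carries exactly two copies of A_j."""
    return int(copies(g, j) == 2)


def genotype_indicator(g, j, k):
    """v_jk(g): 1 if g is the unordered genotype A_j A_k, else 0."""
    return int(g == (min(j, k), max(j, k)))


def allele_count_relationships_hold(m):
    """Check w_j = h_1j + 2 h_2j, v_jj = h_2j and v_jk = h_1j h_1k (j != k)."""
    for g in combinations_with_replacement(range(1, m + 1), 2):
        for j in range(1, m + 1):
            if copies(g, j) != one_copy(g, j) + 2 * two_copies(g, j):
                return False
            if genotype_indicator(g, j, j) != two_copies(g, j):
                return False
            for k in range(1, m + 1):
                if k != j and genotype_indicator(g, j, k) != one_copy(g, j) * one_copy(g, k):
                    return False
    return True


if __name__ == "__main__":
    print(all(allele_count_relationships_hold(m) for m in (2, 3, 4)))
```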

*j*, *k* = 1, ..., *m*_{1} and *r*, *s* = 1, ..., *m*_{2}. If we further ignore the within-locus allelic interactions between different alleles, then another reduced two-locus model framework is

Dropping ${\delta}_{1j}^{*}$ for *j* = 1, ..., *m*_{1} and ${\delta}_{2r}^{*}$ for *r* = 1, ..., *m*_{2} in (15) will lead to an additive model framework, whose model parameters can be interpreted similarly to those in Table 6. From Tables 6 and 7, we can see that in both the allele and allele-count coding models the lower-order main effects keep an interpretation similar to that in the previous fully parameterized case with epistasis, whereas in the F_{∞} coding models the definition of the lower-order main effects varies depending on whether epistasis is included in the model.

Table 6. Parameterization of two-locus models under model framework (16).

Codings | Relationships |
---|---|
Allele | $\begin{array}{c}\mu ={G}_{{m}_{1}{m}_{1}{m}_{2}{m}_{2}}\\ {\alpha}_{1j}={G}_{j{m}_{1}{m}_{2}{m}_{2}}-{G}_{{m}_{1}{m}_{1}{m}_{2}{m}_{2}},j=1,\dots ,{m}_{1}-1\\ {\alpha}_{2r}={G}_{{m}_{1}{m}_{1}r{m}_{2}}-{G}_{{m}_{1}{m}_{1}{m}_{2}{m}_{2}},r=1,\dots ,{m}_{2}-1\\ {\delta}_{1jk}={G}_{jk{m}_{2}{m}_{2}}-{G}_{j{m}_{1}{m}_{2}{m}_{2}}-{G}_{k{m}_{1}{m}_{2}{m}_{2}}+{G}_{{m}_{1}{m}_{1}{m}_{2}{m}_{2}},j,k=1,\dots ,{m}_{1}-1;j\le k\\ {\delta}_{2rs}={G}_{{m}_{1}{m}_{1}rs}-{G}_{{m}_{1}{m}_{1}r{m}_{2}}-{G}_{{m}_{1}{m}_{1}s{m}_{2}}+{G}_{{m}_{1}{m}_{1}{m}_{2}{m}_{2}},r,s=1,\dots ,{m}_{2}-1;r\le s\end{array}$ |
F | $\begin{array}{c}\tau ={G}_{{m}_{1}{m}_{1}{m}_{2}{m}_{2}}+\frac{1}{2}{\sum}_{j=1}^{{m}_{1}-1}\left({G}_{jj{m}_{2}{m}_{2}}-{G}_{{m}_{1}{m}_{1}{m}_{2}{m}_{2}}\right)+\frac{1}{2}{\sum}_{r=1}^{{m}_{2}-1}\left({G}_{{m}_{1}{m}_{1}rr}-{G}_{{m}_{1}{m}_{1}{m}_{2}{m}_{2}}\right)\\ {a}_{1j}=\frac{{G}_{jj{m}_{2}{m}_{2}}-{G}_{{m}_{1}{m}_{1}{m}_{2}{m}_{2}}}{2},j=1,\dots ,{m}_{1}-1\\ {a}_{2r}=\frac{{G}_{{m}_{1}{m}_{1}rr}-}{}\end{array}$ |