Modelling dominance in a flexible intercross analysis
© Rönnegård et al. 2009
Received: 10 December 2008
Accepted: 28 June 2009
Published: 28 June 2009
The aim of this paper is to develop a flexible model for analysis of quantitative trait loci (QTL) in outbred line crosses, which includes both additive and dominance effects. Our flexible intercross analysis (FIA) model accounts for QTL that are not fixed within founder lines and is based on the variance component framework. Genome scans with FIA are performed using a score statistic, which does not require variance component estimation.
Simulations of a pedigree with 800 F2 individuals showed that the power of FIA including both additive and dominance effects was almost 50% for a QTL with equal allele frequencies in both lines, complete dominance and a moderate effect, whereas the power of a traditional regression model was equal to the chosen significance level of 5%. The power of FIA without dominance effects included in the model was close to that obtained for FIA with dominance for all simulated cases except for QTL with overdominant effects. A genome-wide linkage analysis of experimental data from an F2 intercross between Red Jungle Fowl and White Leghorn was performed with both additive and dominance effects included in FIA. The score values for chicken body weight at 200 days of age were similar to those obtained in the FIA analysis without dominance.
We have extended FIA to include QTL dominance effects. The power of FIA was superior, or similar, to standard regression methods for QTL effects with dominance. The difference in power for FIA with or without dominance is expected to be small as long as the QTL effects are not overdominant. We suggest that FIA with only additive effects should be the standard model, especially since it is more computationally efficient.
Large genetic differences between founder breeds are utilized in experimental crosses of outbred lines, which gives a high power of detecting quantitative trait loci (QTL) even for moderately sized pedigrees. The commonly used regression model to detect QTL assumes a biallelic QTL fixed within each of the two founder lines. Most traits have a substantial within-breed heritability and we may therefore expect that some QTL are not fixed. If the QTL is not fixed within founder lines, the regression model will underestimate the QTL effect and the power to detect the QTL decreases. In an earlier paper we developed a flexible intercross analysis (FIA) to enhance the detection of QTL in experimental crosses of outbred lines. FIA is a variance component based model which is able to detect QTL at different degrees of fixation within founder lines. Genome scans in FIA are based on a score statistic, which gives a computationally efficient and statistically powerful method since it does not require estimation of variance components. The model is also flexible because it can be applied to advanced intercross lines with an arbitrary number of generations. We have shown that the power of FIA is similar to that of Haley-Knott (HK) regression for fixed QTL and that FIA is superior to HK-regression for QTL that are not fixed within founder lines. We also showed that the differences between FIA and HK-regression are larger for pedigrees with small base generations than for pedigrees with large ones. However, the model was developed and tested for additive QTL only.
Other methods have previously been developed to account for within-line QTL variation. Most of these methods do not include dominance effects. Two exceptions are Knott et al. and Pérez-Enciso et al. Knott et al. developed a nested within half-sib family model that does not assume fixation of QTL alleles in the founder lines, and the number of alleles is only constrained by the number of families. This model was further developed by Kim et al. for analysis of F2 intercrosses and includes both line effects and half-sib family effects. Dominance is estimated in the line effect whereas the family effect is an overall allele substitution effect. This is a model specifically designed for F2 intercrosses with fixed effects only, and the number of estimated parameters increases with the number of half-sib families. Furthermore, the genotypic information of the dams is not included in the model and the sires are assumed to be unrelated. Pérez-Enciso and Varona developed a mixed QTL model that accounts for line differences and within-line variation of QTL effects. In this model, which is similar to the model developed by Wang et al., a fixed line effect is estimated together with a random within-line QTL variance. This model was further extended to include dominance in Pérez-Enciso et al. A drawback of the model is, however, the difficulty of comparing estimates at different genomic locations, as the total QTL variance is a combination of fixed and random effects. The method is also slow since it utilizes a derivative-free method to maximize the log-likelihood at each tested chromosome position. There is therefore a need to develop a method which is computationally efficient, includes dominance and can be applied to general pedigrees from line crosses. We may expect major genes to have considerable dominance effects, but this does not necessarily imply that the power of a QTL analysis will increase by including dominance effects in the statistical model.
In a recent paper by Martinez, the power to detect a QTL having a dominance effect using a variance component (VC) model was studied. He found that the gain in power using a model with both additive and dominance effects was not substantial compared to a model with only additive effects as long as the QTL effect was not overdominant. In the simulation study performed by Martinez, non-inbred full-sib families were simulated and all founder QTL allele effects were assumed to be independent. FIA is a variance component based method which models dependencies between founder QTL allele effects. This difference between FIA and the model studied by Martinez implies that Martinez' results cannot be directly applied to FIA.
The aim of this paper is to extend the FIA model to include both additive and dominance effects, where this extended version is computationally efficient and can be applied to general pedigrees from line crosses. This version of FIA is then used to test the importance of including dominance in terms of power for QTL detection. We compare the power of the model, by means of simulations, with the original version of FIA and HK-regression. The model is also applied to chicken body weight at 200 days of age in an F2 cross between wild Red Jungle Fowl and domestic Leghorn. The HK-regression model was chosen for comparison in our simulations because the assumptions of the model are simple and also because it is extensively used in QTL analysis (e.g. [1, 10, 11]).
Table 1. Simulated levels of fixation for the four simulated scenarios, ranging from a fixed QTL (Case 1) to equal frequencies in both founder lines (Case 4). Column headers: proportion of A alleles and proportion of B alleles, each listed twice.
Our simulations show that the power of FIA including dominance effects is substantially higher for overdominant QTL. For QTL effects that are not overdominant the differences between the two versions of FIA are small. Hence, it is feasible to include dominance in FIA. We expect, however, that major genes having moderate dominance effects will be detected with the simpler additive version of FIA. These results are similar to the ones obtained by Martinez, who showed that the power of VC-based models does not increase substantially by including dominance effects as long as the QTL effects are not overdominant. The difference in power for HK-regression with or without dominance included in the model also seems to be small as long as the QTL effects are not overdominant. The importance of including dominance effects in QTL analysis therefore seems to be a general question and is related to how often we can expect major genes to be overdominant.
Although the differences between HK-regression and FIA decrease for dominant QTL effects, we still have not found a case where HK-regression outperforms FIA substantially in terms of QTL detection power. Regression methods are computationally faster than FIA, although the latter is based on the score statistic, which is easily computed. For the simulated pedigree with 800 F2 individuals, including dominance in FIA gives a three-fold increase in computational cost (wall-clock time) for the score statistic (eq. 12).
Including dominance also requires that the dominance IBD-matrices have been computed, which may be computationally demanding unless the IBD calculations are based on the gametic IBDs (see eq. 3). The genome scan in FIA is based on a score statistic (eq. 12), so the variance components in FIA do not need to be estimated for each position, but for QTL positions we may wish to estimate them. There are then two variance components for the additive effects, two for the dominance effects (see eq. 11) and one for the residual variance. Although the VC estimates are of secondary importance in FIA, estimates of the five variance components in eq. (11) are given in the Appendix for each of the four cases in Table 1, for 120 replicates of the simulated 800 F2 pedigree. Models with several variance components require a robust REML estimation algorithm to ensure convergence. Mishchenko et al. recently developed a robust and efficient REML estimation algorithm for VC models including up to five variance components, which was not applied in our current study but is likely to become useful in the future.
We have previously shown that it is computationally feasible to include epistasis in FIA, but so far we have not tested FIA with epistasis on empirical data, and we may expect HK-regression to remain a useful method for detection of epistatic QTL effects for some time. We are convinced that an important research task is to develop a computationally fast and robust version of FIA for detection of epistatic effects.
We have shown that FIA can be extended to include QTL dominance effects. The power of FIA is superior, or similar, to HK-regression for QTL effects with dominance. The difference in power for FIA with or without dominance is small as long as the QTL effects are not overdominant. Furthermore, we expect that FIA with only additive effects included will also be effective for finding major genes having moderate dominance effects. We therefore suggest that FIA with only additive effects should be the model to use in most situations, especially since it is computationally less intensive.
In this section we present the traditional single locus VC model that includes dominance effects of the QTL and where all base QTL allele effects are assumed to be uncorrelated [13, 14]. Thereafter, we present our FIA model which was previously developed for additive QTL effects  and show how dominance can be included.
where the values g ij (k, l) are the gametic IBDs between individual i and j for the maternal/paternal alleles k and l.
Hence, for a single QTL model there is no covariance between additive and dominance effects. The estimates of the additive and dominance variance components may be strongly correlated, however, since the IBD-values in Π and Δ are correlated.
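The step from gametic IBDs to the genotypic matrices Π and Δ can be illustrated with a small sketch. It assumes the commonly used relations π = ½ Σ g(k, l) for the additive IBD and δ = g(1,1) g(2,2) + g(1,2) g(2,1) for the dominance IBD; these are textbook forms and are not necessarily identical to eq. (3). The product form for δ is exact only when allele sharing is known with certainty or the two sharing events are independent. The function name `genotypic_ibd` is hypothetical.

```python
import numpy as np

def genotypic_ibd(g):
    """Additive and dominance IBD from gametic IBDs (illustrative sketch).

    g has shape (..., 2, 2); g[..., k, l] is the probability that allele k of
    individual i is IBD to allele l of individual j (0 = paternal, 1 = maternal).
    Assumed relations, not taken from the paper:
        pi    = 0.5 * sum over k, l of g[k, l]
        delta = g[0, 0] * g[1, 1] + g[0, 1] * g[1, 0]
    """
    g = np.asarray(g, dtype=float)
    pi = 0.5 * g.sum(axis=(-2, -1))
    delta = g[..., 0, 0] * g[..., 1, 1] + g[..., 0, 1] * g[..., 1, 0]
    return pi, delta

# Two sibs sharing both parental alleles at a fully informative marker
pi_both, delta_both = genotypic_ibd([[1.0, 0.0], [0.0, 1.0]])
# Two sibs sharing only the paternal allele
pi_pat, delta_pat = genotypic_ibd([[1.0, 0.0], [0.0, 0.0]])
```

Working from gametic IBDs in this way avoids a separate pedigree traversal for the dominance matrix, which is the computational advantage mentioned in the discussion.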
Here, Π I is the genotypic IBD-matrix assuming independent QTL allele effects in the base generation and Π J is the IBD-matrix that assumes fixation of QTL alleles within founder lines. Hence, the analysis using FIA requires an IBD estimation program that allows for different base generation structures. We used the same IBD-matrix estimation program as in , which is based on the deterministic algorithm published by .
Here, Δ I is the dominance IBD-matrix assuming independent QTL allele effects in the base generation and Δ J is the dominance IBD-matrix that assumes fixation of QTL alleles within founder lines. The above formula for the variance-covariance matrix V was derived following the derivation of eq. (4) in Rönnegård et al. .
We let the variance components be independent of each other. This assumption gives the variance-covariance matrix of y as a linear function of the variance components. This is a simplification since is the same within-line correlation as and the variance-covariance matrix of y is not strictly a linear function of the variance components.
where D is the gradient and F is the information matrix calculated under the null hypothesis of no QTL effects, i.e. with all QTL variance components equal to zero.
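As an illustration of how such a statistic can be computed without fitting the full model, the sketch below evaluates a generic REML score test of the form D'F⁻¹D under the null model y = Xb + e. This is a minimal sketch under stated assumptions, not a reproduction of eq. (12): the function name `score_statistic`, the use of the expected information ½ tr(P V_k P V_l), and the handling of the residual variance as a nuisance parameter are choices made here for illustration.

```python
import numpy as np

def score_statistic(y, X, kernels):
    """Generic score test for variance components (illustrative sketch).

    y: phenotype vector (n,); X: fixed-effect design matrix (n, p);
    kernels: list of (n, n) IBD matrices, e.g. [Pi_I, Pi_J, Delta_I, Delta_J].
    Only the null model (residual variance alone) is fitted.
    """
    n = y.shape[0]
    # Projection that removes the fixed effects of the null model
    M = np.eye(n) - X @ np.linalg.pinv(X)
    df = n - np.linalg.matrix_rank(X)
    resid = M @ y
    sigma2_e = resid @ resid / df          # REML residual variance under H0
    P = M / sigma2_e
    Py = P @ y
    # The identity is appended so the information matrix also covers the
    # residual variance; its gradient element is zero at the null estimate.
    all_k = list(kernels) + [np.eye(n)]
    PK = [P @ K for K in all_k]
    D = np.array([0.5 * (Py @ K @ Py - np.trace(PKk))
                  for K, PKk in zip(all_k, PK)])
    F = np.array([[0.5 * np.trace(A @ B) for B in PK] for A in PK])
    return float(D @ np.linalg.solve(F, D))
```

Because only the null model is fitted, the cost per tested position is dominated by a handful of matrix products, which is consistent with the claim that the scan does not require variance component estimation.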
The significance thresholds for the genome scan were calculated by means of permutation testing. Residuals were calculated from a null model assuming no QTL effect. These residuals were then permuted, giving a new vector ĕ. Replicates of the phenotypic data were simulated as y* = Xb̂ + ĕ, where b̂ is the vector of fixed effects estimated from the null model y = Xb + e. For each replicate, the score statistic was calculated at every tested position (5 cM apart) along the genome using eq. (12). The empirical distribution of the maximum score value from each replicate was used to obtain significance thresholds. In total, 2000 replicates were simulated.
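The permutation procedure described above can be sketched as follows. The function name `permutation_threshold` and the callable `scan_fn` are hypothetical: `scan_fn` stands for any routine that takes a phenotype vector and returns the score values at all tested positions. The null model is fitted here by ordinary least squares, which is an assumption of this sketch.

```python
import numpy as np

def permutation_threshold(y, X, scan_fn, n_rep=2000, alpha=0.05, seed=0):
    """Genome-wide threshold from permuted null-model residuals (sketch).

    scan_fn(y_rep) must return an array of score values, one per tested
    position; it is a placeholder for the genome scan.
    """
    rng = np.random.default_rng(seed)
    # Fit the null model y = Xb + e
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ b_hat
    resid = y - fitted
    max_scores = np.empty(n_rep)
    for r in range(n_rep):
        # Replicate phenotype: fitted fixed effects plus permuted residuals
        y_rep = fitted + rng.permutation(resid)
        max_scores[r] = np.max(scan_fn(y_rep))
    # Upper alpha quantile of the per-replicate genome-wide maxima
    return float(np.quantile(max_scores, 1.0 - alpha))
```

Taking the maximum over positions within each replicate is what makes the resulting threshold genome-wide rather than pointwise.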
In the power analyses, level of fixation within founder lines and degree of dominance were varied to evaluate the differences between FIA and HK-regression. The methods were compared by their power to detect a QTL at a given position at a 5% significance level.
The structure of the base generation was designed to mimic the pedigree of a Red Jungle Fowl – White Leghorn F2 cross with one Jungle Fowl male mated to three Leghorn females, and 800 F2 individuals. Four different cases (Table 1) were studied by varying the fixation level within lines for a biallelic QTL. The QTL was simulated at a position having a fully informative marker so that the QTL alleles could be traced through the pedigree unambiguously.
The phenotype of an F2 individual i was simulated as $y_i = A_{1i} + A_{2i} + D_i + e_i$, where $A_{1i}$ is the QTL allele effect on the paternally inherited chromosome, $A_{2i}$ is the QTL allele effect on the maternally inherited chromosome, $D_i$ is the dominance effect and $e_i$ is an iid normally distributed residual effect with a variance equal to 98. A biallelic QTL was simulated where the additive effects for the two alternative alleles were 0 and a, and the dominance effect for heterozygotes was d. The values of a and d were varied from 0 to 2.
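The phenotype model above can be written out in a few lines. This is a minimal sketch: the 0/1 coding of the inherited alleles (1 for the allele with effect a) and the function name `simulate_f2_phenotypes` are assumptions made for illustration; only the effect sizes and the residual variance of 98 come from the text.

```python
import numpy as np

def simulate_f2_phenotypes(pat, mat, a, d, resid_var=98.0, rng=None):
    """Simulate y_i = A_1i + A_2i + D_i + e_i for a biallelic QTL (sketch).

    pat, mat: 0/1 arrays marking whether the paternally / maternally
    inherited allele is the one with additive effect a (the other has 0).
    d is added for heterozygotes only.
    """
    rng = np.random.default_rng() if rng is None else rng
    pat = np.asarray(pat)
    mat = np.asarray(mat)
    A1 = a * pat                      # paternal allele effect
    A2 = a * mat                      # maternal allele effect
    D = d * (pat != mat)              # dominance effect of heterozygotes
    e = rng.normal(0.0, np.sqrt(resid_var), size=pat.shape)
    return A1 + A2 + D + e

# Example: 800 F2 individuals with allele frequency 0.5 on both gametes
rng = np.random.default_rng(0)
pat = rng.integers(0, 2, size=800)
mat = rng.integers(0, 2, size=800)
y = simulate_f2_phenotypes(pat, mat, a=1.0, d=1.0, rng=rng)
```

In the study itself the alleles are transmitted through the simulated pedigree rather than drawn independently as in this usage example, so the example only illustrates the phenotype equation.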
6000 replicates were calculated for each of the four cases in Table 1 and for varying degrees of dominance.
In a Red Jungle Fowl × White Leghorn F2 cross, we performed a full genome scan using FIA with additive and dominance effects. In this pedigree, one Red Jungle Fowl male was mated to three White Leghorn females, producing 756 F2 offspring with measured genotypes and phenotypes. We used an updated version of the previously reported marker map, including 439 markers (Leif Andersson, personal communication) covering chromosomes 1 to 28. We analyzed body weight at 200 days of age. In our previous study using FIA with only additive effects we found six QTL at 5% genome-wide significance. These QTL were located at 102 cM on chromosome 1, 488 cM on chromosome 1, 32 cM on chromosome 5, 30 cM on chromosome 6, 21 cM on chromosome 27 and 35 cM on chromosome 28. The data have been described in detail previously.
For simulations under Case 1, the additive variance and the covariance within lines were similar, and the dominance variance was close to the dominance covariance within lines [see Additional File 1]. These results were expected since the correlation within lines is 1.0 in Case 1. Furthermore, the relative difference between the estimated variances and covariances increased when the simulated within-line correlation decreased from 1.0 in Case 1 to 0 in Case 4.
The theoretical expectation of the estimated additive and dominance variance components for fixed values of a and d depends on the level of fixation within lines (see Appendix in Rönnegård et al.). For a given case in Table A1 we can see, however, that the estimated QTL variances decrease as the simulated QTL effects decrease. For a = 0 or d = 0 we do not get QTL variance estimates close to zero, which suggests that there is a bias in the estimates. This bias is likely due to the fact that the elements in the IBD matrices Π and Δ are correlated, and that it is therefore difficult to separate the additive and dominance effects in the REML estimation. In the applied Fisher scoring algorithm, each variance component was restricted to be greater than or equal to 0.1 to ensure positive variance estimates. If the algorithm had not converged within 20 iterations, the result was not analyzed and was reported as non-converged. There are five variance components in eq. (10) and a substantial number of simulations (around 15%) did not converge. The difficulties in convergence are not a major problem in FIA, however, since the genome scan is based on a score statistic that does not require VC estimation. REML estimation for models with several variance components is a general computational problem and a robust method is described in Mishchenko et al.
LR and FB gratefully acknowledge FORMAS for financing this study, and ÖC acknowledges SSF for financial support.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.