Bayesian shrinkage mapping of quantitative trait loci in variance component models
© Fang; licensee BioMed Central Ltd. 2010
Received: 12 May 2009
Accepted: 29 April 2010
Published: 29 April 2010
In this article, I propose a model-selection-free method to map multiple quantitative trait loci (QTL) in variance component model, which is useful in outbred populations. The new method can estimate the variance of zero-effect QTL infinitely to zero, but nearly unbiased for non-zero-effect QTL. It is analogous to Xu's Bayesian shrinkage estimation method, but his method is based on allelic substitution model, while the new method is based on the variance component models.
Extensive simulation experiments were conducted to investigate the performance of the proposed method. The results showed that the proposed method was efficient in mapping multiple QTL simultaneously, and moreover it was more competitive than the reversible jump MCMC (RJMCMC) method and may even out-perform it.
The newly developed Bayesian shrinkage method is very efficient and powerful for mapping multiple QTL in outbred populations.
There are two kinds of models which can be used to map QTL in outbred populations, the allelic substitution model [1–3] and the variance component model [4–7]. In the allelic substitution model, the number of QTL alleles is assumed to be known, and the QTL allelic substitution is estimated by the given linkage phases of parents, which can be inferred from genotypes of family members. The least square  and maximum likelihood [1, 3, 8] of interval mapping are two popular statistical approaches for such models. Compared with the allelic substitute model, the variance component model is more robust because it can handle an arbitrary number of alleles with arbitrary modes of gene actions. Moreover, the linkage phase of parents is unnecessary, which is nice since it is hard to accurately infer, particularly when family size is small, such as with human populations. Therefore, the variance component model is usually used to map QTL in outbred populations [4–7, 9–12]. In the variance component model, the identity-based-decent (IBD) matrix may be different for each locus and can provide information to localize the QTL. The least square method [10, 11] and the maximum likelihood method [4, 13] are also two important statistical methods for handling this model.
Because of the polygenic nature of quantitative traits, multiple QTL mapping is a problem of model selection. The least square method and the maximum likelihood method can nicely handle single QTL model, but is difficult for them to handle multiple QTL model. Recently, the Bayesian reversible jump MCMC (RJMCMC) method has been used to map multiple QTL in the variance component model [9, 12]. However, it still has some disadvantages. Because the model dimension is variable, it usually has poor mixing character and is difficult to converge [14–16]; moreover, it is also difficult to explore all the model space, especially in genome-wide mapping where thousands of possible locus are scanned .
Therefore, in this article I proposed a model-dimension-fixed method, in which the estimate of variance is very precise for nonzero-effect QTL, and gradually converges to zero for zero-effect QTL. Therefore, special model selection is needless. It is similar to the recent Bayesian shrinkage estimation methods [15, 17–19], which are based on the allelic substitution model, whereas my method is based on the variance component model. The efficiency of the new method is demonstrated by a series of simulation experiments.
Similarly, the term of polygenic variance A should also be excluded from equation (2) in genome-wide mapping.
Prior specification and joint posterior distribution
Yi and Xu  assigned a uniform prior distribution for QTL variance, , but in my method, the Jeffreys' hyper prior is assumed. The special prior is the key in the new method and will be illustrated in detail later. The prior for polygenic variance and residual variance is assumed to follow scaled inverted chi-square distribution with degree of freedom ω and scaled parameter s2 (see also  for detail); and the prior for covariant effect and QTL position λ j are assumed to follow normal distribution, β ~ N (β0, V0), and uniform distribution, respectively. The joint posterior distribution is given in Appendix.
Updating QTL variance by random walk Metropolis-Hastings algorithm
Because there is no close form for the posterior distribution of QTL variance , the Metropolis-Hastings algorithm [22, 23] is used to simulate it. I firstly propose a new QTL variance and then accept it according to its acceptance probability.
Generating the new proposal QTL variance
I employ the Browne's method , a special random walk Metropolis-Hastings algorithm (RWM-H) to update QTL variance. Firstly a new QTL variance is proposed and sampled from a scaled inverted chi-squared distribution, conditional on the current value of QTL variance , , with the degree of freedom ν and the scaled parameter that equals the expectation of the current value , i.e. , and then the new QTL variance is accepted according to its accept probability. Since the new generated value closely relies on the old one, this approach is a special case of RWM-H, and the degree of freedom ν is equivalent to the tuning parameter [21, 24].
Calculating the acceptance probability
Initialize all parameters from legal values;
Update the covariate effect β;
The covariate effect β is updated by the efficient Gibbs sampler; the updating approach of and are similar to that of (see also  for detail); the updating of the QTL position is illustrated in Appendix.
To summarize the posterior probability, I divide the genome into bins with interval of 1 cM and calculate weighted QTL variance for each bin. The weighted QTL variance is defined as the estimate of the QTL variance multiplied by its posterior probability at each bin (see also ), which is the modification of the weighted QTL effect . If the profile of the weighted QTL variance generates a notable bump on the genome, the QTL is claimed as detected [15, 19, 25].
I simulated 500 independent full-sib families with 6 individuals in each one, and therefore 3,000 individuals were investigated in my study. The parents of the full-sib families were randomly sampled from a large outbred populations in Hardy-Weinberg and linkage equilibrium. One chromosome with the length of 100 cM was simulated, and 11 evenly spaced markers covered the chromosomes with an average marker interval of 10 cM. I assigned 6 alleles for each marker and infinite alleles for each QTL. Three QTL were simulated on the genome positioned at 15 cM, 45 cM and 75 cM, respectively. The additive variances of the three QTL were respectively 0.5, 1.2 and 0.8, and the dominant variances were assumed absent. The residual effect for each offspring was randomly sampled from normal distribution with mean 0 and variance = 1.0. The polygenic variance = 1.0. The simulation method for the polygenic effect has been illustrated in . The population mean equaled to 0. The phenotypic value for each sib was the sum of population mean, QTL effects, polygenic effect and residual effect. Therefore, the heritabilities explained by the three QTL were respectively 11.1%, 26.7% and 17.8%.
Before performing the simulated experiments, I firstly gave a default setup. The excepted QTL number q0 = 2, which may lead to the maximum QTL number, (see  for detail); the degree of freedom for generating the proposal variance ν = ν A = ν e = 10; the hyper-parameter ω A = ω e = 3 and (are approximately estimated as the phenotypic variance), and hence (equal to the expectation of their variance estimations, see ). Because ω A and ω e took a small value, the values of and would have ignorable effect on their estimates. The importance of the hyper-parameter also has been illustrated in . The MCMC ran for 21,000 rounds and the data was saved with every 10 rounds after the first 1,000 MCMC was discarded, so that there were 2,000 (20,000/10) posterior samples for posterior analysis.
Performance on simulated data with zero-QTL model
Investigation into the performance of the special RWM-H algorithm
The estimates of the QTL parameters and their standard deviations obtained from the proposed method under different levels of degree of freedom.
Comparison with the regression method
Comparison with the RJMCMC method with repeat experiments
To compare the proposed method with the RJMCMC based method , I simulated 30 sets of data. For the RJMCMC method, the maximum QTL number and the expected QTL number were also the same as the default setup; the prior distribution of the QTL variance, polygenic variance and residual variance followed uniform distribution with endpoint being zero and phenotypic variance; the thinning interval was empirically set as 10; the burn-in period was 1,000 and the length of the complete chain was 201,000, and hence, there were 10,000 samples saved for posterior analysis. It took ~ 5 hr for the new and RJMCMC method on a Pentium IV PC with a 2.60-GHz processor and 1.00 GB RAM.
The average estimates of the QTL parameters from 30 replicated experiments in the proposed and RJMCMC method.
Application in genome-wide mapping
QTL parameters and their estimates obtained from the two methods in the genome-wide mapping.
Comparison with RJMCMC
The simulated data was analyzed with the proposed method and the RJMCMC method. In the proposed method, the excepted QTL number q0 = 3, which results the maximum QTL number q = 8; the degree of freedom generating the proposal variance ν = ν A = ν e = 10; the hyper-parameter ω A = ω e = 3 and for the polygenic variance and the residual variance. The MCMC ran for 51,000 rounds and the data was saved with every 10 rounds after the first 1,000 MCMC was discarded, so that there were 5,000 (50,000/10) samples for posterior analysis. In the RJMCMC method, also q0 = 3 and q m = 8, thinning interval 10 and burn-in length 1,000, but the length of the complete chain was 201,000. In the genome-wide mapping, for all the QTL that affect the trait are included in the model, the polygenic variance is excluded from the model and thus needn't be estimated.
Test on the sensitive of the maximum QTL number
The Jeffreys' hyper prior for QTL variance, , which is much crucial in Bayesian shrinkage analysis for inbred line crosses [15, 17, 18], is also the key to the new method. Although the two methods handle different statistical models, the behavior is much similar, and they all need not special model selection. However, the vague prior may cause an improper posterior [30, 31]. Ter Braak et al.  proposed to use the prior with a small value of δ but not the extreme value 0 to avoid generating improper posterior, while this extreme value was just used in  and my researches. I also attempted to set δ = 0.0001 and other values and did several times of experiments, and the results at δ = 0.0001 were essentially the same as those at δ = 0 (the results were not shown).
Because there are no close forms for the variance components, the M-H algorithm is always used to update them. There are two kinds of M-H algorithm, and one is the RWM-H, in which the new proposal value is conditionally sampled on the old one; the other is called independent Metropolis-Hastings algorithm (IM-H), and the proposed value is independently sampled on the old one. Generally, the RWM-H is more efficient than the IM-H, because in the RWM-H the proposal values may automatically reach their main support region in the iterations. Another advantage of the RWM-H is that the estimate of QTL variance for zero-effect QTL may be gradually converged to zero. However, the special shrinkage character is hardly held by the IM-H algorithm because it is usually low probability that the values of the proposal variance close to zero infinitely. Hence in this article I use a special RWM-H to update QTL variance, which is also another key to my method.
The size of the tuning parameter may influence the efficiency of the RWM-H algorithm. In this method, the degree of freedom ν is equivalent to the tuning parameter, if ν is smaller, the efficiency of the M-H algorithm may decrease due to the low acceptance rate. But if ν is larger, although the acceptance rate increase, it is more difficult for the proposal variance components to explore their posterior distribution. If ν > 200, the chain will be stuck locally and the posterior distribution of variance components will be very difficult to explore, which explains why the shrinkage character is hardly hold by the proposed method when ν >200. In fact, the tuning parameter should be set appropriately, which makes the acceptance rate to be 10~ 40% . Therefore, in my method, the optimal tuning parameter ν should range from 3 to 15 from Figure 4.
I assign a scaled inverted chi-square distribution for polygenic variance and residual variance, which makes the incorporation of the prior information possible, and this has been studied in previous work . Certainly, other priors also can be used and then the formula of acceptance probability should be constructed appropriately.
The polygenic term is excluded in our genome-wide mapping because many QTL with relative large effect are investigated in my simulated study. In practice, the trait may be affected by few QTL with substantial effects and many QTL with minor effects, and then it is necessary to include the polygenic effect in the model.
I proposed a basic method for mapping QTL in variance component models. The method is also important in fine mapping [34–38] in which both linkage information and linkage disequilibrium (LD) information are utilized. If the markers are densely distributed, fine mapping provides an extremely powerful way for QTL mapping. The new method is also convenient to be modified to the simultaneous fine mapping of multiple QTL as long as the IBD matrix is appropriately constructed. Moreover, the method also can be extended to more complicated situation, such as that involving QTL dominant effect and epistatic effect.
I employed the approach of Yi et al.  to ascertain the maximum QTL number and found that the method was not very sensitive to the maximum QTL number. Theoretically, the maximum QTL number may be set as any value as long as it is greater than the actual QTL number. The simplest method is to assume that each marker interval contain one QTL, while it increases the computational burden. The appropriate selection of the maximum QTL number will contribute to saving the computational time.
The new developed method is very simple and easy to implement. The computer program is written in FORTRAN language, and it is also compiled into my software "BayesMapQTL.exe" which can be used to analyze simulated data, as well as field data with variable family size. Both program and software are available for request.
In this research, I proposed a Bayesian shrinkage estimation method for mapping multiple QTL. Different from Xu's (2003) shrinkage method that discriminately estimates QTL substitute effect, my method can shrinkage estimate QTL variance so that it need no special model selection. Simulation experiments show that the proposed method is efficient in simultaneously mapping multiple QTL in outbred populations and may even out-perform the RJMCMC method.
Joint posterior distribution
Update QTL position
The proposal ratio or Hastings ratio hr p = 1 due to the symmetric uniform proposal.
I am grateful to two anonymous reviewers for their helpful comments which have greatly improved the presentation of the manuscript. Lijun Pu, Weixuan Fu and one of reviewers helped proofread the language. I also thank BMC editorial for the waiver of the publication cost. This research was supported by Chinese Heilongjiang Education Ministry grant 11541254 to M. F.
- Knott SA, Haley CS: Maximum likelihood mapping of quantitative trait loci using full-sib families. Genetics. 1992, 132 (4): 1211-1222.PubMed CentralPubMedGoogle Scholar
- Haley CS, Knott SA, Elsen JM: Mapping quantitative trait loci in crosses between outbred lines using least squares. Genetics. 1994, 136 (3): 1195-1207.PubMed CentralPubMedGoogle Scholar
- Knott SA, Haley CS: Methods for multiple-marker mapping of quantitative trait loci in half-sib populations. Theor Appl Genet. 1996, 93: 71-80. 10.1007/BF00225729.View ArticlePubMedGoogle Scholar
- Goldgar DE: Multipoint analysis of human quantitative genetic variation. Am J Hum Genet. 1990, 47 (6): 957-967.PubMed CentralPubMedGoogle Scholar
- Amos CI: Robust variance-components approach for assessing genetic linkage in pedigrees. Am J Hum Genet. 1994, 54 (3): 535-543.PubMed CentralPubMedGoogle Scholar
- Xu S, Atchley WR: A random model approach to interval mapping of quantitative trait loci. Genetics. 1995, 141 (3): 1189-1197.PubMed CentralPubMedGoogle Scholar
- Almasy L, Blangero J: Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet. 1998, 62 (5): 1198-1211. 10.1086/301844.PubMed CentralView ArticlePubMedGoogle Scholar
- Liu Y, Jansen GB, Lin CY: Quantitative trait loci mapping for dairy cattle production traits using a maximum likelihood method. J Dairy Sci. 2004, 87 (2): 491-500. 10.3168/jds.S0022-0302(04)73188-4.View ArticlePubMedGoogle Scholar
- Yi N, Xu S: Bayesian mapping of quantitative trait loci under the identity-by-descent-based variance component model. Genetics. 2000, 156 (1): 411-422.PubMed CentralPubMedGoogle Scholar
- Haseman JK, Elston RC: The investigation of linkage between a quantitative trait and a marker locus. Behav Genet. 1972, 2 (1): 3-19. 10.1007/BF01066731.View ArticlePubMedGoogle Scholar
- Fulker DW, Cardon LR: A sib-pair approach to interval mapping of quantitative trait loci. Am J Hum Genet. 1994, 54 (6): 1092-1103.PubMed CentralPubMedGoogle Scholar
- Liu J, Liu Y, Liu X, Deng HW: Bayesian mapping of quantitative trait loci for multiple complex traits with the use of variance components. Am J Hum Genet. 2007, 81 (2): 304-320. 10.1086/519495.PubMed CentralView ArticlePubMedGoogle Scholar
- Schork NJ: Extended multipoint identity-by-descent analysis of human quantitative traits: efficiency, power, and modeling considerations. Am J Hum Genet. 1993, 53 (6): 1306-1319.PubMed CentralPubMedGoogle Scholar
- Yi N, George V, Allison DB: Stochastic search variable selection for identifying multiple quantitative trait loci. Genetics. 2003, 164 (3): 1129-1138.PubMed CentralPubMedGoogle Scholar
- Wang H, Zhang YM, Li X, Masinde GL, Mohan S, Baylink DJ, Xu S: Bayesian shrinkage estimation of quantitative trait loci parameters. Genetics. 2005, 170 (1): 465-480. 10.1534/genetics.104.039354.PubMed CentralView ArticlePubMedGoogle Scholar
- Banerjee S, Yandell BS, Yi N: Bayesian quantitative trait loci mapping for multiple traits. Genetics. 2008, 179 (4): 2275-2289. 10.1534/genetics.108.088427.PubMed CentralView ArticlePubMedGoogle Scholar
- Xu S: Estimating polygenic effects using markers of the entire genome. Genetics. 2003, 163 (2): 789-801.PubMed CentralPubMedGoogle Scholar
- Xu S: Derivation of the shrinkage estimates of quantitative trait locus effects. Genetics. 2007, 177 (2): 1255-1258. 10.1534/genetics.107.077487.PubMed CentralView ArticlePubMedGoogle Scholar
- Yang R, Xu S: Bayesian shrinkage analysis of quantitative trait Loci for dynamic traits. Genetics. 2007, 176 (2): 1169-1185. 10.1534/genetics.106.064279.PubMed CentralView ArticlePubMedGoogle Scholar
- Xu S, Gessler DD: Multipoint genetic mapping of quantitative trait loci using a variable number of sibs per family. Genet Res. 1998, 71 (1): 73-83. 10.1017/S0016672398003115.View ArticlePubMedGoogle Scholar
- Fang M, Liu S, Jiang D: Bayesian composite model space approach for mapping quantitative trait Loci in variance component model. Behav Genet. 2009, 39 (3): 337-346. 10.1007/s10519-009-9259-y.View ArticlePubMedGoogle Scholar
- Metropolis N, A RW, Rosenbluth MN, Teller AH, Teller E: Equation of state calculations by fast computing machines. J Chem Phys. 1953, 21: 1087-1091. 10.1063/1.1699114.View ArticleGoogle Scholar
- Hastings WK: Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970, 57: 97-109. 10.1093/biomet/57.1.97.View ArticleGoogle Scholar
- Browne WJ: Applying MCMC Methods to Multi-level Models. 1998, Bath: University of BathGoogle Scholar
- Xu C, Wang X, Li Z, Xu S: Mapping QTL for multiple traits using Bayesian statistics. Genet Res. 2009, 91 (1): 23-37. 10.1017/S0016672308009956.View ArticleGoogle Scholar
- Gessler DD, Xu S: Using the expectation or the distribution of the identity by descent for mapping quantitative trait loci under the random model. Am J Hum Genet. 1996, 59 (6): 1382-1390.PubMed CentralPubMedGoogle Scholar
- Yi N, Yandell BS, Churchill GA, Allison DB, Eisen EJ, Pomp D: Bayesian model selection for genome-wide epistatic quantitative trait loci analysis. Genetics. 2005, 170 (3): 1333-1344. 10.1534/genetics.104.040386.PubMed CentralView ArticlePubMedGoogle Scholar
- Sillanpaa MJ, Arjas E: Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics. 1998, 148 (3): 1373-1388.PubMed CentralPubMedGoogle Scholar
- Seaton G, Haley CS, Knott SA, Kearsey M, Visscher PM: QTL Express: mapping quantitative trait loci in simple and complex pedigrees. Bioinformatics. 2002, 18 (2): 339-340. 10.1093/bioinformatics/18.2.339.View ArticlePubMedGoogle Scholar
- Hobert JP, Casella G: The effect of improper priors on Gibbs sampling in hierarchical linear mixed models. J Am Stat Assoc. 1996, 91 (1461-1473):
- Gelman A, Carlin J, Stern H, Rubin D: Bayesian data analysis. 2004, London: Chapman and all/CRCGoogle Scholar
- ter Braak CJ, Boer MP, Bink MC: Extending Xu's Bayesian model for estimating polygenic effects using markers of the entire genome. Genetics. 2005, 170 (3): 1435-1438. 10.1534/genetics.105.040469.PubMed CentralView ArticlePubMedGoogle Scholar
- Roberts GO, Rosenthal JS: Optimal scaling for various Metropolis-Hastings algorithms. Statist Sci. 2001, 16 (4): 351-367. 10.1214/ss/1015346320.View ArticleGoogle Scholar
- Meuwissen TH, Goddard ME: Fine mapping of quantitative trait loci using linkage disequilibria with closely linked marker loci. Genetics. 2000, 155 (1): 421-430.PubMed CentralPubMedGoogle Scholar
- Meuwissen TH, Karlsen A, Lien S, Olsaker I, Goddard ME: Fine mapping of a quantitative trait locus for twinning rate using combined linkage and linkage disequilibrium mapping. Genetics. 2002, 161 (1): 373-379.PubMed CentralPubMedGoogle Scholar
- Meuwissen TH, Goddard ME: Prediction of identity by descent probabilities from marker-haplotypes. Genet Sel Evol. 2001, 33 (6): 605-634. 10.1186/1297-9686-33-6-605.PubMed CentralView ArticlePubMedGoogle Scholar
- Meuwissen TH, Goddard ME: Mapping multiple QTL using linkage disequilibrium and linkage analysis information and multitrait data. Genet Sel Evol. 2004, 36 (3): 261-279. 10.1186/1297-9686-36-3-261.PubMed CentralView ArticlePubMedGoogle Scholar
- Lee SH, Werf Van der JH: Simultaneous fine mapping of multiple closely linked quantitative trait Loci using combined linkage disequilibrium and linkage with a general pedigree. Genetics. 2006, 173 (4): 2329-2337. 10.1534/genetics.106.057653.PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.