- Methodology article
- Open Access

# Generalized linear mixed model for segregation distortion analysis

- Haimao Zhan^{1}
- Shizhong Xu^{1} (corresponding author)

**12**:97

https://doi.org/10.1186/1471-2156-12-97

© Zhan and Xu; licensee BioMed Central Ltd. 2011

**Received:** 23 September 2011 **Accepted:** 11 November 2011 **Published:** 11 November 2011

## Abstract

### Background

Segregation distortion is a phenomenon in which the observed genotypic frequencies of a locus fall outside the expected Mendelian segregation ratio. The main cause of segregation distortion is viability selection on linked marker loci. These viability selection loci can be mapped using genome-wide marker information.

### Results

We developed a generalized linear mixed model (GLMM) under the liability model to jointly map all viability selection loci of the genome. Using a hierarchical generalized linear mixed model, we can handle a number of loci several times larger than the sample size. We used a dataset from an F_{2} mouse family derived from the cross of two inbred lines to test the model and detected a major segregation distortion locus contributing 75% of the variance of the underlying liability. Replicated simulation experiments confirm that the power of viability locus detection is high and the false positive rate is low.

### Conclusions

The method can be used not only to detect segregation distortion loci, but also to map quantitative trait loci of disease traits using case-only data in humans and selected populations in plants and animals.

## Keywords

- Generalized Linear Mixed Model
- Segregation Distortion
- Dominance Effect
- Unselected Population
- Viability Selection

## Background

Segregation distortion refers to a phenomenon in which the observed genotypic frequencies deviate significantly from the expected Mendelian frequencies [1]. Different populations have different Mendelian ratios, e.g., the typical Mendelian ratio for an F_{2} population is 1:2:1 for the three genotypes *A*_{1}*A*_{1}: *A*_{1}*A*_{2}: *A*_{2}*A*_{2}. Many reasons can explain the observed distortion [2–7]. The most promising explanation is viability selection on the distorted markers or loci linked to the markers [8]. In genetic mapping for quantitative traits, the basic assumption is Mendelian segregation [9]. Therefore, distorted markers are usually discarded prior to QTL mapping out of concern that they may have unexpected consequences on the results. In a recent study [10], we found that segregation distortion is not necessarily harmful to QTL mapping; rather, it can help in some circumstances. Consequently, we can incorporate segregation distortion into existing QTL mapping programs [11].

It appears that segregation distortion is common rather than rare. If segregation distortion is indeed caused by viability selection loci, these loci themselves are of interest because they may help to understand the mechanism of natural selection and evolution. Chi-square tests are commonly used to test segregation distortion. Fu and Ritland [12] and Lorieux et al. [13] developed maximum likelihood methods to map segregation distortion loci. The methods are interval mapping approaches in which one distortion locus is tested at a time. Vogl and Xu [14] used an MCMC-implemented Bayesian algorithm to detect multiple segregation distortion loci simultaneously. These methods are quite different from the usual QTL mapping procedures in quantitative trait genetic mapping. Luo and Xu [15] first developed an expectation-maximization (EM) algorithm for mapping viability selection loci. This method takes advantage of the well-known EM algorithm in interval mapping. Recently, Luo et al. [16] developed a quantitative genetic model to map viability loci. The authors postulated a hidden underlying liability for each individual. The liability is an unobserved quantitative trait and natural selection acts on the liability. The method of Luo et al. [16] actually maps loci controlling the hidden liability (a quantitative trait). Therefore, methods of QTL mapping and viability locus mapping have been unified into the same framework of interval mapping. Both methods are called QTL mapping, but the traits mapped are different: the former maps observed quantitative traits and the latter maps unobserved liability.
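As a point of reference for the single-marker chi-square test mentioned above, the following is a minimal Python sketch of a goodness-of-fit test against the 1:2:1 F_{2} ratio. The function name and the example counts are hypothetical and are not taken from the article. The p-value uses the fact that a chi-square variable with 2 degrees of freedom has upper tail probability exp(-x/2).

```python
import math

def chi_square_f2(n11, n12, n22):
    """Return (statistic, p_value) for a test of the 1:2:1 ratio (2 df)."""
    observed = (n11, n12, n22)
    n = sum(observed)
    expected = (0.25 * n, 0.5 * n, 0.25 * n)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
    return stat, math.exp(-stat / 2.0)

# Hypothetical counts of A1A1, A1A2 and A2A2 at one marker.
stat, p_value = chi_square_f2(40, 50, 10)
print(f"chi-square = {stat:.2f}, p = {p_value:.2e}")
```

Such a test treats each marker separately, which is the limitation that the joint model in this article is meant to address.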

The quantitative genetic model of Luo et al. [16] is an interval mapping approach. The state-of-the-art QTL mapping procedure is the Bayesian shrinkage method [17–19] because it simultaneously evaluates the entire genome. It is natural to extend the Bayesian shrinkage method to map multiple viability loci. The Markov chain Monte Carlo (MCMC) algorithm is commonly used to implement the Bayesian method. Such a sampling-based method is time consuming. A fast version of the Bayesian method is the empirical Bayesian method [20], where the variance components in the prior distributions of QTL effects are first estimated from the data and then used as the priors to estimate the QTL effects under the general Bayesian framework. This method is essentially the linear mixed model approach. When applied to discrete traits, the method is called the generalized linear mixed model [21, 22].

Numerous algorithms have been developed to implement the generalized linear mixed model. The pseudo likelihood algorithm [23–25] appears to be the most popular one. The method requires a normal transformation of the original data point using the first step Newton-Raphson update. Once the data points are normally transformed, they are treated as normal quantitative phenotypes. The usual linear mixed model applies to the transformed data points. The difference between the Newton-Raphson transformation and the data transformation commonly seen in data analysis is that the Newton-Raphson transformation is a function of the data point and parameters while the usual data transformation is a function of the data point only. Therefore, the Newton-Raphson transformation is required for each cycle of the iteration process.

It is not clear how to use the pseudo likelihood approach to map viability loci because there is no phenotypic data point to transform. However, the method of McGilchrist [26] for the generalized linear mixed model can be applied here. This method only requires a linear predictor, a likelihood and a prior distribution for each effect in the linear predictor. In this study, we used McGilchrist's method [26] to perform parameter estimation.

## Method

### Liability model and viability selection

Let *y*_{j} be the liability for individual *j*. The liability is described by the following linear model,

$$y_j = X_j\beta + \sum_{k=1}^{p} Z_{jk}\gamma_k + \varepsilon_j,$$

where *ε*_{j} ~ *N*(0,1) is a residual error with a standardized normal distribution. Other model effects are defined as follows. There may be some effects not related to genetics, such as age, location and other systematic effects, and these effects are captured by *β* and the design matrix *X*. There are *p* genetic loci, each with an effect *γ*_{k} for *k* = 1, ..., *p*. The value of *Z*_{jk} is determined by the genotype of individual *j* at locus *k*. For example, an F_{2} individual derived from the cross of two inbred lines can take one of three genotypes, *A*_{1}*A*_{1}, *A*_{1}*A*_{2} and *A*_{2}*A*_{2}. Under the additive genetic model, *Z*_{jk} is defined as

$$Z_{jk} = \begin{cases} +1 & \text{for } A_1A_1 \\ 0 & \text{for } A_1A_2 \\ -1 & \text{for } A_2A_2 \end{cases}$$

and *γ*_{k} = *a*_{k} is the additive genetic effect for locus *k*. Under the dominance effect model, the genetic effect for locus *k* is a 2 × 1 vector *γ*_{k} = [*a*_{k} *d*_{k}]^{T}, where *d*_{k} is called the dominance effect. The corresponding *Z* variable is also a vector and defined as

$$Z_{jk} = \begin{cases} H_1 & \text{for } A_1A_1 \\ H_2 & \text{for } A_1A_2 \\ H_3 & \text{for } A_2A_2 \end{cases}$$

where *H*_{i} is the *i*-th row of matrix *H*, as shown below,

$$H = \begin{bmatrix} +1 & -1 \\ 0 & +1 \\ -1 & -1 \end{bmatrix}.$$
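The genotype coding can be illustrated with a short Python sketch. It assumes the codes stated later in the article (+1, 0 and -1 for the additive part; -1, 1 and -1 for the dominance part). The helper names `design_row` and `code_variance` are hypothetical and only serve the illustration.

```python
# Rows of H for genotypes A1A1, A1A2, A2A2: (additive code, dominance code).
H = {"A1A1": (1, -1), "A1A2": (0, 1), "A2A2": (-1, -1)}
MENDEL_F2 = {"A1A1": 0.25, "A1A2": 0.5, "A2A2": 0.25}

def design_row(genotypes, dominance=True):
    """Concatenate the Z codes over loci for one individual."""
    row = []
    for g in genotypes:
        additive, dom = H[g]
        row.extend([additive, dom] if dominance else [additive])
    return row

def code_variance(column):
    """Variance of one column of H under the 1:2:1 Mendelian ratio."""
    mean = sum(MENDEL_F2[g] * H[g][column] for g in H)
    return sum(MENDEL_F2[g] * (H[g][column] - mean) ** 2 for g in H)

print(design_row(["A1A1", "A1A2", "A2A2"]))   # three loci, two codes each
print(code_variance(0), code_variance(1))     # additive and dominance codes
```

Under this coding both columns have mean zero in an unselected F_{2} population, which is why the variances of the additive and dominance codes are 0.5 and 1.0.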

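The liability *y*_{j} is not observed but it determines the viability of individual *j*. It is assumed that individual *j* will survive if *y*_{j} > 0 and die otherwise. Since we can only observe the surviving individuals, all individuals in the sample have liabilities greater than zero. This will cause the selected population to deviate from the expected Mendelian segregation ratio for loci responsible for viability selection and all loci linked to the viability loci. Although all individuals have survived, some may have a high liability and some may have a low liability, but all have a liability greater than zero. We now use the concept of penetrance to describe the survivability of an individual. Let

$$\eta_j = X_j\beta + \sum_{k=1}^{p} Z_{jk}\gamma_k$$

be the linear predictor (the expectation of the liability). The penetrance is the probability of survival for individual *j*, i.e., Φ(*η*_{j}) or logistic(*η*_{j}) = exp(*η*_{j})/[1 + exp(*η*_{j})]. Conditional on the genotypes of all other loci, the penetrances for the three genotypes of locus *k* are defined as

$$\begin{aligned} \Pr(y_j > 0 \mid A_1A_1) &= \Phi(\eta_{j(-k)} + H_1\gamma_k) \\ \Pr(y_j > 0 \mid A_1A_2) &= \Phi(\eta_{j(-k)} + H_2\gamma_k) \\ \Pr(y_j > 0 \mid A_2A_2) &= \Phi(\eta_{j(-k)} + H_3\gamma_k) \end{aligned} \tag{6}$$

where

$$\eta_{j(-k)} = X_j\beta + \sum_{k' \ne k} Z_{jk'}\gamma_{k'}$$

is the linear predictor excluding locus *k*. This model was first introduced by Luo et al. [16] for single locus analysis, which does not include *η*_{j(-k)} in equation (6). The data that allow us to estimate *γ*_{k} are the genotypes of all individuals at locus *k*. Define

$$w_j = \begin{bmatrix} w_{j(11)} & w_{j(12)} & w_{j(22)} \end{bmatrix}$$

as the genotype indicator vector of individual *j* at locus *k*. If individual *j* has a genotype *A*_{1}*A*_{1}, then *w*_{j(11)} = 1 and *w*_{j(12)} = *w*_{j(22)} = 0. The probabilities of individual *j* taking the three genotypes are derived from Bayes' theorem,

$$\pi_{j(11)} = \frac{\varphi_1\Phi(\eta_{j(-k)} + H_1\gamma_k)}{\sum_{i=1}^{3}\varphi_i\Phi(\eta_{j(-k)} + H_i\gamma_k)}, \quad \pi_{j(12)} = \frac{\varphi_2\Phi(\eta_{j(-k)} + H_2\gamma_k)}{\sum_{i=1}^{3}\varphi_i\Phi(\eta_{j(-k)} + H_i\gamma_k)}, \quad \pi_{j(22)} = \frac{\varphi_3\Phi(\eta_{j(-k)} + H_3\gamma_k)}{\sum_{i=1}^{3}\varphi_i\Phi(\eta_{j(-k)} + H_i\gamma_k)},$$

where $\varphi = [\varphi_1\ \varphi_2\ \varphi_3]$ is the expected Mendelian ratio. In an F_{2} population, the expected Mendelian ratio is $\varphi = \left[\tfrac{1}{4}\ \tfrac{2}{4}\ \tfrac{1}{4}\right]$. Note that if *γ*_{k} = 0, vector *π*_{j} = [*π*_{j(11)} *π*_{j(12)} *π*_{j(22)}] will be equivalent to the expected Mendelian ratio for every individual at the locus.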
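The Bayes step described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' SAS/IML code: the probit penetrance Φ is used, the rows of *H* follow the coding given in the article (+1, 0, -1 additive; -1, 1, -1 dominance), and the function name `genotype_posterior` is hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf                       # standard normal CDF
H = ((1, -1), (0, 1), (-1, -1))              # rows for A1A1, A1A2, A2A2
MENDEL_F2 = (0.25, 0.5, 0.25)                # expected ratio before selection

def genotype_posterior(eta_minus_k, a_k, d_k, prior=MENDEL_F2):
    """Genotype probabilities at locus k among survivors (Bayes' theorem)."""
    # Penetrance of each genotype: probability that the liability exceeds 0.
    penetrance = [PHI(eta_minus_k + h[0] * a_k + h[1] * d_k) for h in H]
    weights = [f * p for f, p in zip(prior, penetrance)]
    total = sum(weights)
    return [w / total for w in weights]

print(genotype_posterior(0.3, 0.0, 0.0))     # no effect: Mendelian ratio
print(genotype_posterior(0.0, 1.0, 0.0))     # additive effect favours A1A1
```

With a zero effect the prior ratio is returned unchanged, matching the remark above. A positive additive effect shifts probability towards *A*_{1}*A*_{1}.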

If there is no non-genetic effect, the term *X*_{j}*β* should disappear from the model. This is different from the usual linear regression analysis where an intercept should always appear in the model. With the liability selection model, there is no intercept. We now assume that there is only one co-factor to consider. The *X*_{j} variable can be discrete or continuous, but its distribution in the unselected population must be known. In this study, we first assume that *X*_{j} is discrete, say gender, with *X*_{j} = 1 representing male and *X*_{j} = -1 representing female. In the unselected population, the sex ratio should be 1:1. If the population evaluated has a biased sex ratio, this means that gender has an effect on the liability. We can estimate the gender effect *β* on the liability. Let $\phi = [\phi_1\ \phi_2] = \left[\tfrac{1}{2}\ \tfrac{1}{2}\right]$ be the expected sex ratio (prior to the selection). Define *ξ*_{j(1)} and *ξ*_{j(2)} as the posterior probabilities that individual *j* is male or female, respectively. These posteriors are calculated using

$$\xi_{j(1)} = \frac{\phi_1\Phi(\beta + \eta_{j(-\beta)})}{\phi_1\Phi(\beta + \eta_{j(-\beta)}) + \phi_2\Phi(-\beta + \eta_{j(-\beta)})}, \qquad \xi_{j(2)} = \frac{\phi_2\Phi(-\beta + \eta_{j(-\beta)})}{\phi_1\Phi(\beta + \eta_{j(-\beta)}) + \phi_2\Phi(-\beta + \eta_{j(-\beta)})},$$

where

$$\eta_{j(-\beta)} = \sum_{k=1}^{p} Z_{jk}\gamma_k$$

is the linear predictor excluding the gender effect.
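The posterior sex probabilities can be sketched in the same way. The sketch assumes the probit penetrance and the ±1 coding of gender described above. The function name `sex_posterior` is hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def sex_posterior(beta, eta_minus_beta, prior=(0.5, 0.5)):
    """Posterior probabilities (male, female) for a surviving individual."""
    male = prior[0] * PHI(beta + eta_minus_beta)      # X_j = +1
    female = prior[1] * PHI(-beta + eta_minus_beta)   # X_j = -1
    total = male + female
    return male / total, female / total

print(sex_posterior(0.0, 0.4))   # no gender effect: the 1:1 ratio is kept
print(sex_posterior(1.0, 0.0))   # positive effect: males are over-represented
```

A gender effect of zero leaves the expected 1:1 ratio unchanged, so a biased sex ratio among survivors is what carries the information about *β*.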

We now consider the case where *X*_{j} is a continuous non-genetic effect, e.g., age. Let us assume that *X*_{j} follows a normal distribution in the unselected population, i.e., *p*(*X*_{j}) = *N*(*X*_{j} | *μ*, *σ*^{2}), where *μ* and *σ*^{2} are known. Let *β* be the effect of *X*_{j} on the liability. Define Φ(*X*_{j}*β* + *η*_{j(-β)}) as the probability that individual *j* has survived the selection. The posterior probability is defined as

$$p(X_j \mid y_j > 0) = \frac{N(X_j \mid \mu, \sigma^2)\,\Phi(X_j\beta + \eta_{j(-\beta)})}{\Phi\left(\dfrac{\mu\beta + \eta_{j(-\beta)}}{\sqrt{1 + \sigma^2\beta^2}}\right)}. \tag{16}$$

Proof of this equation (16) is straightforward and thus given in the next paragraph.

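Let *f*(*X*_{j}) = *N*(*X*_{j} | *μ*, *σ*^{2}) be the normal density for variable *X*_{j} with known *μ* and *σ*^{2}. The following Lemma [27] is used to derive equation (16).

$$\int_{-\infty}^{\infty} f(X_j)\,\Phi\left(\frac{X_j - \xi}{\lambda}\right) dX_j = \Phi\left(\frac{\mu - \xi}{\sqrt{\sigma^2 + \lambda^2}}\right) \tag{17}$$

Let *ξ* = -*η*_{j(-β)}/*β* and *λ*^{2} = 1/*β*^{2}. Substituting these into equation (17), we get

$$\int_{-\infty}^{\infty} f(X_j)\,\Phi(X_j\beta + \eta_{j(-\beta)})\,dX_j = \Phi\left(\frac{\mu\beta + \eta_{j(-\beta)}}{\sqrt{1 + \sigma^2\beta^2}}\right),$$

which is the normalizing constant in the denominator of equation (16).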

This concludes the derivation of equation (16) presented in the previous paragraph.
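The closed form used in the derivation can be checked numerically. The sketch below is a sanity check under assumed parameter values, not part of the original analysis. It integrates the normal density times the probit survival probability with a simple trapezoid rule and compares the result with Φ((*μβ* + *η*)/√(1 + *σ*^{2}*β*^{2})). The function names and the values of *μ*, *σ*^{2}, *β* and *η* are arbitrary.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def survival_numeric(mu, sigma2, beta, eta, n=4001, width=10.0):
    """Trapezoid integration of N(x | mu, sigma2) * Phi(x * beta + eta)."""
    sd = math.sqrt(sigma2)
    density = NormalDist(mu, sd)
    lo, hi = mu - width * sd, mu + width * sd
    step = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * density.pdf(x) * PHI(x * beta + eta)
    return total * step

def survival_closed(mu, sigma2, beta, eta):
    """Closed form of the same integral, as used in the text."""
    return PHI((mu * beta + eta) / math.sqrt(1.0 + sigma2 * beta ** 2))

print(survival_numeric(0.5, 0.25, 1.0, -0.3), survival_closed(0.5, 0.25, 1.0, -0.3))
```

The two values should agree to several decimal places, for negative *β* as well, because the closed form depends on *β* only through *μβ* and *σ*^{2}*β*^{2}.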

### Likelihood, prior and posterior

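The log likelihood function for locus *k* can be derived based on the multivariate Bernoulli distribution, that is

$$L(\gamma_k) = \sum_{j}\left[w_{j(11)}\ln\pi_{j(11)} + w_{j(12)}\ln\pi_{j(12)} + w_{j(22)}\ln\pi_{j(22)}\right].$$

Strictly speaking, this log likelihood function should be denoted by *L*(*γ*_{k} | *η*_{(-k)}) because it is conditioned on the gender effect and effects of other loci. We use the simplified notation to improve the readability. Let us assign a normal prior to *γ*_{k}, i.e.,

$$p(\gamma_k) = N(\gamma_k \mid 0, \Sigma_k),$$

where ∑_{k} is the prior variance matrix of *γ*_{k}. We further assign a scaled inverse Wishart prior to ∑_{k},

$$p(\Sigma_k) = \text{Inv-Wishart}(\Sigma_k \mid \tau, \omega),$$

where *τ* is the prior degree of freedom and *ω* is the prior scale matrix with the same dimension as ∑_{k}. The reason for assigning these prior distributions is to handle a possibly large number of loci involved in the model. A uniform prior for the gender effect is assumed. The log posterior (denoted by LogPost) is

$$\text{LogPost}(\gamma_k, \Sigma_k) = L(\gamma_k) - \tfrac{1}{2}\ln|\Sigma_k| - \tfrac{1}{2}\gamma_k^{T}\Sigma_k^{-1}\gamma_k - \tfrac{1}{2}(\tau + 2 + 1)\ln|\Sigma_k| - \tfrac{1}{2}\text{tr}\left(\omega\Sigma_k^{-1}\right),$$

where a constant has been ignored.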

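The log likelihood function for the gender effect *β* conditional on *η*_{j(-β)} is

$$L(\beta) = \sum_{j:\,X_j = 1}\ln\xi_{j(1)} + \sum_{j:\,X_j = -1}\ln\xi_{j(2)}.$$

For a continuous co-factor, the log likelihood function for *β* can be written as

$$L(\beta) = \sum_{j}\left[\ln N(X_j \mid \mu, \sigma^2) + \ln\Phi(X_j\beta + \eta_{j(-\beta)}) - \ln\Phi\left(\frac{\mu\beta + \eta_{j(-\beta)}}{\sqrt{1 + \sigma^2\beta^2}}\right)\right].$$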

Prior distribution for the non-genetic effect is assumed to be uniform (uninformative prior) and thus only the likelihood is needed to find the posterior mode estimate of *β*.

### Posterior mode estimation

Due to the possible large number of parameters, we take a sequential approach to estimating the posterior mode parameters with one locus at a time. This approach is also called the coordinate descent algorithm. Once the parameters of all loci are updated, the sequence is repeated until a certain criterion of convergence is reached.

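Conditional on ∑_{k} and the effects of all other loci, the posterior mode of *γ*_{k} is found by the Newton-Raphson iteration,

$$\gamma_k^{(t+1)} = \gamma_k^{(t)} - \left[\frac{\partial^2\,\text{LogPost}}{\partial\gamma_k\,\partial\gamma_k^{T}}\right]^{-1}\frac{\partial\,\text{LogPost}}{\partial\gamma_k}, \tag{26}$$

with the derivatives evaluated at $\gamma_k^{(t)}$, and the variance matrix of the estimate is approximated by

$$V_k = -\left[\frac{\partial^2\,\text{LogPost}}{\partial\gamma_k\,\partial\gamma_k^{T}}\right]^{-1}.$$

The posterior mean and posterior variance of *γ*_{k} at iteration *t* are denoted by $\text{E}(\gamma_k) = \gamma_k^{(t+1)}$ and var(*γ*_{k}) = *V*_{k}, respectively. Since the posterior distribution of *γ*_{k} is approximately multivariate normal (asymptotic theory), the posterior mean is identical to the posterior mode. The posterior of ∑_{k} remains scaled inverse Wishart due to the conjugate property of the prior. Therefore, the posterior mode of ∑_{k} is

$$\Sigma_k = \frac{\text{E}(\gamma_k\gamma_k^{T}) + \omega}{(\tau + 1) + 2 + 1}, \qquad \text{E}(\gamma_k\gamma_k^{T}) = \text{E}(\gamma_k)\,\text{E}(\gamma_k)^{T} + V_k, \tag{28}$$

where *τ* + 1 is the degree of freedom for the inverse Wishart posterior and the number 2 represents the dimension of vector *γ*_{k}.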

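The Newton-Raphson update for the non-genetic effect *β* conditional on the effects of all loci is

$$\beta^{(t+1)} = \beta^{(t)} - \left[\frac{\partial^2 L(\beta)}{\partial\beta^2}\right]^{-1}\frac{\partial L(\beta)}{\partial\beta}. \tag{29}$$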

The iteration process of the posterior mode estimation is summarized as follows.

Step 0: Initialize all parameters.

Step 1: Update the non-genetic effect using equation (29).

Step 2: Update effect of marker *k* for *k* = 1, ⋯, *p* using equation (26).

Step 3: Update ∑_{k} for *k* = 1, ⋯, *p* using equation (28).

Step 4: Repeat step 1 to step 3 until the iteration process converges.
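The iteration above can be sketched in Python for a deliberately simplified case. The sketch is not the authors' SAS/IML program and makes several assumptions that are not in the article: only additive effects are fitted (so each *γ*_{k} is a scalar), there are no co-factors (Step 1 is skipped), the derivatives are obtained by finite differences instead of analytic formulas, the variance update uses the scalar analogue (*γ*_{k}^{2} + *V*_{k} + *ω*)/(*τ* + 3) of the inverse Wishart posterior mode, and a small floor keeps the variance away from zero. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf
CODES = (1.0, 0.0, -1.0)        # additive Z codes for A1A1, A1A2, A2A2
MENDEL = (0.25, 0.5, 0.25)      # expected F2 ratio before selection

def simulate_survivors(n, effects, seed=1):
    """Sample F2 genotypes at unlinked loci and keep individuals with y > 0."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        row = [rng.choices((0, 1, 2), weights=MENDEL)[0] for _ in effects]
        y = sum(CODES[g] * a for g, a in zip(row, effects)) + rng.gauss(0.0, 1.0)
        if y > 0:
            kept.append(row)
    return kept

def log_post(x, k, geno, gamma, sigma2_k):
    """Log posterior of effect k (value x) with all other effects held fixed."""
    total = -0.5 * x * x / sigma2_k                      # normal prior on gamma_k
    for row in geno:
        eta = sum(CODES[g] * gamma[m] for m, g in enumerate(row) if m != k)
        pen = [max(PHI(eta + z * x), 1e-300) for z in CODES]
        denom = sum(f * p for f, p in zip(MENDEL, pen))
        total += math.log(MENDEL[row[k]] * pen[row[k]] / denom)
    return total

def fit(geno, tau=0.0, omega=0.0, sweeps=15, h=1e-3):
    """Coordinate-wise Newton-Raphson updates with a variance update per locus."""
    p = len(geno[0])
    gamma, sigma2 = [0.0] * p, [1.0] * p
    for _ in range(sweeps):
        for k in range(p):
            g0 = gamma[k]
            f_lo, f_mid, f_hi = (log_post(g0 + s * h, k, geno, gamma, sigma2[k])
                                 for s in (-1, 0, 1))
            grad = (f_hi - f_lo) / (2 * h)
            hess = (f_hi - 2 * f_mid + f_lo) / h ** 2
            if hess >= 0:            # safeguard: skip a non-concave step
                continue
            gamma[k] = g0 - grad / hess                   # effect update
            v_k = -1.0 / hess                             # approximate variance
            sigma2[k] = max((gamma[k] ** 2 + v_k + omega) / (tau + 3.0), 1e-8)
    return gamma, sigma2

if __name__ == "__main__":
    data = simulate_survivors(500, (1.0, 0.0, 0.0))
    estimates, variances = fit(data)
    print([round(g, 3) for g in estimates])
```

In this toy setting the first locus carries a true effect of 1.0 and the other two carry none, so the first estimate should be clearly positive while the others are shrunk towards zero. The fixed number of sweeps stands in for the convergence criterion of Step 4.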

### Genetic contribution from an individual locus

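Consider the *k*th segregation distortion locus (SDL). Under Mendelian segregation, the variances of the *Z* variables in an F_{2} population are 0.5 for the additive part and 1.0 for the dominance part. The reason is that the three genotypes are coded as +1, 0 and -1 for the additive *Z* and -1, 1 and -1 for the dominance *Z* [28]. Let *a*_{k} and *d*_{k} be the additive and dominance effects of this SDL. The genetic variance explained by this locus is

$$\sigma_k^2 = 0.5\,a_k^2 + d_k^2.$$

Let $\sigma_y^2$ be the total variance of the liability in the unselected population, which includes the residual variance of one. The ratio

$$h_k^2 = \frac{\sigma_k^2}{\sigma_y^2}$$

is the proportion of the liability variance contributed by the *k*th SDL. Assuming that the multiple SDL are not closely linked, the overall contribution from all SDL is approximated by

$$\sum_{k=1}^{p} h_k^2 = \frac{\sum_{k=1}^{p}\sigma_k^2}{\sigma_y^2}.$$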

The liability model has unified QTL mapping and SDL mapping in the same framework of quantitative genetics.
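As a worked example, the variance calculation can be applied to the effect estimates reported for the mouse data in the Results (additive 4.6230, dominance -1.6656). The sketch assumes that this locus is the only contributor besides the residual variance of one, which is a simplification made here for illustration. The function names are hypothetical.

```python
def locus_variance(a, d=0.0):
    """Genetic variance of one locus in an F2: 0.5 * a^2 + 1.0 * d^2."""
    return 0.5 * a ** 2 + 1.0 * d ** 2

def variance_proportion(locus_var, other_var=0.0, residual_var=1.0):
    """Share of the liability variance explained by one locus."""
    return locus_var / (locus_var + other_var + residual_var)

a_hat, d_hat = 4.6230, -1.6656          # estimates reported for D6Mit86
v = locus_variance(a_hat, d_hat)
prop = variance_proportion(v)
print(round(v, 2), round(prop, 2))
```

Under these assumptions the locus variance is about 13.46 and the proportion is about 0.93, in line with the 93% quoted in the Discussion.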

## Results

### Mouse experiment

We used a published dataset of an F_{2} mouse experiment to demonstrate the application of the method. The dataset was published by Lan et al. [29] and is freely available from the internet. The mouse genome has 19 chromosomes (excluding the sex chromosome). The data contain 110 F_{2} *ob/ob* mice derived from the cross of two inbred lines (BT×BTBR) and 193 markers covering 1,800 cM of the entire mouse genome. The average marker distance was 9.35 cM per marker interval. We inserted one or more pseudo markers in intervals larger than 5 cM to make sure that the entire genome is evenly covered by (pseudo or true) markers with no intervals larger than 5 cM. The number of pseudo markers inserted was 273, resulting in a total of 466 markers (193 true and 273 pseudo markers). For the pseudo markers, the genotype indicator variable, *w*_{j} = [*w*_{j(11)} *w*_{j(12)} *w*_{j(22)}], is missing for every individual. In the data analysis, the missing variable was replaced by the conditional probability calculated using the multipoint method [30].

Figure 1 (top panel) shows the frequencies of the three genotypes, *A*_{1}*A*_{1}, *A*_{1}*A*_{2} and *A*_{2}*A*_{2}, plotted against the mouse genome. It is obvious that there is a severe distortion at the beginning of chromosome 6, where the population contains almost exclusively the *A*_{2}*A*_{2} genotype, with *A*_{1}*A*_{1} and *A*_{1}*A*_{2} almost eliminated from the population. Chromosomes 14 and 18 also show mild segregation distortion. Interval mapping for segregation distortion using the QTL procedure in SAS [31] showed that the LOD score for chromosome 6 is 43.25 (see the bottom panel of Figure 1 for the LOD score profile obtained from the interval mapping analysis). The interval mapping procedure [31] is a separate analysis for each marker. With the interval mapping, the position with the highest LOD score (43.25) occurred at a pseudo marker (at position 15.69 cM) between the first true marker (D6Mit86, 0 cM) and the second true marker (D6Mit224, 30.4 cM) on chromosome 6. The estimated frequencies of this pseudo marker are 0.0000, 0.0001 and 0.9999 for the three genotypes (*A*_{1}*A*_{1}, *A*_{1}*A*_{2} and *A*_{2}*A*_{2}), respectively.

We then analyzed the data using the GLMM method with the hyper-parameters set to (*τ*, *ω*) = (0,0), equivalent to the Jeffreys prior for the variance components. The estimated additive and dominance effects along with the corresponding LOD scores are depicted in Figure 2. One segregation distortion locus was detected on chromosome 6 (the same chromosome as in the interval mapping). The location of this distortion locus is right at the first marker of chromosome 6 (D6Mit86, 0 cM). The interval mapping approach described in the previous paragraph also detected a segregation distortion locus. However, the SDL detected by interval mapping was located halfway (15.69 cM) between markers D6Mit86 (0 cM) and D6Mit224 (30.4 cM) (see Figure 1 for the result of interval mapping). The GLMM analysis also showed some distortion for the second marker (D6Mit224, 30.4 cM), but the LOD score is only 3, barely significant. Therefore, we can safely ignore this locus and attribute its signal to linkage with the first marker. Let us go back to the first marker D6Mit86, the major SDL detected by the GLMM method. This segregation distortion locus is caused by both the additive and dominance effects. The estimated additive effect (± standard error) is $\widehat{a}=4.6230\pm 0.4248$ while the estimated dominance effect (± standard error) is $\widehat{d}=-1.6656\pm 0.1833$. The LOD scores are 25.69 and 17.92, respectively, for the additive and dominance effects. A simulation experiment under the null hypothesis (Mendelian segregation) showed that the 95% value of the null distribution of the LOD scores is 3.8, much smaller than the actual LOD score of 25.69. Therefore, we are very confident about this detected segregation distortion locus. As expected, the estimated sex effect is $\widehat{\beta}=0.1969\pm 0.3002$ with a LOD score of 0.0934, smaller than 1.0255, the 95% value of the LOD score generated under the null model. Therefore, we can safely claim that the gender effect is insignificant.

respectively, for *A*_{1}*A*_{1}, *A*_{1}*A*_{2} and *A*_{2}*A*_{2}.

### Simulation experiment

... F_{2} family with 500 individuals are also presented in Figure 3 (top panel). We also simulated two co-factors that influence the liability. The first co-factor was the sex effect coded as 1 for male and -1 for female with an effect value of *β*_{1} = 1.0. The second co-factor was a continuous variable with *μ* = 0 and *σ*^{2} = 0.025. The effect of this co-factor on the liability was *β*_{2} = 1.0. The liability of each individual was generated using the linear model containing the two co-factors and the six QTL. An individual with a liability greater than 0 survived the selection; otherwise, it was eliminated. All the 500 individuals in the sample survived the selection. The simulated data were analyzed using the generalized linear mixed model with (*τ*, *ω*) = (0,0) as the hyper-parameter values.
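The selection step of such a simulation can be sketched in Python for a single additive locus. This is a simplified illustration, not the six-QTL design of the article: the effect value, the seed and the function name `simulate_survivor_frequencies` are assumptions made here.

```python
import random

ADDITIVE = (1, 0, -1)     # codes for A1A1, A1A2, A2A2
DOMINANCE = (-1, 1, -1)

def simulate_survivor_frequencies(n_survivors, a, d=0.0, seed=1):
    """Genotype frequencies among individuals whose liability exceeds zero."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    kept = 0
    while kept < n_survivors:
        # Draw a genotype from the 1:2:1 ratio of an unselected F2 population.
        g = rng.choices((0, 1, 2), weights=(1, 2, 1))[0]
        liability = ADDITIVE[g] * a + DOMINANCE[g] * d + rng.gauss(0.0, 1.0)
        if liability > 0:          # viability selection: only y > 0 survives
            counts[g] += 1
            kept += 1
    return [c / n_survivors for c in counts]

print(simulate_survivor_frequencies(5000, a=1.0))
```

With a positive additive effect the surviving sample is enriched for *A*_{1}*A*_{1} and depleted of *A*_{2}*A*_{2} relative to 1:2:1, which is the distortion that the model exploits.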

Estimated parameters of the QTL identified by GLMM compared to true values in the simulation.

| | True effect | True proportion | Estimate | StdErr | Position (cM) | LOD | Proportion |
|---|---|---|---|---|---|---|---|
| QTL 1 | 1.4135 | 0.1543 | 1.1905 | 0.1357 | 50 | 16.6828 | 0.1224 |
| QTL 2 | -0.9993 | 0.0771 | -0.8296 | 0.1252 | 125 | 9.5271 | 0.0594 |
| QTL 3 | 0.9993 | 0.0771 | 0.9605 | 0.1328 | 360 | 11.3536 | 0.0796 |
| QTL 4 | -1.2048 | 0.1121 | -1.1991 | 0.1353 | 905 | 17.0304 | 0.1241 |
| QTL 5 | 1.0000 | 0.0772 | 0.8593 | 0.1310 | 1735 | 9.3347 | 0.0637 |
| QTL 6 | -1.4135 | 0.1543 | -1.2959 | 0.1380 | 2115 | 19.1230 | 0.1450 |
| Co-factor 1 | 1.0000 | 0.1545 | 1.0217 | 0.1020 | -- | 21.7673 | 0.1803 |
| Co-factor 2 | 1.0000 | 0.0386 | 1.1007 | 0.1809 | -- | 8.0412 | 0.0523 |
| Total | | 0.8455 | | | | | 0.8272 |

Average estimates of effects and powers of simulated QTL and co-factors from 100 replicated simulations.

| | True | Estimate | StdDev | Power (%) |
|---|---|---|---|---|
| QTL 1 | 1.4135 | 1.1028 | 0.1329 | 99 |
| QTL 2 | -0.9993 | -0.5964 | 0.1270 | 71 |
| QTL 3 | 0.9993 | 0.7663 | 0.1474 | 91 |
| QTL 4 | -1.2048 | -0.9858 | 0.1310 | 98 |
| QTL 5 | 1.0000 | 0.7166 | 0.1375 | 87 |
| QTL 6 | -1.4135 | -1.1977 | 0.1488 | 100 |
| Co-factor 1 | 1.0000 | 0.9192 | 0.1299 | 100 |
| Co-factor 2 | 1.0000 | 0.8894 | 0.1895 | 95 |

## Discussion and conclusions

Genome-wide segregation distortion is a common phenomenon in genetic mapping, but it is usually ignored. The main reason is the difficulty in joint estimation and tests of the segregation distortion loci. We formulated the problem as a typical quantitative genetics problem using a hypothetical liability to describe the fitness of each individual. Using a generalized linear mixed model, we were able to estimate and test genome-wide quantitative trait loci controlling the hidden liability. We used a mouse dataset to demonstrate the method and detected a major QTL for the liability that explains 93% of the liability variance. The simulated data experiment showed that the method can detect a QTL (e.g., the second QTL simulated) explaining 7.71% of the liability variation with 71% power. The method was implemented in a SAS/IML program. The code is posted on our website (http://www.statgen.ucr.edu) for general application. With this method and the program, genome-wide segregation distortion can be investigated routinely in future genetic data analysis.

As a Bayesian method, there is a rich array of prior distributions that can be explored. In this study, we used the inverse Wishart as the prior distribution for the prior variance matrix of QTL effects. For the additive genetic model (one effect per locus), the inverse Wishart distribution becomes a scaled inverse Chi-square distribution. It is possible to use the exponential distribution (the Lasso prior) as an alternative prior [32]. Because the method uses multiple levels of prior choice, the model can also be called a hierarchical generalized linear mixed model [24, 33]. This study opens a new area in statistical genetics and further studies are expected to arise. For example, how to use the adaptive Lasso [34] to address this problem is entirely unknown and can be explored in the future.

A caveat of this method is the requirement of Mendelian segregation ratio (before the selection). For populations generated through line crossing experiments, Mendelian ratios are known. However, for uncontrolled populations, the theoretical Mendelian frequencies are not available. In this case, one needs to survey the unselected population to obtain the genotypic frequencies as the controlled "Mendelian segregation". If one can genotype both the selected and unselected individuals, one may simply use the case-control study and there is little reason to use this case-only study approach. In reality, genotyping individuals is much more costly than pooling the DNA of a sample of individuals. The cost effective approach is to genotype each individual in the surviving sample and genotype the pooled DNA sample for the unselected population because we only need the frequencies of genotypes (not the genotypes of individuals) in the unselected population. For the co-factors, we also need the expected frequencies of the co-factors in the unselected population. We examined the sex effect (discrete co-factor) and a normally distributed co-factor. The expected 1:1 sex ratio was used as the expected frequency. For the normal co-factor, we used the mean and variance of the co-factor used in the simulation (the true values) to construct the expected distribution. In reality, one needs to survey the entire population to obtain the expected distribution. For continuous variables deviating from normality, one may discretize a variable to a few groups. For example, age is a quantitative variable but one can arbitrarily divide individuals into a few age groups. This discretization will eliminate the restriction of normal distribution.

The method developed here can be applied to broader situations beyond genetics without much modification. For example, if we know the joint distribution of *k* variables in a base (unselected) population and the joint distribution of the variables in a selected sample, we can simply test the difference between the two distributions to see which variables have more influence on the selection. However, the pair-wise covariance may not allow us to make a precise decision on the importance of each variable. If two variables both influence the selection and they are highly correlated, the influence of one variable may be simply caused by the high correlation with the true causal variable. The method proposed here can help separate the true causality from the influence due to correlation.

QTL mapping is usually conducted in unselected populations. Individuals with undesired phenotypes must also be evaluated to obtain unbiased estimates of QTL effects. This is not a cost-effective approach for breeding companies. Breeders wish to use only selected individuals to breed and keep no records for the unselected individuals. If we only evaluate the selected individuals, markers associated with the traits of interest will show distorted segregation. If the selection criterion is not well defined, for example, drought resistance, it is hard to map QTL. The segregation distortion loci are actually the QTL for drought resistance if one knows that there is no segregation distortion in the unselected population. The method developed here can be directly applied to mapping drought resistance QTL. Because we can perform QTL mapping using a selected population, this approach may be called "mapping while selecting". For example, breeders may want to evaluate drought resistance of a family of recombinant inbred lines (RIL) by planting all seeds in a harsh drought environment. Eventually all plants die except the ones with strong resistance to drought. Breeders may have no records of the plants eliminated, but they can still perform QTL mapping for this trait (drought resistance) using all plants that have survived the selection. Other stress related traits can also be mapped using this approach, e.g., pest and salinity resistances.

In human genetics, the case-control study is a common approach for mapping disease loci. In situations where records are available for the cases but not for the controls, a case-only study may benefit from the new method. For example, one may easily get patient data from hospitals but rarely has individual records for the entire population. QTL mapping for the disease trait is still possible if we have the population records (frequencies) of genotypes in the entire population.

In summary, we developed a hierarchical generalized linear mixed model to map QTL for liability. This is a new approach to genetic mapping. It incorporates a seemingly different problem (segregation distortion) into the same QTL mapping framework for quantitative traits. Statistically, it shows that the generalized linear mixed model can be applied to situations where there are no phenotypic records; one only needs a likelihood function, a linear predictor and a prior distribution to infer the posterior mode estimation of the model effects.

## Declarations

### Acknowledgements

We greatly appreciate two anonymous reviewers and the associate editor for their comments on an early version of the manuscript and their suggestions in revision of the manuscript. The project was supported by the USDA National Institute of Food and Agriculture Grant 2007-02784 to SX.

## Authors’ Affiliations

## References

- Sandler L, Hiraizumi Y, Sandler I: Meiotic Drive in Natural Populations of Drosophila Melanogaster. I. the Cytogenetic Basis of Segregation-Distortion. Genetics. 1959, 44 (2): 233-250.
- Faris JD, Laddomada B, Gill BS: Molecular mapping of segregation distortion loci in Aegilops tauschii. Genetics. 1998, 149 (1): 319-327.
- Hackett CA, Broadfoot LB: Effects of genotyping errors, missing values and segregation distortion in molecular marker data on the construction of linkage maps. Heredity. 2003, 90 (1): 33-38. 10.1038/sj.hdy.6800173.
- Hartl DL, Hiraizumi Y, Crow JF: Evidence for sperm dysfunction as the mechanism of segregation distortion in Drosophila melanogaster. Proceedings of the National Academy of Sciences of the United States of America. 1967, 58 (6): 2240-2245. 10.1073/pnas.58.6.2240.
- Lu R, Bernardo S: Chromosomal regions associated with segregation distortion in maize. TAG Theoretical and Applied Genetics. 2002, 105 (4): 622-628. 10.1007/s00122-002-0970-9.
- Taylor DR, Ingvarsson PK: Common Features of Segregation Distortion in Plants and Animals. Genetica. 2003, 117 (1): 27-35. 10.1023/A:1022308414864.
- Xu Y, Zhu L, Xiao J, Huang N, McCouch SR: Chromosomal regions associated with segregation distortion of molecular markers in F_{2}, backcross, doubled haploid, and recombinant inbred populations in rice (*Oryza sativa* L.). Molecular and General Genetics MGG. 1997, 253 (5): 535-545. 10.1007/s004380050355.
- Charlesworth B, Charlesworth D: Some evolutionary consequences of deleterious mutations. Genetica. 1998, 102/103: 3-19.
- Lander ES, Botstein D: Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics. 1989, 121 (1): 185-199.
- Xu S: Quantitative trait locus mapping can benefit from segregation distortion. Genetics. 2008, 180 (4): 2201-2208. 10.1534/genetics.108.090688.
- Xu S, Hu Z: Mapping quantitative trait loci using distorted markers. International Journal of Plant Genomics. 2010, 2009: 1-11.
- Fu YB, Ritland K: Evidence for the partial dominance of viability genes contributing to inbreeding depression in Mimulus guttatus. Genetics. 1994, 136 (1): 323-331.
- Lorieux M, Perrier X, Goffinet B, Lanaud C, León DG: Maximum-likelihood models for mapping genetic markers showing segregation distortion. 2. F_{2} populations. TAG Theoretical and Applied Genetics. 1995, 90 (1): 81-89.
- Vogl C, Xu S: Multipoint mapping of viability and segregation distorting loci using molecular markers. Genetics. 2000, 155: 1439-1447.
- Luo L, Xu S: Mapping viability loci using molecular markers. Heredity. 2003, 90: 459-467. 10.1038/sj.hdy.6800264.
- Luo L, Zhang Y-M, Xu S: A quantitative genetics model for viability selection. Heredity. 2005, 94: 347-355. 10.1038/sj.hdy.6800615.
- Sillanpaa MJ, Arjas E: Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics. 1998, 148 (3): 1373-1388.
- Wang H, Zhang Y-M, Li X, Masinde GL, Mohan S, Baylink DJ, Xu S: Bayesian shrinkage estimation of quantitative trait loci parameters. Genetics. 2005, 170: 465-480. 10.1534/genetics.104.039354.
- Xu S: Estimating polygenic effects using markers of the entire genome. Genetics. 2003, 163: 789-801.
- Xu S: An empirical Bayes method for estimating epistatic effects of quantitative trait loci. Biometrics. 2007, 63: 513-521. 10.1111/j.1541-0420.2006.00711.x.
- Gilmour AR, Anderson RD, Rae AL: The analysis of binomial data by a generalized linear mixed model. Biometrika. 1985, 72 (3): 593-599. 10.1093/biomet/72.3.593.
- Harville DA, Mee RW: A mixed-model procedure for analysing ordered categorical data. Biometrics. 1984, 40: 393-408. 10.2307/2531393.
- Gelman A, Carlin J, Stern H, Rubin D: Bayesian Data Analysis. 2003, London: Chapman & Hall
- Gelman A, Jakulin A, Pittau MG, Su Y-S: A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics. 2008, 2 (4): 1360-1383. 10.1214/08-AOAS191.
- Wolfinger R, O'Connell M: Generalized linear mixed models: A pseudo-likelihood approach. The Journal of Statistical Computation and Simulation. 1993, 48: 233-243. 10.1080/00949659308811554.
- McGilchrist CA: Estimation in generalized mixed model. Journal of the Royal Statistical Society, Series B. 1994, 56 (1): 61-69.
- Cavalli-Sforza LL, Bodmer WF: The Genetics of Human Population. 1971, San Francisco: W. H. Freeman and Company
- Yang R, Tian Q, Xu S: Mapping quantitative trait loci for longitudinal traits in line crosses. Genetics. 2006, 173 (4): 2339-2356. 10.1534/genetics.105.054775.
- Lan H, Chen M, Flowers JB, Yandell BS, Stapleton DS, Mata CM, Mui ET, Flowers MT, Schueler KL, Manly KF: Combined expression trait correlations and expression quantitative trait locus mapping. PLoS Genetics. 2006, 2 (1): e6. 10.1371/journal.pgen.0020006.
- Jiang C, Zeng ZB: Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica. 1997, 101 (1): 47-58. 10.1023/A:1018394410659.
- Hu Z, Xu S: PROC QTL - A SAS procedure for mapping quantitative trait loci. International Journal of Plant Genomics. 2009, 2009: 1-3.
- Tibshirani R: Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B. 1996, 58 (1): 267-288.
- Yi N, Banerjee S: Hierarchical generalized linear models for multiple quantitative trait locus mapping. Genetics. 2009, 181 (3): 1101-1113. 10.1534/genetics.108.099556.
- Zou H: The adaptive Lasso and its oracle properties. Journal of the American Statistical Association. 2006, 101 (476): 1418-1429. 10.1198/016214506000000735.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.