An EM algorithm for mapping segregation distortion loci

Zhu, Chengsong; Zhang, Yuan-Ming

doi:10.1186/1471-2156-8-82

Methodology article
Open access
Published: 29 November 2007


BMC Genetics volume 8, Article number: 82 (2007)


Abstract

Background

A chromosomal region that causes distorted segregation ratios is referred to as a segregation distortion locus (SDL). The distortion is caused either by differential representation of SDL genotypes in gametes before fertilization or by viability differences among SDL genotypes after fertilization but before genotype scoring. In both cases, observable phenotypes are distorted for marker loci in the chromosomal region close to the SDL. Under a quantitative genetics model for viability selection, in which a continuous liability controls the viability of each individual, a simplex algorithm has been used to search for the solution in SDL mapping. However, that approach did not consider the effects of SDL on the construction of linkage maps.

Results

We proposed a multipoint maximum-likelihood method to estimate the position and effects of SDL under the liability model, together with the selection coefficients of marker genotypes and the recombination fractions between markers. The method was implemented via an expectation-maximization (EM) algorithm. The superiority of the proposed method over previous methods was verified by a series of Monte Carlo simulation experiments and by a working example derived from the MAPMAKER/QTL software.

Conclusion

Our results suggested that the new method can serve as a powerful alternative to existing methods for SDL mapping. Under the liability model, it can simultaneously estimate the position and effects of SDL as well as the recombination fractions between adjacent markers; it can also be used to investigate the genetic mechanism underlying the bias in uncorrected map distances and to elucidate the relationship between viability selection and genetic linkage.

Background

In a segregating population derived from a cross between two inbred lines, some molecular markers often show segregation ratios that deviate from Mendelian expectations [1–3]. The distortion is frequently associated with gametic genes, sterility genes and chromosomal translocations [4]. Detecting the responsible genes or loci, known as segregation distortion locus (SDL) mapping, is therefore warranted. The main challenge in SDL mapping, however, is that no phenotypic data are available for the underlying trait. Nevertheless, molecular markers linked to an SDL frequently show segregation distortion, and the degree of distortion depends on the effect size and the position of the SDL. It is therefore possible to detect an SDL through the distortion it causes.

SDL mapping is usually studied at the population level by examining changes in the gene (or genotypic) frequencies of markers [5]. In the past, a single marker was often used to detect linkage between the marker and an SDL [6, 7]; the shortcomings of this approach are very similar to those of single-marker approaches in quantitative trait locus (QTL) mapping [8]. Following the introduction of interval mapping of QTL [9], Hedrick and Muona [10] developed a flanking-marker analysis to estimate the fitness parameters of a viability locus; their model is in fact a completely recessive model. Mitchell-Olds [11] detected one putative viability locus at a time and scanned every putative position across the genome to obtain a test-statistic profile for SDL detection; however, his model only tests and estimates the degree of dominance. Luo and Xu [12] extended the maximum-likelihood (ML) method to estimate the degree of dominance and selection coefficients, using an outbred full-sib family as an example. Wang et al. [13] developed a multipoint ML method to estimate the position and genotypic frequencies of SDL in an F₂ population. However, the efficacy of these methods has seldom been evaluated in simulation studies. Recently, Luo et al. [14] developed a quantitative genetics model for viability selection. This approach makes it possible to carry out simulation studies, to partition selection into additive and dominant effects, and to remove the effects of non-genetic cofactors from the analysis [14, 15]. However, the approach raises two issues. First, it assumes that segregation distortion does not affect the construction of the genetic linkage map, whereas marker segregation distortion is known to affect both the estimated recombination fractions in pairwise analyses of markers and the order of the markers on a linkage group [16–18].
Second, to estimate the genetic parameters, Luo et al. [14] adopted the Simplex algorithm [19], which is computationally expensive. In this paper, therefore, we extend the multipoint approach under the liability model of Luo et al. [14] by combining the estimation of the genetic parameters of SDL with the reconstruction of genetic linkage maps. The new method is implemented via an expectation-maximization (EM) algorithm rather than the Simplex procedure. Genetic factors that might affect the estimates of recombination fractions between adjacent markers are discussed in detail. A series of Monte Carlo simulation experiments, together with a working example from the MAPMAKER/QTL software, were carried out to verify our approach.

Methods

Genetic model

Consider an SDL in an F₂ population derived from a cross between two inbred lines. We assumed that the three genotypes at this locus, AA, Aa and aa, had genotypic values $\sqrt{2}a - d$, $d$ and $-\sqrt{2}a - d$, respectively, where a and d denote the additive and dominant effects, and that an imaginary trait, the liability, invisible to investigators but visible to nature, controlled the viability of individuals. Note that with this parameterization the genetic variance in an F₂ population (genotype frequencies 1/4, 1/2, 1/4) was a² + d² rather than the usual $\frac{1}{2}a^{2} + \frac{1}{4}d^{2}$: the mean genotypic value is zero and $\frac{1}{4}(\sqrt{2}a - d)^{2} + \frac{1}{2}d^{2} + \frac{1}{4}(\sqrt{2}a + d)^{2} = a^{2} + d^{2}$. The phenotypic (liability) value of the jth individual was described by the linear model

z_j = g_j + ε_j    (1)

where g_j was the genotypic value of the jth individual, and ε_j was a normally distributed residual with mean zero and standard deviation 1.0, which accounted for polygenes that were linked to the markers and for environmental variation [14, 18]. Assuming that the liability was subject to natural selection, an individual survived if z_j ≥ 0 and was eliminated from the population if z_j < 0. Since all sampled individuals had survived viability selection, the liability within each genotype followed a truncated normal distribution, and the probability that an individual with genotype G_j = h (h = 1, 2, 3) survived was

f_{h} = \Pr(z_{j} \geq 0 \mid G_{j} = h) = \Phi\left[(2 - h)\sqrt{2}\,a + (-1)^{h} d\right] \qquad (2)

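where h indexed the SDL genotypes (h = 1, 2, 3 for AA, Aa and aa), Φ(·) was the standard normal cumulative distribution function, and f_h was referred to as the relative fitness of the hth genotype [14]. Among the surviving individuals, the expected frequencies of the three genotypes were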

p_{AA} = \frac{0.25 f_{1}}{0.25 f_{1} + 0.5 f_{2} + 0.25 f_{3}} = \frac{f_{1}}{f_{1} + 2 f_{2} + f_{3}}, \quad p_{Aa} = \frac{2 f_{2}}{f_{1} + 2 f_{2} + f_{3}}, \quad p_{aa} = \frac{f_{3}}{f_{1} + 2 f_{2} + f_{3}} \qquad (3)
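To make equations (2) and (3) concrete, the short Python sketch below computes the relative fitnesses f_h and the expected genotype frequencies among survivors for given a and d. It assumes the residual standard deviation of 1.0 stated above and writes Φ through `math.erf`; the function names are illustrative and not taken from the authors' software. It also checks numerically that the genotypic variance under 1:2:1 frequencies equals a² + d².

```python
import math


def std_normal_cdf(x):
    """Phi(x), the standard normal CDF, expressed through math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def genotypic_values(a, d):
    """Liability means for AA, Aa, aa: sqrt(2) a - d, d, -sqrt(2) a - d."""
    r2 = math.sqrt(2.0)
    return (r2 * a - d, d, -r2 * a - d)


def relative_fitness(a, d):
    """Equation (2): f_h = Pr(z >= 0 | G = h) = Phi(mu_h), residual SD 1.0."""
    return tuple(std_normal_cdf(mu) for mu in genotypic_values(a, d))


def surviving_genotype_freqs(a, d):
    """Equation (3): 1:2:1 Mendelian weights times fitness, renormalised."""
    f1, f2, f3 = relative_fitness(a, d)
    total = f1 + 2.0 * f2 + f3
    return (f1 / total, 2.0 * f2 / total, f3 / total)


def f2_genetic_variance(a, d):
    """Variance of the genotypic values before selection (1/4, 1/2, 1/4)."""
    g = genotypic_values(a, d)
    w = (0.25, 0.5, 0.25)
    mean = sum(wi * gi for wi, gi in zip(w, g))
    return sum(wi * (gi - mean) ** 2 for wi, gi in zip(w, g))


print(surviving_genotype_freqs(0.0, 0.0))  # (0.25, 0.5, 0.25)
```

With a = d = 0 every genotype survives with probability 0.5 and the Mendelian 1:2:1 ratio is recovered; a positive a favours AA over aa among the survivors.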

Mapping SDL under a liability model

We assumed that there was no crossover interference among the markers on the linkage group considered, that an SDL caused segregation distortion at some or all of the markers linked to it, and that the three genotypes at each marker had different viability coefficients. Let the m markers on the same chromosome be ordered M₁, M₂, ..., M_m; let x_k be a dummy variable defined as x_k = 1, 0 or -1 for a homozygote of P₁, a heterozygote or a homozygote of P₂ at the kth marker, respectively; let z_k be the indicator for the phenotype of the kth marker (M_k); let r_k (or r_{k,k+1}) be the recombination fraction between the kth and (k+1)th markers; and let s_{k,1} and s_{k,2} (0 ≤ s_{k,1} < +∞ and 0 ≤ s_{k,2} < +∞ for k = 1, 2, ..., m) be the viability coefficients of M_k m_k and m_k m_k relative to M_k M_k at the kth marker.

Now suppose that an SDL is located between the kth and (k+1)th markers, and let φ_jh be an indicator taking the value 1 if the jth individual has the hth possible SDL genotype in the F₂ population and 0 otherwise. The parameters were Ω = (p_AA, p_Aa, p_aa, δ) or, equivalently, Ω = (a, d, δ), where δ denotes the SDL location. The distribution of φ_jh was

\Pr(φ_{jh} \mid Ω) = (p_{AA})^{φ_{j1}} (p_{Aa})^{φ_{j2}} (p_{aa})^{φ_{j3}} = \frac{f_{1}^{φ_{j1}} (2 f_{2})^{φ_{j2}} f_{3}^{φ_{j3}}}{f_{1} + 2 f_{2} + f_{3}}, \quad j = 1, \dots, n \qquad (4)

where n was the sample size. In matrix notation, the likelihood was

L(Ω) = \prod_{j=1}^{n} \left\{ \left[ H'_{j}(r_{k,k'}) \prod_{o=k-1}^{1} H'_{j}(r_{o,o+1})\, q_{1} \right]' \left[ H_{j}(r_{k',k+1}) \prod_{o=k+1}^{m-1} H_{j}(r_{o,o+1})\, c \right] \frac{f_{1}^{φ_{j1}} (2 f_{2})^{φ_{j2}} f_{3}^{φ_{j3}}}{f_{1} + 2 f_{2} + f_{3}} \right\} \qquad (5)

where q'_1 = [Pr(x₁ = 1), Pr(x₁ = 0), Pr(x₁ = -1)], c' = [1, 1, 1], the prime (') denotes the transpose of a matrix or vector, k' denotes the putative SDL position within the interval between M_k and M_{k+1}, and the transition probability matrix H_j(r_{k,k+1}) from marker M_k to M_{k+1} for the jth individual was

H_{j}(r_{k,k+1}) = \begin{bmatrix} \frac{(1 - r_{k})^{2}}{D_{1}} & \frac{2 s_{k+1,1} r_{k} (1 - r_{k})}{D_{1}} & \frac{s_{k+1,2} r_{k}^{2}}{D_{1}} \\ \frac{r_{k} (1 - r_{k})}{D_{2}} & \frac{s_{k+1,1} (1 - 2 r_{k} + 2 r_{k}^{2})}{D_{2}} & \frac{s_{k+1,2} r_{k} (1 - r_{k})}{D_{2}} \\ \frac{r_{k}^{2}}{D_{3}} & \frac{2 s_{k+1,1} r_{k} (1 - r_{k})}{D_{3}} & \frac{s_{k+1,2} (1 - r_{k})^{2}}{D_{3}} \end{bmatrix}

where D_{1} = (1 - r_{k})^{2} + 2 s_{k+1,1} r_{k} (1 - r_{k}) + s_{k+1,2} r_{k}^{2}, D_{2} = (1 + s_{k+1,2}) r_{k} (1 - r_{k}) + s_{k+1,1} (1 - 2 r_{k} + 2 r_{k}^{2}) and D_{3} = r_{k}^{2} + 2 s_{k+1,1} r_{k} (1 - r_{k}) + s_{k+1,2} (1 - r_{k})^{2} are the row normalizing constants.
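Each row of this transition matrix is a Mendelian F₂ transition (no interference) whose entries are weighted by the viability coefficients (1, s_{k+1,1}, s_{k+1,2}) of the destination marker genotype and then renormalised. A minimal Python sketch of that construction follows; the function name is hypothetical, and it builds the generic matrix rather than the individual-specific H_j used in equation (5).

```python
def transition_matrix(r, s1, s2):
    """Viability-weighted F2 transition matrix from marker M_k to M_{k+1}.

    Rows: genotype at M_k (MM, Mm, mm); columns: genotype at M_{k+1}.
    s1, s2: viability coefficients of Mm and mm at M_{k+1}, relative to MM.
    """
    q = 1.0 - r
    # Mendelian F2 two-locus transition probabilities before selection
    base = [
        [q * q, 2.0 * r * q, r * r],
        [r * q, q * q + r * r, r * q],  # q*q + r*r == 1 - 2r + 2r^2
        [r * r, 2.0 * r * q, q * q],
    ]
    weights = (1.0, s1, s2)
    matrix = []
    for row in base:
        weighted = [p * w for p, w in zip(row, weights)]
        total = sum(weighted)  # D_1, D_2 or D_3 in the text
        matrix.append([x / total for x in weighted])
    return matrix
```

Setting s1 = s2 = 1 recovers the undistorted F₂ transition, so the first row becomes ((1 - r)², 2r(1 - r), r²); any positive coefficients still give rows that sum to one.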

There are several ways to find the ML estimates (MLEs) of the model parameters. Here we adopted an EM algorithm [20], treating φ_jh as missing data. Regarding δ as fixed for the moment, the parameter set was θ = (a, d)'. The EM algorithm requires the expectation of the complete-data log-likelihood function,

L = C + \sum_{j=1}^{n} \left[ p(φ_{j1} = 1) \ln f_{1} + p(φ_{j2} = 1) \ln (2 f_{2}) + p(φ_{j3} = 1) \ln f_{3} - \ln (f_{1} + 2 f_{2} + f_{3}) \right] \qquad (6)
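Equation (6) can be evaluated directly once the posterior probabilities are available. The sketch below drops the constant C, which does not affect the maximization over a and d, and takes one tuple of posteriors per individual; the function and argument names are illustrative assumptions.

```python
import math


def _std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_complete_loglik(a, d, weights):
    """Equation (6) without C.

    weights: one (w1, w2, w3) tuple of posteriors p(phi_jh = 1) per individual.
    """
    r2 = math.sqrt(2.0)
    f1 = _std_normal_cdf(r2 * a - d)   # AA
    f2 = _std_normal_cdf(d)            # Aa
    f3 = _std_normal_cdf(-r2 * a - d)  # aa
    log_terms = (math.log(f1), math.log(2.0 * f2), math.log(f3))
    log_total = math.log(f1 + 2.0 * f2 + f3)
    return sum(
        sum(w * lt for w, lt in zip(wj, log_terms)) - log_total
        for wj in weights
    )
```

Because each individual's posteriors sum to 1, every summand equals Σ_h p(φ_jh = 1) ln p_h with p_h from equation (3); for example, an individual known to be AA contributes ln 0.25 when a = d = 0.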

where the constant C did not depend on the parameters of interest but did depend on the viability coefficients and the map distances between adjacent markers, which could be estimated by the method of Zhu et al. [18]. The EM algorithm was as follows.

E-step

Given initial values for the model parameters, namely a⁽⁰⁾ = 0.0 and d⁽⁰⁾ = 0.0, the posterior probabilities of φ_jh = 1 were

p(φ_{jh} = 1) = \frac{\Pr(φ_{jh} = 1 \mid z_{j1}, \ldots, z_{jm})\, p_{jh}^{(0)}}{\sum_{o=1}^{3} \Pr(φ_{jo} = 1 \mid z_{j1}, \ldots, z_{jm})\, p_{jo}^{(0)}} \qquad (7)

where p_{jh}^{(0)} (h = 1, 2, 3) was calculated from equation (3) at the current parameter values, and Pr(φ_jh = 1 | z_{j1}, ..., z_{jm}) (j = 1, ..., n; h = 1, 2, 3) was the prior probability of the hth SDL genotype of the jth individual conditional on its marker information (z_{j1}, ..., z_{jm}), obtained by the multipoint method [21].
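The E-step in equation (7) is a per-individual reweighting followed by normalization. The sketch below assumes that the marker-conditional probabilities Pr(φ_jh = 1 | z_j1, ..., z_jm) have already been computed by a multipoint method (that calculation is not shown) and that `prior` is the frequency vector from equation (3); the names are hypothetical.

```python
def e_step(marker_probs, prior):
    """Equation (7): posterior p(phi_jh = 1) for every individual.

    marker_probs: per-individual (P1, P2, P3) = Pr(SDL genotype h | markers).
    prior: (p_AA, p_Aa, p_aa) from equation (3) at the current a and d.
    """
    posteriors = []
    for probs in marker_probs:
        joint = [m * p for m, p in zip(probs, prior)]  # numerator of (7)
        total = sum(joint)                              # denominator of (7)
        posteriors.append(tuple(x / total for x in joint))
    return posteriors
```

If the markers carry no information (equal conditional probabilities) the posterior reduces to `prior`; if they determine the SDL genotype, the posterior puts all mass on it.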

M-step

Because the MLEs have no explicit solutions, they were obtained by the Fisher-scoring algorithm [22]. θ was updated by

θ⁽¹⁾ = θ⁽⁰⁾ + I⁻¹ S(θ⁽⁰⁾)    (8)

where S(θ⁽⁰⁾) was the score function and I was the Fisher information matrix (details are given in the Appendix). θ⁽¹⁾ then replaced θ⁽⁰⁾ in all subsequent steps, and the procedure was iterated until convergence. The converged θ⁽¹⁾ was the MLE of θ in this M-step.
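A compact sketch of the M-step follows. Because the log terms in equation (6) do not depend on j, the E-step output enters only through the expected genotype counts W_h = Σ_j p(φ_jh = 1). For the information matrix, the sketch assumes the expected information of a multinomial sample, I = n Σ_h p_h u_h u_hᵀ with u_h = ∂ln p_h/∂θ; this is one way to evaluate the expectation in equation (8) and may differ in detail from the authors' implementation. The score is the expression derived in the Appendix, and all names are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)
MULT = (1.0, 2.0, 1.0)  # Mendelian multiplicities in f1 + 2 f2 + f3
# (d mu_h / da, d mu_h / dd) for h = 1 (AA), 2 (Aa), 3 (aa)
DMU = ((SQRT2, -1.0), (0.0, 1.0), (-SQRT2, -1.0))


def _Phi(x):
    return 0.5 * (1.0 + math.erf(x / SQRT2))


def _phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def freqs_and_scores(a, d):
    """Return p_h from equation (3) and u_h = d ln p_h / d(a, d)."""
    mu = (SQRT2 * a - d, d, -SQRT2 * a - d)
    f = [_Phi(m) for m in mu]
    df = [(_phi(m) * g[0], _phi(m) * g[1]) for m, g in zip(mu, DMU)]
    F = sum(c * fh for c, fh in zip(MULT, f))
    dF = [sum(c * dfh[i] for c, dfh in zip(MULT, df)) for i in range(2)]
    p = [c * fh / F for c, fh in zip(MULT, f)]
    u = [[dfh[i] / fh - dF[i] / F for i in range(2)] for fh, dfh in zip(f, df)]
    return p, u


def m_step(W, a=0.0, d=0.0, tol=1e-10, max_iter=200):
    """Fisher scoring for theta = (a, d); W[h] = sum_j p(phi_jh = 1)."""
    n = sum(W)
    for _ in range(max_iter):
        p, u = freqs_and_scores(a, d)
        # Score: S = sum_h W_h u_h (the Appendix expression for d lnL / d theta)
        S = [sum(w * uh[i] for w, uh in zip(W, u)) for i in range(2)]
        # Expected information of n multinomial draws: n * sum_h p_h u_h u_h^T
        I = [[n * sum(ph * uh[i] * uh[k] for ph, uh in zip(p, u)) for k in range(2)]
             for i in range(2)]
        det = I[0][0] * I[1][1] - I[0][1] * I[1][0]
        # theta_new = theta + I^{-1} S, using the explicit 2 x 2 inverse
        step_a = (I[1][1] * S[0] - I[0][1] * S[1]) / det
        step_d = (I[0][0] * S[1] - I[1][0] * S[0]) / det
        a, d = a + step_a, d + step_d
        if max(abs(step_a), abs(step_d)) < tol:
            break
    return a, d
```

If W is generated exactly from known (a, d) through equation (3), the iteration should return those values; with Mendelian counts the score is zero at a = d = 0 and the starting values are returned unchanged.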

The E and M steps were iterated until the convergence occurred.

The MLE for the SDL position could be obtained by examining the likelihood-ratio profile along the chromosome as was commonly done in interval mapping of QTL [9].

Following parameter estimation, we tested the overall null hypothesis of no SDL effect at the putative locus δ, H₀: a = d = 0.0, using the likelihood-ratio (LR) test statistic:

LR = -2[lnL(0, 0, δ) - lnL(a, d, δ)]

Under the null hypothesis, the LR statistic approximately followed a chi-square distribution with two degrees of freedom.
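For two degrees of freedom the chi-square upper tail has the closed form exp(-x/2), so a pointwise asymptotic p-value and critical value need no statistics library. The Python sketch below uses that identity; the function names are illustrative. These are nominal single-position values, whereas the genome-wide thresholds used in this paper came from permutation or simulation.

```python
import math


def lr_pvalue_df2(lr):
    """Upper-tail probability of a chi-square(2) variable: exp(-LR / 2)."""
    return math.exp(-lr / 2.0)


def lr_critical_value_df2(alpha):
    """Pointwise critical value for H0: a = d = 0 at level alpha: -2 ln(alpha)."""
    return -2.0 * math.log(alpha)


print(round(lr_critical_value_df2(0.05), 4))  # 5.9915
```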

The critical value for the power calculation was determined from 1,000 permutations [23] at an experiment-wise type I error rate of 5%, and the confidence interval of an SDL location was determined by the bootstrap method [24].

Simulation model

We simulated one chromosome 100 cM (or 50 cM) long, covered by m evenly spaced codominant markers (m = 6, 11 or 21), and placed a single SDL at 25 cM (with a second SDL at 75 cM where required). The dominance ratio of the SDL was denoted by dr = d/a. Given the broad-sense heritability (h²) and dr, the additive and dominant effects were obtained using a numerical algorithm [25]. Following the method described in Luo et al. [14], the genotypes of both the distorted markers and the SDL were simulated for each individual in an F₂ population. All simulations were replicated 100 or 1,000 times, depending on the purpose of the analysis. Empirical power was calculated as the proportion of runs in which the test statistic exceeded the critical value [26].
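The viability-selection step of such a simulation can be sketched as rejection sampling: SDL genotypes are drawn in Mendelian 1:2:1 proportions, a liability z = g + e with e ~ N(0, 1) is formed, and only individuals with z ≥ 0 are kept. The Python sketch below simulates the SDL genotype only; the linked marker genotypes that were also simulated following Luo et al. [14] are omitted, a and d are supplied directly rather than derived from h² and dr, and the function name is hypothetical.

```python
import math
import random


def simulate_f2_survivors(n, a, d, rng):
    """Draw n surviving F2 individuals at the SDL under the liability model.

    Returns a list of SDL genotype codes h (1 = AA, 2 = Aa, 3 = aa).
    """
    r2 = math.sqrt(2.0)
    mean_liability = {1: r2 * a - d, 2: d, 3: -r2 * a - d}
    survivors = []
    while len(survivors) < n:
        # Mendelian 1:2:1 segregation before selection
        h = rng.choices((1, 2, 3), weights=(1, 2, 1))[0]
        # liability z = g + e with e ~ N(0, 1)
        z = mean_liability[h] + rng.gauss(0.0, 1.0)
        if z >= 0.0:  # the individual survives viability selection
            survivors.append(h)
    return survivors


sample = simulate_f2_survivors(1000, 0.5, 0.25, random.Random(1))
```

As n grows, the genotype frequencies among survivors should approach equation (3).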

Results

Effects of various factors on SDL mapping

In these simulation experiments, the effects of sample size, SDL heritability and marker interval length on SDL mapping were studied. The performance of the proposed method was evaluated by the statistical power and by the mean and standard deviation of the estimates over 100 replicates. All parameters and results are listed in Table 1. The results mirrored the general behavior of QTL mapping: the estimate of each parameter was very close to its true value, and both the power and the precision of SDL mapping increased with sample size and with SDL heritability. However, marker interval length had only a slight effect on power over the three levels studied.

Table 1 Results of segregation distortion locus (SDL) mapping under the fitness and liability models (100 replications)


Mapping multiple SDL

Similar to the interval mapping procedure of Lander and Botstein [9], the single-locus model for SDL mapping was used to search for multiple loci. Eleven markers were evenly placed on a simulated chromosome 100 cM long. Two SDL, each with a dominance ratio of 0.5 and a heritability of 0.15, were located at 25 cM and 75 cM, respectively. One hundred independent simulation runs were performed with a sample size of 200. The results are listed in Table 2. Both loci were detected with almost 100% power. These simulation results suggest that the new method, although based on a single-SDL model, can serve as an approximate approach for searching for multiple loci if the SDL are sufficiently separated by markers.

Table 2 Results of two segregation distortion loci (SDL) mapping under the fitness and liability models (100 replicates and 200 individuals)


A working example

To demonstrate the proposed method, we re-analyzed a sample dataset (source file: sample.raw) in the MAPMAKER/QTL software [27]. It consisted of 333 F₂ individuals from a cross between two inbred lines of tomato. Each plant was genotyped at 12 marker loci divided into two linkage groups. Single-marker chi-square tests showed that 5 markers on the first linkage group and 2 on the second deviated from Mendelian segregation ratios (data not shown). Using the linkage maps reconstructed by the method of Zhu et al. [18], 1,000 datasets without segregation distortion were simulated and used to determine the critical value [23]. The confidence interval of an SDL location was determined by the bootstrap method [24].

Map distances between consecutive markers were calculated twice, with and without accounting for SDL. The former (corrected) distances were obtained with the method of Zhu et al. (2007) [18], and the latter (uncorrected) distances with the Mapmaker/EXE 3.0 software [27]. The results, listed in Table 3, show that corrected and uncorrected map distances differed when distorted markers were present; the genetic basis of these inconsistencies is discussed in the following section. Using the proposed method, four SDL were mapped (Table 4, Fig 1): two on the first linkage group and two on the second. Their genetic parameters are listed in Table 4. The distortion was stronger on the first linkage group than on the second (Fig 1), which produced the largest difference between corrected and uncorrected map distances, in the first marker interval of the first linkage group. Moreover, the two linked SDL on the second linkage group also produced two large differences (Table 3). In this dataset, therefore, linked SDL had a larger effect on the estimated map distance than a single SDL.

Table 3 The uncorrected and corrected map distances in the real data analysis


Table 4 Results of segregation distortion loci (SDL) mapping in a real data analysis


Effect of genetic model of SDL on the estimation of map distance

In this section, we aimed to clarify the genetic basis of the inconsistencies between corrected and uncorrected map distances in the presence of distorted markers. Six evenly spaced codominant markers were simulated on a chromosome segment 50 cM long, and two linked SDL were placed at 10 and 20 cM (exactly at the 2nd and 3rd marker loci). One hundred simulation runs were performed with a sample size of 300. Each dataset was analyzed twice, by the method of Zhu et al. (2007) [18] (corrected map distance) and by the Mapmaker/EXE 3.0 software [27] (uncorrected map distance). For the additive-dominant model, all genetic parameters and results are listed in Table 5. Uncorrected map distances were underestimated in most cases, overestimated when the two SDL had opposite dominant effects, and unbiased when all additive effects were negative. The real-data analysis above partly confirms this: the two linked SDL on the second linkage group had opposite dominant effects (Table 4) and gave rise to overestimated distances (Table 3). For the epistatic model, all genetic parameters and results are listed in Table 6. Uncorrected map distances were underestimated in most situations, overestimated for negative additive-by-additive or negative dominant-by-dominant effects, and unbiased for additive-by-dominant effects. As expected, corrected genetic distances were unbiased when SDL were taken into account (Tables 5 and 6). We therefore recommend using corrected linkage maps for further QTL or SDL analysis unless there is strong evidence that all markers follow Mendelian segregation.

Table 5 Effect of genetic modes of two linked SDL on the estimates of map distances under the additive-dominant model


Table 6 Effect of genetic modes of two linked SDL on the estimates of map distances under the epistatic genetic model


Discussion

For SDL mapping, most researchers concentrate on detecting and testing either the selection coefficients or the degree of dominance under the fitness model [7, 10, 11]. Luo et al. [14] pioneered SDL mapping under a liability model, and Zhu et al. [18] proposed a new method for reconstructing linkage maps with distorted, dominant and missing markers. Under the liability model, we developed a method to simultaneously estimate the position and effects of SDL as well as the recombination fractions between adjacent markers. This approach retains the merits of Luo et al. [14] but differs from other methods in several respects. First, it combines the detection of SDL with the reconstruction of the marker linkage map, so that the position and effects of SDL are estimated along with the selection coefficients and the degree of dominance. Second, the proposed method can be used to elucidate the relationship between viability selection and genetic linkage. Third, the likelihood function is based on the distribution of SDL genotypes rather than on that of marker genotypes as in previous studies [11, 28]. Finally, we adopted an EM algorithm rather than the Simplex procedure to estimate the genetic parameters. Note that all of the above approaches share a common assumption: marker segregation distortion is caused by genetic or viability factors. Genetically, segregation distortion can arise through two different mechanisms, one at the gametic level and the other at the zygotic level. In both cases, observable phenotypes are distorted for marker loci in the chromosomal region close to the SDL, so both mechanisms are accommodated by our method.
Although the two mechanisms cannot be distinguished in SDL mapping, the results of genotype and allele tests [29] for the marker closest to the SDL can be used to infer whether zygotic or gametic viability selection is present in an F₂ population, though not in backcross, doubled haploid or recombinant inbred line populations. Genetic linkage between distorted markers has also been discussed in detail by Wu et al. (2007) [30].

There are two primary routes by which selection can affect the extent of linkage disequilibrium [31]. The first is a hitchhiking effect, in which an entire haplotype flanking a favored variant can be rapidly swept to high frequency or even fixation [32]. The second is epistatic selection for combinations of alleles at two or more loci on the same chromosome [33], which leads to associations between particular alleles at different loci. The major difficulty in linkage disequilibrium-based mapping is to quantify the relationship between the recombination fraction and the measure of linkage disequilibrium. Our analyses exclude all factors influencing linkage disequilibrium other than linkage and selection. We first combined viability selection with a quantitative genetics model, and then explored the relationship between the genetic modes of the viability genes and the estimates of the recombination fraction. The simulation studies indicated that most genetic modes of the viability genes at two linked SDL may result in underestimation of genetic distance. We hope that this tentative attempt will help elucidate the genetic relationship between viability selection and genetic linkage.

In addition, it will be interesting and challenging to combine SDL analysis with QTL mapping to determine what effects distorted markers have on the results of QTL mapping. In doing so, one risks detecting false QTL, not because of their genetic effects on the quantitative traits but because of violations of the Mendelian segregation law. Developing a method to separate the effects of viability loci from the effects of QTL would be a major breakthrough in quantitative genetics [14]. Because of the complexity of such a combined analysis, the related investigations will be reported separately elsewhere.

Conclusion

Our results suggested that the proposed method can serve as a powerful alternative to existing methods. Under the liability model, it can simultaneously estimate the position and effects of SDL as well as the recombination fractions between adjacent markers; it can also be used to investigate the genetic mechanism underlying the bias in uncorrected map distances and to elucidate the relationship between viability selection and genetic linkage.

Appendix: Fisher-scoring algorithms for obtaining MLEs of parameters

The Fisher-scoring algorithm can be used to estimate the parameters in the M-step of the EM algorithm. Let θ = (a, d)^T. The updated estimate of θ can be expressed in terms of the score-function vector S and the Fisher information matrix I,

θ^{(1)} = θ^{(0)} + I_{θ = θ^{(0)}}^{- 1} S_{θ = θ^{(0)}}

where S = ∂lnL/∂θ = (∂lnL/∂a, ∂lnL/∂d)^T is the score function, and

I = -E\left(\frac{\partial^{2} \ln L}{\partial θ \partial θ^{T}}\right) = -\begin{pmatrix} E\left(\frac{\partial^{2} \ln L}{\partial a^{2}}\right) & E\left(\frac{\partial^{2} \ln L}{\partial a \partial d}\right) \\ E\left(\frac{\partial^{2} \ln L}{\partial d \partial a}\right) & E\left(\frac{\partial^{2} \ln L}{\partial d^{2}}\right) \end{pmatrix}

is the Fisher information matrix.

More specifically, the score function and the Fisher information matrix of the expected complete-data log-likelihood can be derived as follows, where w(φ_jh = 1) denotes the posterior probability p(φ_jh = 1) from equation (7):

\begin{aligned} \frac{\partial \ln L}{\partial θ} &= \sum_{j=1}^{n} \left[ w(φ_{j1} = 1) \frac{\partial \ln f_{1}}{\partial θ} + w(φ_{j2} = 1) \frac{\partial \ln (2 f_{2})}{\partial θ} + w(φ_{j3} = 1) \frac{\partial \ln f_{3}}{\partial θ} - \frac{\partial \ln (f_{1} + 2 f_{2} + f_{3})}{\partial θ} \right] \\ &= \sum_{j=1}^{n} \left[ \frac{w(φ_{j1} = 1)}{f_{1}} \frac{\partial f_{1}}{\partial θ} + \frac{w(φ_{j2} = 1)}{2 f_{2}} \frac{\partial (2 f_{2})}{\partial θ} + \frac{w(φ_{j3} = 1)}{f_{3}} \frac{\partial f_{3}}{\partial θ} - \frac{\partial (f_{1} + 2 f_{2} + f_{3}) / \partial θ}{f_{1} + 2 f_{2} + f_{3}} \right] \\ &= \sum_{j=1}^{n} \left[ \sum_{h=1}^{3} \frac{w(φ_{jh} = 1)}{f_{h}} \frac{\partial f_{h}}{\partial θ} - \frac{\partial (f_{1} + 2 f_{2} + f_{3}) / \partial θ}{f_{1} + 2 f_{2} + f_{3}} \right] \end{aligned}

Let μ_h = $\sqrt{2}$(2 - h)a + (-1)^h d for h = 1, 2, 3, so that f_h = Φ(μ_h). Then

\begin{aligned} f_{h} &= 1 - \frac{1}{\sqrt{2π}} \int_{-\infty}^{0} e^{-\frac{(z - μ_{h})^{2}}{2}} dz, \\ \frac{\partial f_{h}}{\partial θ} &= -\frac{1}{\sqrt{2π}} \int_{-\infty}^{0} \frac{\partial e^{-\frac{(z - μ_{h})^{2}}{2}}}{\partial θ} dz \\ &= \frac{1}{\sqrt{2π}} \int_{-\infty}^{0} e^{-\frac{(z - μ_{h})^{2}}{2}} d\left[-\frac{(z - μ_{h})^{2}}{2}\right] \frac{\partial μ_{h}}{\partial θ} \\ &= \frac{1}{\sqrt{2π}} \exp\left[-\frac{(0 - μ_{h})^{2}}{2}\right] \frac{\partial μ_{h}}{\partial θ} \\ &= \frac{1}{\sqrt{2π}} \exp\left(-\frac{μ_{h}^{2}}{2}\right) \frac{\partial μ_{h}}{\partial θ} \end{aligned}

with $\frac{\partial μ_{h}}{\partial θ} = \sqrt{2}(2 - h)$ or $(-1)^{h}$ when θ = a or d, respectively. Hence
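The closed-form derivative ∂f_h/∂θ = φ(μ_h) ∂μ_h/∂θ, with φ the standard normal density, can be checked against central finite differences. The Python sketch below does this; the helper names and the step size are illustrative choices.

```python
import math


def f_h(a, d, h):
    """f_h = Phi(mu_h) with mu_h = sqrt(2) (2 - h) a + (-1)^h d."""
    mu = math.sqrt(2.0) * (2 - h) * a + (-1) ** h * d
    return 0.5 * (1.0 + math.erf(mu / math.sqrt(2.0)))


def df_h_analytic(a, d, h):
    """(d f_h / da, d f_h / dd) = phi(mu_h) * (sqrt(2) (2 - h), (-1)^h)."""
    mu = math.sqrt(2.0) * (2 - h) * a + (-1) ** h * d
    dens = math.exp(-mu * mu / 2.0) / math.sqrt(2.0 * math.pi)
    return dens * math.sqrt(2.0) * (2 - h), dens * (-1) ** h


def df_h_numeric(a, d, h, eps=1e-6):
    """Central finite differences for comparison."""
    da = (f_h(a + eps, d, h) - f_h(a - eps, d, h)) / (2.0 * eps)
    dd = (f_h(a, d + eps, h) - f_h(a, d - eps, h)) / (2.0 * eps)
    return da, dd
```

For any (a, d) and h = 1, 2, 3 the two functions should agree to roughly seven decimal places.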

\frac{\partial \ln L}{\partial a} = \sum_{j=1}^{n} \sum_{h=1}^{3} \frac{w(φ_{jh} = 1)}{f_{h}} \frac{(2 - h)}{\sqrt{π}} e^{-\frac{μ_{h}^{2}}{2}} - \frac{n}{f_{1} + 2 f_{2} + f_{3}} \sum_{h=1}^{3} \frac{(2 - h)}{\sqrt{π}} e^{-\frac{μ_{h}^{2}}{2}}

\frac{\partial \ln L}{\partial d} = \sum_{j=1}^{n} \sum_{h=1}^{3} \frac{w(φ_{jh} = 1) (-1)^{h} e^{-\frac{μ_{h}^{2}}{2}}}{\sqrt{2π}\, f_{h}} - \frac{n \left[ -e^{-\frac{(\sqrt{2} a - d)^{2}}{2}} + 2 e^{-\frac{d^{2}}{2}} - e^{-\frac{(\sqrt{2} a + d)^{2}}{2}} \right]}{\sqrt{2π}\, (f_{1} + 2 f_{2} + f_{3})}

The second partial derivatives are messier, but their general form, with θ and η each standing for a or d, is

\frac{\partial^{2} \ln L}{\partial θ \partial η} = \sum_{j=1}^{n} \left\{ \sum_{h=1}^{3} w(φ_{jh} = 1) \left[ \frac{1}{f_{h}} \frac{\partial^{2} f_{h}}{\partial θ \partial η} - \frac{1}{f_{h}^{2}} \frac{\partial f_{h}}{\partial θ} \frac{\partial f_{h}}{\partial η} \right] - \frac{1}{f_{1} + 2 f_{2} + f_{3}} \frac{\partial^{2} (f_{1} + 2 f_{2} + f_{3})}{\partial θ \partial η} + \frac{1}{(f_{1} + 2 f_{2} + f_{3})^{2}} \frac{\partial (f_{1} + 2 f_{2} + f_{3})}{\partial θ} \frac{\partial (f_{1} + 2 f_{2} + f_{3})}{\partial η} \right\}

where, because μ_h is linear in a and d (so that ∂²μ_h/∂θ∂η = 0),

\begin{matrix} \frac{\partial^{2} f_{h}}{\partial θ \partial η} & = \frac{1}{\sqrt{2 π}} \exp (- \frac{μ_{h}^{2}}{2}) (- μ_{h}) \frac{\partial μ_{h}}{\partial θ} \frac{\partial μ_{h}}{\partial η} + \frac{1}{\sqrt{2 π}} \exp (- \frac{μ_{h}^{2}}{2}) \frac{\partial^{2} μ_{h}}{\partial θ \partial η} \\ = \frac{1}{\sqrt{2 π}} \exp (- \frac{μ_{h}^{2}}{2}) (- μ_{h}) \frac{\partial μ_{h}}{\partial θ} \frac{\partial μ_{h}}{\partial η} \end{matrix}
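Similarly, the second-derivative expression -μ_h φ(μ_h)(∂μ_h/∂θ)(∂μ_h/∂η) can be compared with finite second differences. In the Python sketch below the helper names are illustrative, and the step size eps = 1e-4 is an arbitrary choice that balances truncation and rounding error.

```python
import math

SQRT2 = math.sqrt(2.0)


def mu_h(a, d, h):
    return SQRT2 * (2 - h) * a + (-1) ** h * d


def f_h(a, d, h):
    return 0.5 * (1.0 + math.erf(mu_h(a, d, h) / SQRT2))


def d2f_h_analytic(a, d, h):
    """Hessian of f_h in (a, d): -mu_h phi(mu_h) g g^T with g = d mu_h / d(a, d)."""
    mu = mu_h(a, d, h)
    dens = math.exp(-mu * mu / 2.0) / math.sqrt(2.0 * math.pi)
    g = (SQRT2 * (2 - h), (-1) ** h)
    return [[-mu * dens * g[i] * g[k] for k in range(2)] for i in range(2)]


def d2f_h_numeric(a, d, h, eps=1e-4):
    """Central second differences; the mixed term uses the four-point formula."""
    def f(x, y):
        return f_h(x, y, h)

    faa = (f(a + eps, d) - 2.0 * f(a, d) + f(a - eps, d)) / eps ** 2
    fdd = (f(a, d + eps) - 2.0 * f(a, d) + f(a, d - eps)) / eps ** 2
    fad = (f(a + eps, d + eps) - f(a + eps, d - eps)
           - f(a - eps, d + eps) + f(a - eps, d - eps)) / (4.0 * eps ** 2)
    return [[faa, fad], [fad, fdd]]
```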

References

Lyttle TW: Segregation distortion. Annual Review of Genetics. 1991, 25: 511-557. 10.1146/annurev.ge.25.120191.002455.
Carr DE, Dudash MR: Recent approaches into the genetic basis of inbreeding depression in plants. Philos Trans R Soc London B. 2003, 358: 1071-1084. 10.1098/rstb.2003.1295.
Falconer DS, Mackay TFC: Introduction to Quantitative Genetics. 1996, London: Longman, Fourth
Harushima Y, Nakagahra M, Yano M, Sasaki N: Diverse variation of reproductive barriers in three intraspecific rice crosses. Genetics. 2002, 160: 313-322.
Hartl DL, Clark AG: Principles of population genetics. 1997, Sunderland (MA): Sinauer Associates, 3
Xu Y, Zhu L, Xiao J, Huang N, McCouch SR: Chromosomal regions associated with segregation distortion of molecular markers in F₂, backcross, doubled haploid, and recombinant inbred populations in rice (Oryza sativa L.). Molecular and General Genetics. 1997, 253: 535-545. 10.1007/s004380050355.
Fu YB, Ritland K: Evidence for the partial dominance of viability genes contributing to inbreeding depression in Mimulus guttatus. Genetics. 1994, 136: 323-331.
Ritland K: Inferring the genetic basis of inbreeding depression in plants. Genome. 1996, 39: 1-8.
Lander E, Botstein D: Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics. 1989, 121: 185-199.
Hedrick PW, Muona O: Linkage of viability genes to marker loci in selfing organisms. Heredity. 1990, 64: 67-72.
Mitchell-Olds T: Interval mapping of viability loci causing heterosis in Arabidopsis. Genetics. 1995, 140 (3): 1105-1109.
Luo L, Xu SZ: Mapping viability loci using molecular markers. Heredity. 2003, 90: 459-467. 10.1038/sj.hdy.6800264.
Wang CM, Zhu CS, Zhai HQ, Wan JM: Mapping segregation distortion loci (SDL) and quantitative trait loci (QTL) for spikelet sterility in rice (Oryza sativa L.). Genet Res. 2005, 86: 97-106. 10.1017/S0016672305007779.
Luo L, Zhang YM, Xu SZ: A quantitative genetics model for viability selection. Heredity. 2005, 94: 347-355. 10.1038/sj.hdy.6800615.
Nichols RA: Quantitative genetics focus issue. Heredity. 2005, 94: 273-274. 10.1038/sj.hdy.6800646.
Lorieux MB, Perrier GX, Gonzalez de Leon , Lanaud C: Maximum likelihood models for mapping genetic markers showing segregation distortion. 1. Backcross population. Theor Appl Genet. 1995, 90: 73-80.
Lorieux M, Perrier X, Goffinet B, Lanaud C, Gonzalez de Leon D: Maximum likelihood models for mapping genetic markers showing segregation distortion. 2. F₂ population. Theor Appl Genet. 1995, 90: 81-89.
Zhu C, Wang C, Zhang YM: Modeling segregation distortion for viability selection I. Reconstruction of linkage maps with distorted markers. Theor Appl Genet. 2007, 114: 295-305. 10.1007/s00122-006-0432-x.
Nelder JA, Mead R: A simplex method for function minimization. The Computer Journal. 1965, 7: 308-313.
Dempster AP, Laird NM, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc B. 1977, 39: 1-38.
Rao SQ, Xu SZ: Mapping quantitative trait loci for ordered categorical traits in four-way crosses. Heredity. 1998, 81: 214-224. 10.1038/sj.hdy.6883780.
Bailey NTJ: Introduction to the mathematical theory of genetic linkage. 1961, Great Britain: Oxford University Press
Churchill GA, Doerge RK: Empirical threshold values for quantitative trait mapping. Genetics. 1994, 138: 963-971.
Visscher PM, Thompson P, Haley CS: Confidence intervals in QTL mapping by bootstrapping. Genetics. 1996, 143: 1013-1020.
Press WH, Flannery BP, Teukolsky SA, Vetterling WT: Numerical Recipes in C++: The Art of Scientific Computing. 2nd edition. 2001, Cambridge University Press, New York
Carbonell EA, Gerig TME, Balansard E, Asins MJ: Interval mapping in the analysis of non-additive quantitative trait loci. Biometrics. 1992, 48: 305-315. 10.2307/2532757.
Lander E, Green P, Abrahamson J, Barlow A, Daly MJ, Lincoln SE, Newburg L: MAPMAKER: An interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics. 1987, 1: 174-181. 10.1016/0888-7543(87)90010-3.
Huang H, Richardson TE, Carson SD, Bongarten BC: Genetic analysis of inbreeding depression in plus tree 850.55 of Pinus radiata D. Don. II. Genetics of viability genes. Theor Appl Genet. 1999, 99: 140-146. 10.1007/s001220051218.
Pham JL, Glaszmann JC, Sano R, Barbier P, Ghesquiere A, Second G: Isozyme markers in rice: genetic analysis and linkage relationships. Genome. 1990, 33: 348-359.
Wu R, Ma C, Casella G: Statistical Genetics of Quantitative Traits: Linkage, Maps, and QTL. 2007, Springer, New York, 123-134.
Ardlie KG, Kruglyak L, Seielstad M: Patterns of linkage disequilibrium in the human genome. Nat Rev Genet. 2002, 3: 299-309. 10.1038/nrg777.
Lewontin RC: The interaction of selection and linkage. I. General considerations: heterotic models. Genetics. 1964, 49: 49-67.
Cannon GB: The effects of heterozygosity and recombination on the relative fitness of experimental populations of Drosophila melanogaster. Genetics. 1963, 48: 919-942.

Acknowledgements

We are grateful to the Associate Editor and two anonymous reviewers for their constructive comments and suggestions, which significantly improved the presentation of the manuscript. The research was supported in part by the 973 Program (2006CB101708), the National Natural Science Foundation of China (No. 30470998; No. 30671333), NCET (NCET-05-0489), the Specialized Research Fund for the Doctoral Program of Higher Education (20060307008) and the Talent Foundation of Nanjing Agricultural University to YMZ; by the China Postdoctoral Science Foundation (No. 2005038246) and the Jiangsu Province Postdoctoral Science Foundation (No. 0502012C) to CSZ; and by the Program for Changjiang Scholars and Innovative Research Team in University, Ministry of Education (IRT0432).

Author information

Authors and Affiliations

Section on Statistical Genomics, State Key Laboratory of Crop Genetics and Germplasm Enhancement/National Center for Soybean Improvement, Nanjing Agricultural University, Nanjing, 210095, China
Chengsong Zhu & Yuan-Ming Zhang

Corresponding author

Correspondence to Yuan-Ming Zhang.

Additional information

Authors' contributions

CZ designed and carried out the simulation study, and drafted the manuscript. YMZ conceived of the study, participated in the design, coordinated it and revised the manuscript. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Zhu, C., Zhang, YM. An EM algorithm for mapping segregation distortion loci. BMC Genet 8, 82 (2007). https://doi.org/10.1186/1471-2156-8-82

Received: 12 May 2007
Accepted: 29 November 2007
Published: 29 November 2007
DOI: https://doi.org/10.1186/1471-2156-8-82