An EM algorithm for mapping segregation distortion loci
© Zhu and Zhang; licensee BioMed Central Ltd. 2007
Received: 12 May 2007
Accepted: 29 November 2007
Published: 29 November 2007
A chromosomal region that causes distorted segregation ratios is referred to as a segregation distortion locus (SDL). The distortion is caused either by differential representation of SDL genotypes in gametes before fertilization or by viability differences among SDL genotypes after fertilization but before genotype scoring. In both cases, observable phenotypes are distorted for marker loci in the chromosomal region close to the SDL. Under a quantitative genetics model for viability selection, which proposes a continuous liability controlling the viability of each individual, a simplex algorithm has been used to search for the solution in SDL mapping. However, that approach did not consider the effects of SDL on the construction of linkage maps.
We proposed a multipoint maximum-likelihood method to estimate the position and the effects of SDL under the liability model, together with the selection coefficients of marker genotypes and the recombination fractions. The method was implemented via an expectation-maximization (EM) algorithm. The superiority of the proposed method over the previous methods was verified by a series of Monte Carlo simulation experiments, together with a working example derived from the MAPMAKER/QTL software.
Our results suggested that the new method can serve as a powerful alternative to existing methods for SDL mapping. Under the liability model, the new method can simultaneously estimate the position and the effects of SDL as well as the recombination fractions between adjacent markers. It can also be used to probe the genetic mechanism behind the bias of uncorrected map distances and to elucidate the relationship between viability selection and genetic linkage.
In a segregating population derived from a cross between two inbred lines, some molecular markers often show segregation ratios that are distorted from Mendelian expectations [1–3]. The distortion is frequently related to gametic genes, sterility genes and chromosome translocations. The detection of the responsible gene or locus, known as segregation distortion locus (SDL) mapping, is therefore warranted. However, the main challenge in SDL mapping is that phenotypic data for the underlying trait are unavailable. In fact, molecular markers linked to the SDL frequently show segregation distortion, and the degree of distortion depends on the effect size and the position of the SDL. It is therefore possible to detect SDL by means of the distortion.
SDL mapping is usually studied at the population level by examining changes in the gene (or genotypic) frequencies of markers. In the past, a single marker was often used to detect linkage between the marker and an SDL [6, 7]. The shortcomings of this approach are very similar to those of single-marker approaches in quantitative trait loci (QTL) mapping. Following the introduction of interval mapping of QTL, Hedrick and Muona developed a flanking-marker analysis to estimate the fitness parameters for a viability locus. The model of Hedrick and Muona is in fact a completely recessive model. Mitchell-Olds tested one putative viability locus at a time and scanned the entire genome, position by position, to provide a test statistic profile for the detection of SDL. However, his model only tests and estimates the degree of dominance. Luo and Xu extended the maximum-likelihood (ML) method to estimate the degree of dominance and the selection coefficients, using an outbred full-sib family as an example. Wang et al. developed a multipoint ML method to estimate the position and the genotypic frequencies of SDL in an F2 population. However, the efficacy of these methods has seldom been addressed in simulation studies. Recently, Luo et al. developed a quantitative genetics model for viability selection. This approach makes it possible to carry out simulation studies, to partition the selection into additive and dominant effects, and to remove the effects of non-genetic cofactors from the analysis [14, 15]. However, this approach raises two issues. Firstly, it assumes that segregation distortion does not affect the construction of the genetic linkage map. In fact, marker segregation distortion is known to affect both the estimates of recombination fractions in pair-wise analysis of markers and the order of the markers on a linkage group [16–18].
Secondly, Luo et al. adopted the Simplex algorithm to search for the solutions of the genetic parameters, at a high computational cost. In this paper we therefore extend the multipoint approach under the liability model proposed by Luo et al. by combining the estimation of the genetic parameters of SDL with the reconstruction of genetic linkage maps. The new method for SDL mapping was implemented via an expectation-maximization (EM) algorithm rather than the Simplex procedure. The genetic factors that might affect the estimates of recombination fractions between adjacent markers are discussed in detail. A series of Monte Carlo simulation experiments, together with a working example from the MAPMAKER/QTL software, was carried out to verify our approach.
Considering an SDL in an F2 population derived from a cross between two inbred lines, we assumed that the three genotypes at this locus, AA, Aa and aa, had genotypic values $\sqrt{2}a - d$, $d$ and $-\sqrt{2}a - d$, respectively, where $a$ and $d$ indicate the additive and dominant effects, and that an imaginary trait, liability, invisible to the investigators but visible to nature, controlled the viabilities of individuals. It should be noted that, with these genotypic values, the genetic variance in an F2 population was $a^2 + d^2$ rather than the usual $\frac{1}{2}a^2 + \frac{1}{4}d^2$. The phenotypic value of the $j$th individual was described by the following linear model,
$$z_j = g_j + e_j \qquad (1)$$
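As a rough illustration of how a liability model can distort F2 genotype frequencies, the following Python sketch computes the expected genotype frequencies among survivors. It rests on assumptions that are not stated in this section: the residual $e_j$ is taken to be standard normal, an individual is taken to survive when its liability exceeds a threshold of zero, and the genotypic values are taken as $\sqrt{2}a - d$, $d$ and $-\sqrt{2}a - d$ (the values that reproduce a genetic variance of $a^2 + d^2$). The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def survivor_frequencies(a, d, threshold=0.0):
    """Expected AA, Aa, aa frequencies among surviving F2 individuals.

    Assumption (not from the paper's text): liability z = g + e with
    e ~ N(0, 1), and an individual survives when z > threshold.
    """
    mendelian = (0.25, 0.5, 0.25)
    root2 = math.sqrt(2.0)
    values = (root2 * a - d, d, -root2 * a - d)
    # P(z > threshold | genotype) = P(e > threshold - g) = Phi(g - threshold)
    viability = [normal_cdf(g - threshold) for g in values]
    weighted = [p * w for p, w in zip(mendelian, viability)]
    total = sum(weighted)
    return [x / total for x in weighted]


# With no SDL effect the Mendelian 1:2:1 ratio is recovered.
print(survivor_frequencies(0.0, 0.0))
# A positive additive effect shifts the ratio towards AA.
print(survivor_frequencies(0.5, 0.0))
```

Under these assumptions, any non-zero $a$ or $d$ moves the survivor frequencies away from 1:2:1, which is the signal that linked markers pick up.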
Mapping SDL under a liability model
We assumed that there was no crossing-over interference among the markers on the linkage group considered, that an SDL caused segregation distortion of some or all markers linked to it, and that the three genotypes of each marker had different viability coefficients. Let the order of the $m$ markers on the same chromosome be $M_1, M_2, \ldots, M_m$; $x_k$ be a dummy variable defined as $x_k = 1, 0, -1$ for a homozygote of P1, a heterozygote and a homozygote of P2 at the $k$th marker, respectively; $z_k$ be an indicator for the phenotype of the $k$th marker ($M_k$); $r_k$ (or $r_{k,k+1}$) be the recombination fraction between the $k$th and $(k+1)$th markers; and $s_{k,1}$ and $s_{k,2}$ ($0 \le s_{k,1} < +\infty$ and $0 \le s_{k,2} < +\infty$ for $k = 1, 2, \ldots, m$) be the viability coefficients of $M_k m_k$ and $m_k m_k$ relative to $M_k M_k$ at the $k$th marker.
where the constant C did not depend on the parameters of interest but did depend on the viability coefficients and the map distances between adjacent markers, which could be determined by the method of Zhu et al. The EM algorithm proceeded as follows.
where the term for the $h$th genotype ($h$ = 1, 2, 3) was calculated from equation (3), and $\Pr(\varphi_{jh} = 1 \mid z_{j1}, \ldots, z_{jM})$ ($j = 1, \ldots, n$; $h$ = 1, 2, 3) was the prior probability of the $h$th genotype of the SDL for the $j$th individual, conditional on marker information, obtained by means of the multipoint method.
The MLEs of the parameters were obtained by the Fisher-scoring algorithm, because explicit solutions were not available. The parameter vector θ could be updated by
$$\theta^{(1)} = \theta^{(0)} + I^{-1} S(\theta^{(0)}) \qquad (8)$$
where $S(\theta^{(0)})$ was the score function and $I$ was the Fisher information matrix (more details are given in the Appendix). $\theta^{(1)}$ then replaced $\theta^{(0)}$ in all subsequent steps, and the procedure was iterated until convergence. The converged $\theta^{(1)}$ was taken as the MLE of θ in this M-step.
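The update in equation (8) can be illustrated with a one-parameter toy problem. The sketch below is not the authors' implementation: it assumes a simple model in which the three marker genotype classes occur with probabilities $p^2$, $2p(1-p)$ and $(1-p)^2$, and it estimates $p$ by Fisher scoring. The function name, arguments and counts are hypothetical.

```python
def fisher_scoring_allele_frequency(n_hom1, n_het, n_hom2,
                                    p0=0.5, tol=1e-10, max_iter=50):
    """Estimate p by iterating p <- p + S(p) / I(p).

    Toy model (an assumption for illustration): genotype probabilities
    are p^2, 2p(1-p) and (1-p)^2 for the three observed classes.
    The starting value p0 must lie strictly between 0 and 1.
    """
    n = n_hom1 + n_het + n_hom2
    copies1 = 2 * n_hom1 + n_het      # copies of the first allele
    copies2 = n_het + 2 * n_hom2      # copies of the second allele
    p = p0
    for _ in range(max_iter):
        score = copies1 / p - copies2 / (1.0 - p)   # first derivative of ln L
        info = 2.0 * n / (p * (1.0 - p))            # expected information
        p_new = p + score / info
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p


print(fisher_scoring_allele_frequency(40, 100, 60))
```

For this toy model the scoring step lands on the closed-form estimate (2 × 40 + 100) / 400 = 0.45 after a single update. In the multi-parameter setting of equation (8), the division by the information becomes a matrix inverse and several iterations are usually needed.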
The E and M steps were iterated until the convergence occurred.
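The alternation of E and M steps can be sketched with a much simpler problem than the one treated here. The toy example below estimates only the frequencies of three unobserved SDL genotypes, and it assumes that the probability of each individual's marker data given each SDL genotype is already known. It does not estimate the liability-model effects or the recombination fractions, and all names are hypothetical.

```python
def em_genotype_frequencies(cond_probs, tol=1e-10, max_iter=1000):
    """EM estimate of hidden-genotype frequencies.

    cond_probs[j][h] is the (assumed known) probability of individual j's
    marker data given hidden genotype h.
    """
    k = len(cond_probs[0])
    freqs = [1.0 / k] * k
    for _ in range(max_iter):
        # E-step: posterior probability of each genotype for each individual
        posteriors = []
        for row in cond_probs:
            joint = [f * p for f, p in zip(freqs, row)]
            total = sum(joint)
            posteriors.append([x / total for x in joint])
        # M-step: new frequencies are the average posterior probabilities
        new_freqs = [sum(post[h] for post in posteriors) / len(posteriors)
                     for h in range(k)]
        if max(abs(x - y) for x, y in zip(new_freqs, freqs)) < tol:
            return new_freqs
        freqs = new_freqs
    return freqs


# Hypothetical data: markers identify the hidden genotype without error.
data = [[1.0, 0.0, 0.0]] * 2 + [[0.0, 1.0, 0.0]] * 5 + [[0.0, 0.0, 1.0]] * 3
print(em_genotype_frequencies(data))
```

When the markers are fully informative, as in the usage example, the estimate reduces to the observed class proportions. With partially informative markers the posterior weights spread each individual across genotypes, which is the role the E-step plays in the method described above.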
The MLE of the SDL position could be obtained by examining the likelihood-ratio profile along the chromosome, as is commonly done in interval mapping of QTL.
Following parameter estimation, we tested the overall null hypothesis that there was no effect of the SDL at the locus of interest (δ). The null hypothesis was formulated as H0: a = d = 0, which was tested using the likelihood-ratio (LR) test statistic:
$$LR = -2[\ln L(0, 0, \delta) - \ln L(a, d, \delta)]$$
Under the null hypothesis, the LR statistic approximately followed a chi-square distribution with two degrees of freedom.
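A minimal sketch of this test in Python is given below. It assumes the two maximized log-likelihoods have already been computed elsewhere, and the function name and numerical inputs are hypothetical. For two degrees of freedom the chi-square upper-tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_full):
    """Return (LR, p-value) for H0: a = d = 0 at a fixed position.

    lnl_null: log-likelihood with a = d = 0.
    lnl_full: log-likelihood at the estimated a and d.
    The p-value uses the chi-square distribution with 2 degrees of freedom,
    whose survival function is exp(-x / 2).
    """
    lr = -2.0 * (lnl_null - lnl_full)
    p_value = math.exp(-lr / 2.0) if lr > 0.0 else 1.0
    return lr, p_value


lr, p = likelihood_ratio_test(-103.0, -100.0)
print(lr, p)
```

This point-wise p-value applies to a single tested position. A genome scan tests many positions, which is why an empirical experiment-wise threshold is used as well.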
The critical value for power calculation was determined from 1,000 permutations, the experiment-wise type I error rate was set at 5%, and the confidence interval of an SDL location was determined by the bootstrap method.
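The bookkeeping behind an empirical threshold, an empirical power estimate and a percentile bootstrap interval can be sketched as follows. The sketch assumes that the genome-wide maximum test statistics from the null replicates, the statistics from the simulated runs and the bootstrap position estimates are already available as lists. How those replicates are generated is outside its scope, the interpolation rule for quantiles is one common choice rather than the one necessarily used by the authors, and all names are hypothetical.

```python
def empirical_quantile(values, q):
    """Quantile (0 <= q <= 1) by linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1.0 - frac) + ordered[upper] * frac


def critical_value(null_max_stats, alpha=0.05):
    """Experiment-wise threshold: the (1 - alpha) quantile of null maxima."""
    return empirical_quantile(null_max_stats, 1.0 - alpha)


def empirical_power(observed_stats, threshold):
    """Proportion of simulated runs whose statistic exceeds the threshold."""
    hits = sum(1 for s in observed_stats if s > threshold)
    return hits / len(observed_stats)


def bootstrap_interval(boot_positions, level=0.95):
    """Percentile interval of bootstrap estimates of the SDL position."""
    tail = (1.0 - level) / 2.0
    return (empirical_quantile(boot_positions, tail),
            empirical_quantile(boot_positions, 1.0 - tail))


# Hypothetical inputs, for illustration only.
threshold = critical_value(list(range(1, 101)))
print(threshold, empirical_power([90, 96, 100, 50], threshold))
print(bootstrap_interval(list(range(0, 101))))
```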
We simulated one chromosome 100 cM (or 50 cM) long, covered by m evenly spaced codominant markers (m = 6, 11 or 21), and placed a single SDL at position 25 cM (a second SDL was placed at position 75 cM where necessary). The dominance ratio of the SDL was denoted by dr = d/a. Given the broad-sense heritability ($h^2$) and dr, the additive and dominant effects could be obtained using a numerical algorithm. Based on the method described in Luo et al., the genotypes of both the distorted markers and the SDL were simulated for each individual in an F2 population. All simulations were replicated 100 or 1000 times, depending on the purpose of the analysis. Empirical power was calculated as the proportion of runs in which the test statistic exceeded the critical value.
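The marker layout used in such a simulation can be sketched in a few lines of Python. The conversion from map distance to recombination fraction below uses Haldane's map function, which is an assumption on our part that matches the no-interference assumption stated earlier. The function names are hypothetical.

```python
import math


def marker_positions(length_cm, n_markers):
    """Positions (cM) of evenly spaced markers covering the chromosome."""
    step = length_cm / (n_markers - 1)
    return [k * step for k in range(n_markers)]


def haldane_recombination(distance_cm):
    """Recombination fraction for a map distance in cM (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


positions = marker_positions(100.0, 6)
spacing = positions[1] - positions[0]
print(positions)
print(haldane_recombination(spacing))
```

With 6, 11 or 21 markers on a 100 cM chromosome the intervals are 20, 10 and 5 cM long, so an SDL at 25 cM lies inside an interval for m = 6 and coincides with a marker for m = 21.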
Effects of various factors on SDL mapping
Table: Results of segregation distortion locus (SDL) mapping under the fitness and liability models (100 replications). Recoverable column headings: interval length (cM); frequencies of genotypes.
Mapping multiple SDL
Table: Results of mapping two segregation distortion loci (SDL) under the fitness and liability models (100 replicates, 200 individuals).
A working example
To demonstrate the proposed method, we re-analyzed a sample dataset (source filename: sample.raw) distributed with the MAPMAKER/QTL software. It consisted of 333 F2 individuals from a cross between two inbred lines of tomato. Each plant was genotyped for 12 marker loci that were divided into two linkage groups. Single-marker chi-square tests showed that 5 and 2 markers on the first and second linkage groups, respectively, deviated from Mendelian segregation ratios (data not shown). Given the linkage maps reconstructed using the method of Zhu et al., 1000 datasets without segregation distortion were simulated and used to determine the critical value. The confidence interval of an SDL location was determined by the bootstrap method.
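A single-marker chi-square test of the 1:2:1 ratio, of the kind mentioned above, can be sketched as follows. The genotype counts in the usage example are invented for illustration and are not taken from sample.raw, and the function name is hypothetical. With three genotype classes the test has two degrees of freedom, for which the upper-tail probability is exp(-x/2).

```python
import math


def mendelian_chi_square(n_p1, n_het, n_p2):
    """Chi-square test of an F2 codominant marker against 1:2:1.

    Returns (chi-square statistic, p-value with 2 degrees of freedom).
    """
    total = n_p1 + n_het + n_p2
    expected = (0.25 * total, 0.5 * total, 0.25 * total)
    observed = (n_p1, n_het, n_p2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-chi2 / 2.0)  # exact survival function for 2 df
    return chi2, p_value


# Hypothetical counts for one distorted marker.
print(mendelian_chi_square(40, 50, 10))
```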
Table: The uncorrected and corrected map distances in the real data analysis, reported separately for linkage group 1 and linkage group 2.
Table: Results of segregation distortion loci (SDL) mapping in the real data analysis. Recoverable column headings: confidence interval (95%); nearest marker to SDL.
Effect of genetic model of SDL on the estimation of map distance
Table: Effect of genetic modes of two linked SDL on the estimates of map distances under the additive-dominant model. Columns: estimates of map distances (cM) for the 1st to 5th intervals.
Table: Effect of genetic modes of two linked SDL on the estimates of map distances under the epistatic genetic model. Columns: estimates of map distances (cM) for the 1st to 5th intervals.
For SDL mapping, most researchers have concentrated on detecting and testing either the selection coefficients or the degree of dominance under the fitness model [7, 10, 11]. Luo et al. pioneered the development of SDL mapping under a liability model. Zhu et al. proposed a new method for the reconstruction of linkage maps with distorted, dominant and missing markers. Under the liability model, we developed a method to simultaneously estimate the position and the effects of SDL as well as the recombination fractions between adjacent markers. This approach retains the merits of that of Luo et al. but differs from other approaches in several respects. Firstly, it combines the detection of SDL with the reconstruction of the marker linkage map. The position and the effect of SDL can be estimated along with the selection coefficient and the degree of dominance. Secondly, the proposed method may be used to elucidate the relationship between viability selection and genetic linkage. Thirdly, the likelihood function is based on the distribution of SDL genotypes rather than on that of marker genotypes, as in previous studies [11, 28]. Finally, we adopted an EM algorithm rather than the Simplex procedure to estimate the genetic parameters. Of course, one assumption common to all of the above-mentioned approaches is that marker segregation distortion has a genetic or viability cause. For genetic causes, there are two different mechanisms of segregation distortion, one at the gametic level and the other at the zygotic level. In both cases, observable phenotypes are distorted for marker loci in the chromosomal region close to the SDL. Thus both mechanisms are covered by our proposed method.
Although we have no way to distinguish them in SDL mapping, the results of the genotype and allele tests for the marker closest to the SDL can be used to infer the presence of zygotic or gametic viability selection in an F2 population, but not in backcross, doubled haploid and recombinant inbred line populations. Moreover, it should be noted that genetic linkage between distorted markers has been carefully discussed in Wu et al. (2007).
There are two primary routes by which selection can affect the extent of linkage disequilibrium. The first is a hitchhiking effect, in which an entire haplotype that flanks a favored variant can be rapidly swept to high frequency or even fixation. The second is epistatic selection for combinations of alleles at two or more loci on the same chromosome. This form of selection leads to the association of particular alleles at different loci. The major difficulty in linkage disequilibrium-based mapping is to quantify the relationship between the recombination fraction and the measure of linkage disequilibrium. Our analyses exclude all factors that influence linkage disequilibrium other than linkage and selection. We first combined viability selection with a quantitative genetics model, and then explored the relationship between the genetic modes of the viability genes and the estimates of the recombination fraction. The simulation studies indicated that most of the genetic modes of the viability genes at two linked SDL may result in underestimation of the genetic distance. We hope that this tentative attempt will help to elucidate the genetic relationship between viability selection and genetic linkage.
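Why selection at two linked loci tends to shrink the apparent recombination fraction can be seen in a deliberately simplified setting. The sketch below considers the four gamete types produced by a double heterozygote AB/ab, rather than the F2 zygotes analysed in this paper, and assigns each gamete type a relative viability. Both the setting and the function name are assumptions made for illustration.

```python
def apparent_recombination(r, viab_AB, viab_Ab, viab_aB, viab_ab):
    """Recombinant fraction among surviving gametes of an AB/ab parent.

    Before selection the parental gametes AB and ab each have frequency
    (1 - r) / 2 and the recombinant gametes Ab and aB each have r / 2.
    """
    parental = (1.0 - r) * (viab_AB + viab_ab)
    recombinant = r * (viab_Ab + viab_aB)
    return recombinant / (parental + recombinant)


# No selection: the true value is recovered.
print(apparent_recombination(0.2, 1.0, 1.0, 1.0, 1.0))
# Selection against allele a only: still unbiased in this setting.
print(apparent_recombination(0.2, 1.0, 1.0, 0.5, 0.5))
# Selection against both a and b (multiplicative): underestimation.
print(apparent_recombination(0.2, 1.0, 0.5, 0.5, 0.25))
```

In this gametic setting, selection at a single locus leaves the recombinant fraction unchanged, whereas multiplicative selection at both loci removes recombinants disproportionately and the apparent value falls below the true one.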
In addition, it will be interesting and challenging to combine SDL analysis with QTL mapping to see what effects distorted markers have on the results of QTL mapping. In doing so, one runs the risk of detecting false QTL that arise not from genetic effects on the quantitative traits but from violation of the Mendelian segregation law. It would be a great breakthrough in quantitative genetics if a method could be developed to separate the effects of viability loci from the effects of QTL. Because of the complexity of the combined analysis, the related investigations will be discussed separately elsewhere.
Our results suggested that the proposed method can serve as a powerful alternative to existing methods. Under the liability model, the new method can simultaneously estimate the position and the effects of SDL as well as the recombination fractions between adjacent markers, and also be used to probe into the genetic mechanism for the bias of uncorrected map distance and to elucidate the relationship between the viability selection and genetic linkage.
Appendix: Fisher-scoring algorithms for obtaining MLEs of parameters
In equation (8), $I$ is the Fisher information matrix.
We are grateful to the Associate Editor and two anonymous reviewers for their constructive comments and suggestions that significantly improved the presentation of the manuscript. The research was supported in part by 973 program (2006CB101708), the National Natural Science Foundation of China (No.30470998; No.30671333), NCET (NCET-05-0489), Specialized Research Fund for the Doctoral Program of Higher Education (20060307008), the Talent Foundation of Nanjing Agricultural University to YMZ; China (No.2005038246) and Jiangsu province (No.0502012C) Postdoctoral Science Foundation to CSZ; and the Program for Changjiang Scholars and Innovative Research Team in University, the Ministry of Education (IRT0432).
1. Lyttle TW: Segregation distortion. Annual Review of Genetics. 1991, 25: 511-557. 10.1146/annurev.ge.25.120191.002455.
2. Carr DE, Dudash MR: Recent approaches into the genetic basis of inbreeding depression in plants. Philos Trans R Soc London B. 2003, 358: 1071-1084. 10.1098/rstb.2003.1295.
3. Falconer DS, Mackay TFC: Introduction to Quantitative Genetics. 4th edition. 1996, London: Longman.
4. Harushima Y, Nakagahra M, Yano M, Sasaki N: Diverse variation of reproductive barriers in three intraspecific rice crosses. Genetics. 2002, 160: 313-322.
5. Hartl DL, Clark AG: Principles of population genetics. 3rd edition. 1997, Sunderland (MA): Sinauer Associates.
6. Xu Y, Zhu L, Xiao J, Huang N, McCouch SR: Chromosomal regions associated with segregation distortion of molecular markers in F2, backcross, doubled haploid, and recombinant inbred populations in rice (Oryza sativa L.). Molecular and General Genetics. 1997, 253: 535-545. 10.1007/s004380050355.
7. Fu YB, Ritland K: Evidence for the partial dominance of viability genes contributing to inbreeding depression in Mimulus guttatus. Genetics. 1994, 136: 323-331.
8. Ritland K: Inferring the genetic basis of inbreeding depression in plants. Genome. 1996, 39: 1-8.
9. Lander E, Botstein D: Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics. 1989, 121: 185-199.
10. Hedrick PW, Muona O: Linkage of viability genes to marker loci in selfing organisms. Heredity. 1990, 64: 67-72.
11. Mitchell-Olds T: Interval mapping of viability loci causing heterosis in Arabidopsis. Genetics. 1995, 140 (3): 1105-1109.
12. Luo L, Xu SZ: Mapping viability loci using molecular markers. Heredity. 2003, 90: 459-467. 10.1038/sj.hdy.6800264.
13. Wang CM, Zhu CS, Zhai HQ, Wan JM: Mapping segregation distortion loci (SDL) and quantitative trait loci (QTL) for spikelet sterility in rice (Oryza sativa L.). Genet Res. 2005, 86: 97-106. 10.1017/S0016672305007779.
14. Luo L, Zhang YM, Xu SZ: A quantitative genetics model for viability selection. Heredity. 2005, 94: 347-355. 10.1038/sj.hdy.6800615.
15. Nichols RA: Quantitative genetics focus issue. Heredity. 2005, 94: 273-274. 10.1038/sj.hdy.6800646.
16. Lorieux M, Goffinet B, Perrier X, Gonzalez de Leon D, Lanaud C: Maximum likelihood models for mapping genetic markers showing segregation distortion. 1. Backcross population. Theor Appl Genet. 1995, 90: 73-80.
17. Lorieux M, Perrier X, Goffinet B, Lanaud C, Gonzalez de Leon D: Maximum likelihood models for mapping genetic markers showing segregation distortion. 2. F2 population. Theor Appl Genet. 1995, 90: 81-89.
18. Zhu C, Wang C, Zhang YM: Modeling segregation distortion for viability selection I. Reconstruction of linkage maps with distorted markers. Theor Appl Genet. 2007, 114: 295-305. 10.1007/s00122-006-0432-x.
19. Nelder JA, Mead R: A simplex method for function minimization. The Computer Journal. 1965, 7: 308-313.
20. Dempster AP, Laird NM, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc B. 1977, 39: 1-38.
21. Rao SQ, Xu SZ: Mapping quantitative trait loci for ordered categorical traits in four-way crosses. Heredity. 1998, 81: 214-224. 10.1038/sj.hdy.6883780.
22. Bailey NTJ: Introduction to the mathematical theory of genetic linkage. 1961, Great Britain: Oxford University Press.
23. Churchill GA, Doerge RK: Empirical threshold values for quantitative trait mapping. Genetics. 1994, 138: 963-971.
24. Visscher PM, Thompson P, Haley CS: Confidence intervals in QTL mapping by bootstrapping. Genetics. 1996, 143: 1013-1020.
25. Press WH, Flannery BP, Teukolsky SA, Vetterling WT: Numerical Recipes in C++: The Art of Scientific Computing. 2nd edition. 2001, Cambridge University Press, New York.
26. Carbonell EA, Gerig TME, Balansard E, Asins MJ: Interval mapping in the analysis of non-additive quantitative trait loci. Biometrics. 1992, 48: 305-315. 10.2307/2532757.
27. Lander E, Green P, Abrahamson J, Barlow A, Daly MJ, Lincoln SE, Newburg L: MAPMAKER: An interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics. 1987, 1: 174-181. 10.1016/0888-7543(87)90010-3.
28. Huang H, Richardson TE, Carson SD, Bongarten BC: Genetic analysis of inbreeding depression in plus tree 850.55 of Pinus radiata D. Don. II Genetics of viability genes. Theor Appl Genet. 1999, 99: 140-146. 10.1007/s001220051218.
29. Pham JL, Glaszmann JC, Sano R, Barbier P, Ghesquiere A, Second G: Isozyme markers in rice: genetic analysis and linkage relationships. Genome. 1990, 33: 348-359.
30. Wu R, Ma C, Casella G: Statistical Genetics of Quantitative Traits: Linkage, Maps, and QTL. 2007, Springer, New York, 123-134.
31. Ardlie KG, Kruglyak L, Seielstad M: Patterns of linkage disequilibrium in the human genome. Nat Rev Genet. 2002, 3: 299-309. 10.1038/nrg777.
32. Lewontin RC: The interaction of selection and linkage. I. General considerations: heterotic models. Genetics. 1964, 49: 49-67.
33. Cannon GB: The effects of heterozygosity and recombination on the relative fitness of experimental populations of Drosophila melanogaster. Genetics. 1963, 48: 919-942.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.