The role of parametric linkage methods in complex trait analyses using microsatellites

Abstract

Many investigators of complexly inherited familial traits bypass classical segregation analysis to perform model-free genome-wide linkage scans. Because model-based or parametric linkage analysis may be the most powerful means to localize genes when a model can be approximated, relying on model-free statistics alone may result in a loss of power to detect linkage. We performed limited segregation analyses on the electrophysiological measurements that have been collected for the Collaborative Study on the Genetics of Alcoholism. The resulting models were used in whole-genome scans. Four genomic regions provided a model-based LOD > 2, and only 3 of these were detected (p < 0.05) by a model-free approach. We conclude that parametric methods, using even over-simplified models of complex phenotypes, may complement nonparametric methods and decrease false negatives.

Background

Although model-based or parametric linkage analysis on extended families is generally considered the most powerful means to localize genes when a model can be approximated, the requirement for reasonable model parameter values is often perceived to be unattainable for complex traits. As a result, the potential advantages of the method are frequently passed over in favor of "model-free", nonparametric statistics that may be less powerful [1, 2]. The rejection of parametric methods increases the possibility of missing linkage signals. It is generally recognized that failure to detect quantitative trait loci (QTLs) on an initial scan is more problematic than false positives. The latter should be excluded by follow-up analyses, but a false negative region may not be pursued. Thus, consideration of how to avoid false negatives is warranted. To date, five linkage studies have been performed on the electrophysiological measurements (electroencephalogram, EEG and event-related potential, ERP) that have been collected for the Collaborative Study on the Genetics of Alcoholism (COGA) dataset [3–7]. All of these linkage studies are based on identity-by-descent (IBD) allele sharing status, a "model-free" approach; four used variance decomposition as incorporated in SOLAR [3–6] and one used a regression approach [7]. The purpose of the current paper is to examine the utility of modeling the familial transmission of EEG and ERP phenotypes followed by 3-point, model-based linkage analysis of these traits versus model-free methods.

Methods

Phenotypes

We evaluated 13 quantitative traits representing neurological function in the COGA dataset (143 families). These included one EEG phenotype, ecb21, and 3 sets of 4 related ERP phenotypes (1 = FP1, 2 = FZ, 3 = CZ, and 4 = PZ channels): ttth1, ttth2, ttth3, and ttth4; ttdt1, ttdt2, ttdt3, and ttdt4; and ntth1, ntth2, ntth3, and ntth4. Linear regression was used to adjust for the effects of age and sex. The variable ecb21 was adjusted for sex, age, age², and age³, and the remaining variables were adjusted for sex, age, and age²; all effects were highly significant. Additionally, the regression residuals within each of the 143 families were re-centered on zero to remove any family-specific effects on the mean.
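The adjustment described above can be sketched as follows. This is a minimal illustration that assumes ordinary least squares, a numeric sex coding, and complete data; the function name `adjust_trait` and its arguments are hypothetical and are not taken from the software used in the study.

```python
import numpy as np

def adjust_trait(y, age, sex, family, max_power=2):
    """Residualize a trait on sex and age polynomials, then center within family.

    max_power=3 mirrors the ecb21 adjustment (age, age^2, age^3);
    max_power=2 mirrors the adjustment used for the ERP traits.
    Assumes no missing values (an assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)
    age = np.asarray(age, dtype=float)
    sex = np.asarray(sex, dtype=float)
    family = np.asarray(family)

    # Design matrix: intercept, sex, age, age^2, ..., age^max_power
    columns = [np.ones_like(y), sex]
    columns += [age ** p for p in range(1, max_power + 1)]
    X = np.column_stack(columns)

    # Ordinary least-squares fit and residuals
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Re-center the residuals on zero within each family
    for fam in np.unique(family):
        mask = family == fam
        resid[mask] -= resid[mask].mean()
    return resid
```

After this step every family has a mean residual of zero, so between-family differences in the trait mean no longer contribute to the segregation or linkage likelihoods.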

Modeling

Complex segregation analysis (CSA), using PAP v. 5, was used to estimate co-dominant (no overdominance) and dominant mixed models for each adjusted trait [8, 9]. Multiple trials with random starting parameter values were used to ensure maximum likelihood convergence for each model.

Linkage analyses

Using the more parsimonious CSA model for each trait, we performed a two-marker genomic scan using LINKMAP [10, 11]. Each marker was therefore used twice, except the p- and q-terminal markers of each chromosome, which were used once. Sixty likelihoods were calculated for each 2-marker set: 20 from theta = 0.5 to theta = 0 relative to marker 1, 20 between the markers, and 20 from theta = 0 (relative to marker 2) to theta = 0.5. The possibility of linkage heterogeneity was evaluated with heterogeneity LOD scores (hLOD) computed by HOMOG [12]. A 2-point model-free scan was also done for each of the 13 phenotypes using the computer program MERLIN [13]. This program calculates an allele-sharing statistic, the Kong and Cox LOD (KC-LOD), and its statistical significance, p(KC). Eleven pedigrees that were too large for MERLIN were trimmed for the MERLIN analysis but not for the LINKAGE analysis; of the 44 individuals trimmed, only 5 had any measured phenotype. Regions for which a model-based LOD score > 2 was detected were further evaluated by multipoint model-free analysis for comparison purposes.
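For readers unfamiliar with heterogeneity LOD scores, the sketch below shows the standard admixture formulation, in which a proportion alpha of families is assumed to be linked. The text does not give the exact computation performed in this study, so the grid search over alpha and the function names `hlod` and `max_hlod` are illustrative assumptions rather than a description of HOMOG itself.

```python
import math

def hlod(family_lods, alpha):
    """Admixture heterogeneity LOD at one map position.

    family_lods: per-family LOD scores at that position (finite values).
    alpha: assumed proportion of linked families, between 0 and 1.
    """
    return sum(math.log10(alpha * 10.0 ** z + (1.0 - alpha)) for z in family_lods)

def max_hlod(family_lods, steps=100):
    """Grid-search alpha over [0, 1]; return (best_hlod, best_alpha)."""
    best_value, best_alpha = 0.0, 0.0  # alpha = 0 always gives hLOD = 0
    for k in range(1, steps + 1):
        alpha = k / steps
        value = hlod(family_lods, alpha)
        if value > best_value:
            best_value, best_alpha = value, alpha
    return best_value, best_alpha
```

At alpha = 1 the hLOD reduces to the ordinary summed LOD, so a maximum at an alpha well below 1 with a noticeably higher score is what would suggest that only a subset of families is linked.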

Results

For 9 of 13 traits the co-dominant models either resulted in over-dominance or failed to converge; these were not considered further. For the remaining traits (ecb21, ntth1, ntth3, and ttdt2) the dominant model was found to be more parsimonious (Table 1). In no case was the resulting polygenic inheritance greater than 0.0001. In the 13 two-point scans, MERLIN found 29 linkages with p < 0.05 (data not shown). Using the Mendelian dominant genetic model for each trait, the parametric 2-marker genome scans detected 4 regions with a LOD score of at least 2.0 for 3 of the 13 traits (Table 2). None of these linkages showed significant heterogeneity. One of these regions, on chromosome 4, was not detected at p < 0.05 using the model-free approach.

Table 1 Genetic models from PAP segregation analysis for adjusted traits.
Table 2 Four regions with LOD score > 2.0.

Discussion

Model-based LOD score linkage has proven effective in localizing genes associated with numerous disease-related traits which, generally, exhibit Mendelian patterns of inheritance and for which the parameters (mode of transmission, gene frequency, and penetrance or quantitative effect size) have been estimated. However, the models available in current linkage software are overly simple for complex traits and the utility of model-based methods under these limitations is unclear. An incorrect model may lead to loss of power in the presence of true linkage as well as an overestimation of recombination [14]. Several strategies have been suggested to overcome these limitations of model-based linkage in complex traits. When the 'true' genetic model is unknown, maximizing the LOD score over several modes of inheritance, usually a dominant one and a recessive one, has been proposed [14]. Additionally, nonparametric or "model-free" linkage analysis methods are often used. Model-free methods, however, often put constraints on pedigree size and, overall, may have less power than model-based analyses, even for complex traits [15]. Thus, if the true model can be approximated, a model-based approach is desirable, especially in a genome scan, where exact specification of recombination values is of secondary concern.

Our findings indicate that model-based linkage of complex traits may add information not furnished by nonparametric analyses. Our two-marker parametric linkage results suggest four regions with LOD > 2 for three traits. The multipoint nonparametric analysis detected three of these regions at p < 0.05 but did not detect the chromosome 4 region at this probability level. Two of the regions identified with LOD > 2 using parametric linkage appear to have been previously detected in published analyses of these data, while the other two were not. Both previously detected regions were found by the program SOLAR, which calculates a likelihood-ratio-derived LOD score by comparing a model in which the additive genetic variance at a specified map position is estimated with one in which this component is set to zero. The chromosome 3 region (max LOD = 2.01 for ttdt1), near D3S2406-GATA128C02-D3S2459, is the same location in which Porjesz et al. [6] obtained a LOD of 2.59 for the N1 component, P4 lead trait. This was a different ERP component and lead from the P3-FP1 measurement giving our chromosome 3 linkage. Although Porjesz et al. [6] included the P3-FP1 phenotype in their study, they report linkage only on chromosome 5 (LOD 2.64). We did not have the N1 component available.

On chromosome 4 we obtained a LOD of 2.08 for ecb21 with D4S1558-D4S2361, whereas Williams et al. [4] found a peak multipoint LOD score of 1.51 (bivariate LOD = 2.65) at D4S1628 for the ERP phenotype, P3-CZ. Although the markers D4S2361 and D4S1628 are separated by 27 cM (Marshfield sex-averaged map), these two findings may represent the same signal. Using Markov chain Monte Carlo (MCMC) methods, Sieh et al. [16] obtained a strong linkage signal for ecb21 at GABRB1 (51.4 cM on the Kosambi map) with a separate, weaker linkage at D4S1558. Using variance-components linkage analysis, Lin and Wu [17] obtained a LOD peak of 1.96 at 108 cM with ecb21 that apparently shifted to 95 cM (adjacent to D4S1559), LOD 4.38, when ALDX1 was added to create a bivariate trait. The scores on chromosome 9 for ttth1 (LOD = 2.30, GATA175H06-D9S925 at 9p22.2) and on chromosome 12 for ecb21 (LOD = 2.61, D12S1090-D12S390 and LOD = 2.17, D12S390-D12S398 at 12q13.13) appear to reference novel loci for EEG and ERP phenotypes. Indeed, the two adjacent chromosome 12 scores were the highest in our 13 genome scans, the largest associated with a p-value of 0.0005. Some support for linkage on this chromosome is suggested by the MERLIN results in that, of the 4 regions analyzed by multipoint linkage, chromosome 12 gave the highest KC-LOD, 0.9, and the smallest p(KC), 0.02.
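The relationship between a LOD score and its nominal p-value can be illustrated with the usual asymptotic approximation, in which 2 ln(10) x LOD is referred to a chi-square distribution with 1 degree of freedom. The sketch below is an illustration only: the function `lod_to_p` is hypothetical, and the paper does not state which convention (one-sided or two-sided) produced each reported p-value.

```python
import math

def chi2_1df_upper_tail(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def lod_to_p(lod, one_sided=True):
    """Approximate pointwise p-value for a positive LOD score.

    one_sided=True halves the chi-square tail, the convention often used
    because linkage can only shift the statistic in one direction.
    """
    if lod < 0:
        raise ValueError("lod must be non-negative")
    chi_square = 2.0 * math.log(10.0) * lod
    p = chi2_1df_upper_tail(chi_square)
    return 0.5 * p if one_sided else p

# Example calls using values quoted in the text
p_kc = lod_to_p(0.9)                        # roughly 0.02
p_lod = lod_to_p(2.61, one_sided=False)     # roughly 0.0005
```

Under these assumptions a KC-LOD of 0.9 corresponds to a one-sided p of about 0.02, in line with the p(KC) quoted above, and a LOD of 2.61 corresponds to about 0.0005 without halving and about 0.0003 with it.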

Although, in this instance, our procedure resulted in the identification of a possibly 'real' linkage that was missed by standard nonparametric analysis, nonparametric linkage methods should not be viewed as inferior, nor should our CSA be viewed as sufficient. The nonparametric linkage tests found candidate loci that our model-based procedure did not, and our CSA was limited to a single major gene and forced convergence within a restricted sample space. Complex traits with multiple genetic and environmental effects will often result in no reasonable model. We found linkages for only three of thirteen modeled traits. Failure to detect linkage may have been due to unaccounted sources of familial correlation (e.g., environmental) or to modeling a single major gene when one did not exist. Also, when linkage is found using an overly simple model, sensitivity testing to evaluate parameter values, e.g., marker allele frequencies and QTL penetrance/quantitative effect, can assess potential misspecification. However, overall, we recommend obtaining maximum likelihood genetic models from CSA whenever possible; by definition no 'truer' trait models can be obtained, given the single-major-gene constraint under which we modeled.

Conclusion

Our results indicate that model-based linkage procedures using simple models from CSA may detect candidate loci for complex traits that are not revealed by commonly used model-free techniques. Parametric methods that allow more complex modeling, such as MCMC methods, are being implemented [16, 18, 19]. However, the older model-based methods have been shown to complement MCMC approaches in complex trait linkage analyses and, in fact, may be advantageous for initial screening [19]. Until procedures for generating and utilizing complex trait linkage models are more widely available, parametric analyses under simpler models and nonparametric methods might be better used in a complementary manner.

Abbreviations

COGA: Collaborative Study on the Genetics of Alcoholism

CSA: Complex segregation analysis

EEG: Electroencephalogram

ERP: Event-related potential

GAW: Genetic Analysis Workshop

hLOD: Heterogeneity LOD

IBD: Identity by descent

KC-LOD: Kong-Cox LOD

MCMC: Markov chain Monte Carlo

QTL: Quantitative trait locus

References

  1. Amos CI, de Andrade M: Genetic linkage methods for quantitative traits. Stat Methods Med Res. 2001, 10: 3-25. 10.1191/096228001677031143.

  2. Rebai A: Linkage analysis for complex diseases: a new life for an old method. Arch Inst Pasteur Tunis. 2000, 77: 25-35.

  3. Begleiter H, Porjesz B, Reich T, Edenberg HJ, Goate A, Blangero J, Almasy L, Foroud T, Van Eerdewegh P, Polich J, Rohrbaugh J, Kuperman S, Bauer LO, O'Connor SJ, Chorlian DB, Li TK, Conneally PM, Hesselbrock V, Rice JP, Schuckit MA, Cloninger R, Nurnberger J, Crowe R, Bloom FE: Quantitative trait loci analysis of human event-related brain potentials: P3 voltage. Electroencephalogr Clin Neurophysiol. 1998, 108: 244-250. 10.1016/S0168-5597(98)00002-1.

  4. Williams JT, Begleiter H, Porjesz B, Edenberg HJ, Foroud T, Reich T, Goate A, Van Eerdewegh P, Almasy L, Blangero J: Joint multipoint linkage analysis of multivariate qualitative and quantitative traits. II. Alcoholism and event-related potentials. Am J Hum Genet. 1999, 65: 1148-1160. 10.1086/302571.

  5. Almasy L, Porjesz B, Blangero J, Goate A, Edenberg HJ, Chorlian DB, Kuperman S, O'Connor SJ, Rohrbaugh J, Bauer LO, Foroud T, Rice JP, Reich T, Begleiter H: Genetics of event-related brain potentials in response to a semantic priming paradigm in families with a history of alcoholism. Am J Hum Genet. 2001, 68: 128-135. 10.1086/316936.

  6. Porjesz B, Begleiter H, Wang K, Almasy L, Chorlian DB, Stimus AT, Kuperman S, O'Connor SJ, Rohrbaugh J, Bauer LO, Edenberg HJ, Goate A, Rice JP, Reich T: Linkage and linkage disequilibrium mapping of ERP and EEG phenotypes. Biol Psychol. 2002, 61: 229-248. 10.1016/S0301-0511(02)00060-1.

  7. Ghosh S, Begleiter H, Porjesz B, Chorlian DB, Edenberg HJ, Foroud T, Goate A, Reich T: Linkage mapping of beta 2 EEG waves via non-parametric regression. Am J Med Genet. 2003, 118B: 66-71. 10.1002/ajmg.b.10057.

  8. Hasstedt S, Cartwright P: PAP: pedigree analysis package. 1981, Salt Lake City: Department of Medical Biophysics and Computing, University of Utah

  9. Hasstedt SJ: A mixed-model likelihood approximation on large pedigrees. Comput Biomed Res. 1982, 15: 295-307. 10.1016/0010-4809(82)90064-7.

  10. Lathrop GM, Lalouel JM, Julier C, Ott J: Multilocus linkage analysis in humans: detection of linkage and estimation of recombination. Am J Hum Genet. 1985, 37: 482-498.

  11. Cottingham RW Jr, Idury RM, Schäffer AA: Faster sequential genetic linkage computations. Am J Hum Genet. 1993, 53: 252-263.

  12. Ott J: Analysis of Human Genetic Linkage. 1991, Baltimore: Johns Hopkins University Press, revised edition

  13. Abecasis GR, Cherny SS, Cookson WO, Cardon LR: Merlin–rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002, 30: 97-101. 10.1038/ng786.

  14. Clerget-Darpoux F, Bonaiti-Pellie C, Hochez J: Effects of misspecifying genetic parameters in LOD score analysis. Biometrics. 1986, 42: 393-399. 10.2307/2531059.

  15. Abreu PC, Greenberg DA, Hodge SE: Direct power comparisons between simple LOD scores and NPL scores for linkage analysis in complex diseases. Am J Hum Genet. 1999, 65: 847-857. 10.1086/302536.

  16. Sieh W, Basu S, Fu AQ, Rothstein JH, Scheet PA, Stewart W, Sung Y, Thompson EA, Wijsman E: Comparison of marker types and map assumptions using Markov chain Monte Carlo-based linkage analysis of COGA data. BMC Genet. 2005, 6 (Suppl 1): S11. 10.1186/1471-2156-6-S1-S11.

  17. Lin J-P, Wu C: Bivariate genome scans incorporating factor and principal component analyses to identify common genetic components of alcoholism, event-related potential, and electroencephalogram phenotypes. BMC Genet. 2005, 6 (Suppl 1): S114. 10.1186/1471-2156-6-S1-S114.

  18. Daw EW, Heath SC, Lu Y: Single-nucleotide polymorphism versus microsatellite markers in a combined linkage and segregation analysis of a quantitative trait. BMC Genet. 2005, 6 (Suppl 1): S32. 10.1186/1471-2156-6-S1-S32.

  19. Badzioch MD, Igo RP, Gagnon F, Brunzell JD, Krauss RM, Motulsky AG, Wijsman EM, Jarvik GP: LDL particle size loci in familial combined hyperlipidemia: evidence for multiple loci from a genome scan. Arterioscler Thromb Vasc Biol. 2004, 24: 1942-1950. 10.1161/01.ATV.0000143499.09575.93.

Author information

Corresponding author

Correspondence to Michael D Badzioch.

Additional information

Authors' contributions

MDB formulated the study question, did the analyses, and wrote the manuscript; ELG and GPJ provided critical input on the analyses and manuscript.

Rights and permissions

Open Access. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Badzioch, M.D., Goode, E.L. & Jarvik, G.P. The role of parametric linkage methods in complex trait analyses using microsatellites. BMC Genet 6 (Suppl 1), S48 (2005). https://doi.org/10.1186/1471-2156-6-S1-S48
