Volume 6 Supplement 1
Comparison of marker types and map assumptions using Markov chain Monte Carlo-based linkage analysis of COGA data
© Sieh et al; licensee BioMed Central Ltd 2005
Published: 30 December 2005
We performed multipoint linkage analysis of the electrophysiological trait ECB21 on chromosome 4 in the full pedigrees provided by the Collaborative Study on the Genetics of Alcoholism (COGA). Three Markov chain Monte Carlo (MCMC)-based approaches were applied to the provided and re-estimated genetic maps and to five different marker panels consisting of microsatellite (STRP) and/or SNP markers at various densities. We found evidence of linkage near the GABRB1 STRP using all methods, maps, and marker panels. Difficulties encountered with SNP panels included convergence problems and demanding computations.
Our aims were to investigate 1) the utility of single-nucleotide polymorphisms (SNPs) versus microsatellites (STRPs), and 2) the impact of map assumptions on linkage analysis. We chose to focus our analyses on the COGA ECB21 trait and chromosome 4 because previous studies [1, 2] had reported significant evidence for linkage of the electroencephalogram (EEG) beta wave to chromosome 4. Multipoint linkage analysis of the full pedigree structures was performed by using MCMC techniques to implement allele-sharing, parametric LOD score, and Bayesian analysis approaches.
Trait definition and segregation analyses
A multivariate polygenic model was used to obtain maximum likelihood estimates of the heritabilities and genetic correlations of ECB21 and 12 other EEG measurements. On the basis of the results, ECB21 and TTTH3 were selected for further study. Early analyses of TTTH3 showed little evidence of linkage to chromosome 4, so subsequent analyses focused only on ECB21. Oligogenic segregation analysis of ECB21, adjusting for age and gender, revealed two quantitative trait locus (QTL) models. The model with the highest posterior probability provided stronger evidence of linkage to chromosome 4 and was used in subsequent parametric LOD score analysis of the quantitative trait, ECB21_Q, preadjusted for age and gender. We created a dichotomous trait, ECB21_D, by defining ECB21_Q ≥ 3 as 'affected'. This cutpoint maximized the difference between the penetrances of the high- versus low-risk genotypes based on the estimated genotype effects from the most likely QTL model.
All 275 Illumina SNPs on chromosome 4 and 550 consecutive Affymetrix SNPs spanning STRPs 2–12 on chromosome 4 were selected. Among SNPs with identical meiotic map positions, the SNP with the largest minor allele frequency was retained, leaving a relatively sparse panel of 140 Illumina SNPs with an average spacing of ~1.5 cM (ILMN_1.5) and a dense panel of 476 Affymetrix SNPs with an average spacing of 0.3 cM (AFFY_0.3) for further analysis. A subset of 97 Affymetrix SNPs (AFFY_1.5) was selected by requiring an empirically determined minimum distance of 1.1 cM between SNPs, starting from the first SNP, to achieve an average density similar to that of ILMN_1.5. SNPs were interpolated onto the COGA STRP map by pegging the two flanking SNPs to each STRP and interpolating the intervening SNPs in proportion to their distances in the corresponding intervals on the COGA and provided SNP maps.
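The thinning and interpolation steps described above can be sketched as follows. This is a minimal illustration and not the authors' code. It assumes that marker positions are sorted in cM and that anchor points with known positions on both maps are available. The function names `thin_by_min_distance` and `interpolate_onto_map` and the example positions are hypothetical.

```python
from bisect import bisect_right


def thin_by_min_distance(positions_cm, min_gap_cm):
    """Greedy thinning of a sorted marker list.

    Keeps the first marker, then every later marker lying at least
    min_gap_cm beyond the most recently kept one. Returns kept indices.
    """
    kept = []
    for i, pos in enumerate(positions_cm):
        if not kept or pos - positions_cm[kept[-1]] >= min_gap_cm:
            kept.append(i)
    return kept


def interpolate_onto_map(pos, anchors_src, anchors_dst):
    """Piecewise-linear transfer of a position between two maps.

    anchors_src and anchors_dst hold the positions of the same anchor
    markers on the source and target maps (both sorted ascending).
    A position between two anchors keeps its proportional distance
    within that interval. Positions outside the anchors are rejected.
    """
    if not anchors_src[0] <= pos <= anchors_src[-1]:
        raise ValueError("position lies outside the anchored region")
    # Index of the interval [anchors_src[j], anchors_src[j + 1]] containing pos.
    j = min(bisect_right(anchors_src, pos) - 1, len(anchors_src) - 2)
    frac = (pos - anchors_src[j]) / (anchors_src[j + 1] - anchors_src[j])
    return anchors_dst[j] + frac * (anchors_dst[j + 1] - anchors_dst[j])


# Hypothetical positions in cM, for illustration only.
kept = thin_by_min_distance([0.0, 0.5, 1.2, 1.9, 2.5, 4.0], min_gap_cm=1.1)
mapped = interpolate_onto_map(30.0, [10.0, 20.0, 40.0], [12.0, 18.0, 50.0])
```

Because the thinning is greedy from the first marker, the resulting subset depends on the starting SNP. A different starting point could give a different panel with the same minimum spacing.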
Genetic maps were re-estimated from the COGA data using a hybrid algorithm, based on MCMC-EM (expectation maximization) and stochastic approximation for STRPs and MCMC-EM for SNPs, to find the maximum likelihood estimates of the recombination fractions. Sex-averaged and sex-specific maps were re-estimated using all 17 STRPs on chromosome 4, and a sex-averaged map was estimated using STRPs 2–12 plus AFFY_0.3. Haldane map distances were used in all analyses and figures.
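For readers unfamiliar with the Haldane map function, the short sketch below converts between a recombination fraction θ and a Haldane distance in cM, using d = -50 ln(1 - 2θ). It only illustrates the standard formula. The function names are hypothetical and are not part of MORGAN or Loki.

```python
import math


def haldane_cm(theta):
    """Haldane map distance in cM for a recombination fraction theta."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must satisfy 0 <= theta < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * theta)


def haldane_theta(d_cm):
    """Recombination fraction for a Haldane map distance in cM."""
    if d_cm < 0.0:
        raise ValueError("distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-d_cm / 50.0))


# Under the no-interference assumption behind the Haldane function,
# distances over adjacent intervals add, while recombination
# fractions combine as t13 = t12 + t23 - 2 * t12 * t23.
t12, t23 = 0.1, 0.2
t13 = t12 + t23 - 2.0 * t12 * t23
total_cm = haldane_cm(t12) + haldane_cm(t23)
```

This additivity is what allows re-estimated interval recombination fractions to be assembled into a single map.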
Linkage analyses of the ECB21 traits on chromosome 4 used three MCMC-based methods from the MORGAN and Loki software packages. First, a MORGAN IBD-scoring program (lm_ibdtest) was used to analyze ECB21_D. This program obtains MCMC estimates of the allele-sharing statistic Spairs and determines significance levels with a permutation test rather than relying upon normality assumptions. Second, a MORGAN parametric LOD score program (lm_markers) was used to analyze ECB21_D (not shown) and ECB21_Q using parameters from the segregation model for ECB21_Q and the associated penetrances and allele frequencies for ECB21_D. Third, an oligogenic linkage analysis approach (Loki) was used to analyze ECB21_Q; results are expressed as Bayes factors, or the posterior:prior odds that a QTL exists in a given 2-cM region. A 50:50 ratio of locus to meiosis block Gibbs sampling was used in all analyses. Initial starting configurations were obtained by using the locus sampler independently on each locus. We performed single-marker analyses with each of the 17 STRPs on chromosome 4. Multipoint analyses used five marker panels: 17 STRPs; AFFY_0.3; STRPs 2–12 plus AFFY_0.3; ILMN_1.5; and AFFY_1.5.
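The Bayes factor described above, the ratio of posterior odds to prior odds that a QTL lies in a given bin, can be illustrated with a few lines of arithmetic. This is a sketch of the definition only. How Loki obtains the prior and posterior probabilities for a 2-cM bin is not shown here, and the probabilities used below are hypothetical values rather than results from this study.

```python
def odds(p):
    """Convert a probability strictly between 0 and 1 into odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1.0 - p)


def bayes_factor(posterior_prob, prior_prob):
    """Ratio of posterior odds to prior odds for the same event."""
    return odds(posterior_prob) / odds(prior_prob)


# Hypothetical example: a bin with prior probability 0.01 of holding a QTL
# and an estimated posterior probability of 0.20 (illustrative values).
bf = bayes_factor(posterior_prob=0.20, prior_prob=0.01)
```

A Bayes factor of 1 means the data leave the odds unchanged, and larger values indicate that the data favor a QTL in that bin.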
To evaluate the effects of the real chromosome 4 STRP data and provided map on type I error, 1,000 replicates of an unlinked quantitative trait, based on the ECB21_Q model, were simulated on the COGA pedigrees. The simulated trait was then dichotomized using the same cutpoint as for ECB21_D. For comparison, true null datasets were created by pairing each of the 1,000 unlinked trait replicates with a single set of unlinked markers, simulated based on the chromosome 4 STRP allele frequencies and map. Spairs was computed at each marker in each replicate using lm_ibdtest.
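One way to summarize null replicates of this kind is the empirical false-positive rate at a nominal significance level, compared with its expected sampling error. The sketch below assumes one p-value per replicate at a given marker. The function name and the evenly spaced p-values are illustrative only and are not COGA results or lm_ibdtest output.

```python
import math


def empirical_type1_rate(p_values, alpha=0.05):
    """Fraction of null replicates with p <= alpha, plus the binomial
    standard error expected if the true rate equals alpha."""
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one replicate is required")
    hits = sum(1 for p in p_values if p <= alpha)
    rate = hits / n
    se_under_null = math.sqrt(alpha * (1.0 - alpha) / n)
    return rate, se_under_null


# Illustrative input: 1,000 evenly spaced p-values, mimicking a
# well-calibrated test on 1,000 null replicates.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
rate, se = empirical_type1_rate(null_p, alpha=0.05)
```

With 1,000 replicates the standard error at a nominal level of 0.05 is about 0.007, so an observed rate well above roughly 0.064 at a marker would suggest an inflated false-positive rate there.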
Trait definition and segregation analyses
The polygenic analysis estimated a narrow-sense heritability of 0.61 for ECB21_Q and genetic correlation of 0.47 between ECB21 and TTTH3. Oligogenic segregation analysis of ECB21_Q indicated the existence of at least one QTL. The estimated parameters for the most likely QTL model were: frequency of 0.411 for the minor allele "A", genotype means μaa = -1.22, μAa = -1.14, μAA = 5.79, and residual variance of 22.0. Penetrances for ECB21_D were 19%, 19%, and 73% for the aa, Aa, and AA genotypes, respectively.
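As a rough check on how penetrances follow from a QTL model and a cutpoint, the sketch below computes the probability that the trait is at least 3 for each genotype. It assumes, as a simplification, that the trait within each genotype is normally distributed around the genotype mean with the stated residual variance. This is an assumption of the sketch, not a description of the authors' calculation, and the function names are hypothetical.

```python
import math


def normal_upper_tail(x, mean, sd):
    """P(X >= x) for X ~ Normal(mean, sd**2)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


def penetrances(genotype_means, residual_var, cutpoint):
    """Probability of exceeding the cutpoint for each genotype, assuming
    a normal residual with the given variance (a simplifying assumption)."""
    sd = math.sqrt(residual_var)
    return {g: normal_upper_tail(cutpoint, mu, sd)
            for g, mu in genotype_means.items()}


# Genotype means, residual variance, and cutpoint as reported in the text.
means = {"aa": -1.22, "Aa": -1.14, "AA": 5.79}
pen = penetrances(means, residual_var=22.0, cutpoint=3.0)
```

Under these assumptions the values come out at roughly 0.18, 0.19, and 0.72 for aa, Aa, and AA. These are close to the reported 19%, 19%, and 73%, and the small differences may reflect details of the model that this sketch omits.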
Three different MCMC-based multipoint methods gave evidence in the COGA STRP data for linkage of ECB21 to STRP 4 on chromosome 4. We also found weaker evidence of linkage near STRP 10. Comparison of sex-averaged and sex-specific STRP maps suggested that results may be robust to map misspecification in the presence of strong evidence for linkage. However, the investigation of map assumptions may be important in elucidating weak linkage signals, especially in chromosomal regions with substantial male-female map differences. Map estimation using SNP data led to substantial expansion of genetic distances compared to maps estimated from STRP data, suggesting possible undetected SNP genotype errors or effects of linkage disequilibrium. Our analyses of simulated null datasets with an unlinked trait and real STRP data indicated that some regions of chromosome 4, including STRP 4, may be prone to false-positive linkage signals, and that this tendency persists even when maps estimated from the data are used. Possible explanations for false-positive results include genotype error or allele frequency misspecification.
Multipoint analyses using STRPs, SNPs, or a combination of STRPs and SNPs yielded comparable evidence of linkage to the chromosome 4 region with the strongest signal. The signal strength was not greater for the dense versus sparse SNP panels. Furthermore, localization and interpretation of linkage signals for the dense SNP panel were complicated by noisy results, which could reflect MCMC mixing problems and/or genotype error. Multipoint analyses using sparse SNP panels produced smoother LOD score curves than the dense SNP panel. These results suggest that increasing the density of SNP panels beyond an average spacing of 1.5 cM does not substantially increase the evidence for linkage in the COGA dataset, which consists of moderate-size pedigrees with relatively complete genotype data. Additional studies will be needed to determine the optimal density for SNP panels in other datasets. Our analyses indicate that, although current MCMC approaches are usable with dense SNPs in limited chromosomal regions and moderate-size pedigrees, long runs are needed to produce stable linkage analysis results. Run times may prohibit the use of dense SNP panels for whole-genome scans with current MCMC analysis programs. MCMC-based methods are among the best tools now available for the analysis of large pedigrees, numerous markers, and complex traits. Further development of these methods to accommodate dense SNP panels in the context of large pedigrees would be of value.
COGA: Collaborative Study on the Genetics of Alcoholism
GAW: Genetic Analysis Workshop
MCMC: Markov chain Monte Carlo
QTL: Quantitative trait locus
STRP: Short tandem repeat polymorphism
Supported by NIH grants GM46255, HD35465, HD33812, AG14382, and AG05136.
- Porjesz B, Almasy L, Edenberg HJ, Wang K, Chorlian DB, Foroud T, Goate A, Rice JP, O'Connor SJ, Rohrbaugh J, Kuperman S, Bauer LO, Crowe RR, Schuckit MA, Hesselbrock V, Conneally PM, Tischfield JA, Li TK, Reich T, Begleiter H: Linkage disequilibrium between the beta frequency of the human EEG and a GABAA receptor gene locus. Proc Natl Acad Sci USA. 2002, 99: 3729-3733. 10.1073/pnas.052716399.
- Ghosh S, Begleiter H, Porjesz B, Chorlian DB, Edenberg HJ, Foroud T, Goate A, Reich T: Linkage mapping of beta 2 EEG waves via non-parametric regression. Am J Med Genet. 2003, 118B: 66-71. 10.1002/ajmg.b.10057.
- Sung YJ, Dawson G, Munson J, Estes A, Schellenberg GD, Wijsman EM: Genetic investigation of quantitative traits related to autism: use of multivariate polygenic models with ascertainment adjustment. Am J Hum Genet. 2005, 76: 68-81. 10.1086/426951.
- Heath SC: Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am J Hum Genet. 1997, 61: 748-760.
- Pedigree Analysis for Genetics. [http://www.stat.washington.edu/thompson/Genepi/pangaea.shtml]
- Whittemore AS, Halpern J: A class of tests for linkage using affected pedigree members. Biometrics. 1994, 50: 118-127. 10.2307/2533202.
- George AW, Thompson EA: Discovering disease genes: multipoint linkage analysis via a new Markov chain Monte Carlo approach. Stat Sci. 2003, 18: 515-531. 10.1214/ss/1081443233.
- Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, Shlien A, Palsson ST, Frigge ML, Thorgeirsson TE, Gulcher JR, Stefansson K: A high-resolution recombination map of the human genome. Nat Genet. 2002, 31: 241-247.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.