Volume 6 Supplement 1
Comparison of linkage and association strategies for quantitative traits using the COGA dataset
© McQueen et al; licensee BioMed Central Ltd 2005
Published: 30 December 2005
Genome scans using dense single-nucleotide polymorphism (SNP) data have recently become a reality. It is thought that the increase in information content for linkage analysis provided by denser scans will help refine previously identified linkage regions and possibly identify new regions not detectable with sparser microsatellite scans. In the context of dense SNP scans, it is also possible to consider association strategies to provide even more information about potential regions of interest. To circumvent the multiple-testing issues inherent in association analysis, we use a recently developed strategy, implemented in PBAT, which screens the data to identify the optimal SNPs for testing without biasing the nominal significance level. We compare the results from the PBAT analysis with those of quantitative linkage analysis on chromosome 4 using the Collaborative Study on the Genetics of Alcoholism data, as released through Genetic Analysis Workshop 14.
The rapid advance of genotyping technology has produced a wealth of new, high-quality data that may help elucidate the genetic determinants underlying complex disease. However, the utility of such rich data may be limited by existing methods of linkage and association analysis. For example, it is unclear whether increasingly dense single-nucleotide polymorphism (SNP) genome scans will provide the boost in power and/or information needed to uncover genes of modest effect size. Further, association methods face extreme multiple-comparison problems, as the number of statistical tests balloons with the vast number of available SNPs. To address multiple comparisons, recently developed screening tools implemented in PBAT [1] have the potential to provide a powerful and unbiased strategy for genome-wide association analysis of family studies [2]. Briefly, the PBAT screening strategy uses the information from uninformative families (information otherwise discarded in a standard family-based association setting) to screen for and select the optimal markers for subsequent testing without biasing the nominal significance level. In this paper, we explore the utility of the PBAT screening method in comparison with quantitative linkage analysis using the Collaborative Study on the Genetics of Alcoholism (COGA) dataset, as released through Genetic Analysis Workshop 14 (GAW14). Because the same genetic markers can be used for both linkage and association methods, we have the unique opportunity to compare the two strategies more directly and comprehensively.
Description of the dataset
The data provided for Problem 1 of GAW14 (the COGA study) include genotypes from the Affymetrix GeneChip™ Human Mapping 10K array (Affymetrix), comprising 11,555 SNPs, as well as quantitative trait information for approximately 1,614 subjects from 143 families of varying size and structure. Here, we focus on the quantitative trait data from the Eyes Closed Resting electroencephalogram (EEG) experiment, in particular the measure corresponding to the first component of a trilinear singular value decomposition of the beta2 band and bipolar electrode data (ECB21). ECB21 was approximately normally distributed, with a mean of 14.53 (standard deviation = 5.5) and a range of 4.43 to 36.06, and showed no substantial skewness or kurtosis. We restricted our analysis to the genotypes of the 786 Affymetrix SNPs on chromosome 4, because this chromosome has been proposed to harbor a region of linkage to the ECB21 phenotype [3–5].
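The summary statistics quoted above (mean, standard deviation, range, skewness, kurtosis) can be reproduced with a few lines of code. The sketch below is a minimal illustration, assuming the ECB21 values have already been loaded into a Python list; the helper name `describe_trait` is hypothetical, and skewness and excess kurtosis use the simple moment estimators, so values may differ slightly from software that applies small-sample corrections.

```python
import math
from typing import Dict, Sequence


def describe_trait(values: Sequence[float]) -> Dict[str, float]:
    """Summary statistics for a quantitative trait such as ECB21 (hypothetical helper)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Central moments (divided by n) for the moment-based shape statistics.
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    if m2 == 0.0:
        raise ValueError("trait is constant")
    return {
        "n": n,
        "mean": mean,
        # Sample standard deviation (divided by n - 1).
        "sd": math.sqrt(sum(d ** 2 for d in dev) / (n - 1)),
        "min": min(values),
        "max": max(values),
        # Population values of both shape statistics are 0 for a normal distribution.
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Toy input, not COGA data: symmetric values give zero skewness.
print(describe_trait([1.0, 2.0, 3.0, 4.0, 5.0]))
```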
Quantitative trait linkage analysis
We first performed a multipoint linkage analysis of the ECB21 phenotype using the variance components approach implemented in MERLIN [6]. Allele frequencies were estimated from all genotyped individuals, and the marker map provided by Affymetrix was used for the analysis. Because multipoint linkage analysis assumes that markers are in linkage equilibrium, we assessed whether linkage disequilibrium (LD) structure influences the linkage signal: we used HAPLOVIEW [7] to characterize LD in the sample, removed markers found to be in strong LD, and re-analyzed the sample for linkage.
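The LD measure and the threshold used to define "strong LD" are not stated here. As a minimal sketch, assuming pairwise r² as the measure and a hypothetical cut-off of 0.8, markers can be thinned greedily in map order. Note that HAPLOVIEW estimates LD from haplotypes; this sketch instead uses the squared correlation of additively coded genotypes, a common approximation, and all function names and parameters are illustrative.

```python
from typing import Dict, List, Optional, Sequence

# 0/1/2 copies of one allele per individual; None marks a missing call.
Genotypes = Sequence[Optional[int]]


def genotype_r2(g1: Genotypes, g2: Genotypes) -> float:
    """Squared Pearson correlation between two SNPs, skipping missing calls."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean1 = sum(a for a, _ in pairs) / n
    mean2 = sum(b for _, b in pairs) / n
    sxy = sum((a - mean1) * (b - mean2) for a, b in pairs)
    sxx = sum((a - mean1) ** 2 for a, _ in pairs)
    syy = sum((b - mean2) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:  # monomorphic in this sample: no LD information
        return 0.0
    return sxy * sxy / (sxx * syy)


def prune_by_ld(snps_in_map_order: List[str], genotypes: Dict[str, Genotypes],
                r2_threshold: float = 0.8, window: int = 20) -> List[str]:
    """Keep a SNP only if its r^2 with each of the last `window` kept SNPs is below the threshold."""
    kept: List[str] = []
    for snp in snps_in_map_order:
        recent = kept[-window:]
        if all(genotype_r2(genotypes[snp], genotypes[k]) < r2_threshold for k in recent):
            kept.append(snp)
    return kept
```

The pruned marker list would then be passed back to the linkage software; the threshold and window size are tuning choices, not values taken from this study.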
Quantitative trait association analysis
Each marker was tested for association with the ECB21 phenotype using the FBAT approach [8] as implemented in PBAT, assuming an additive genetic model and using the theoretical variance estimate. PBAT also implements a newer testing strategy designed to address the multiple-testing issues of family-based association studies [9, 10]. The PBAT strategy can be thought of as a screening technique, whereby the most powerful genotype-phenotype combination is selected from the entire set of genotype-phenotype combinations available to the researcher. Unlike standard methods, the PBAT strategy does not bias the nominal significance level of the resulting univariate or multivariate FBAT statistic. PBAT accomplishes this by making use of the uninformative families. For example, a nuclear family in which both parents are homozygous at a particular locus is uninformative: the offspring genotypes are completely determined by the parental genotypes, so transmission is not random and the family contributes nothing to the FBAT statistic. Thus, using the uninformative families to screen for the optimal genotype-phenotype combination does not bias the significance level. Specific details about the method can be found in Lange et al. [9, 10]. Briefly, the method can be broken down into six steps:
1) Select a subset of phenotypes (or one phenotype) to be tested.
2) Generate a multivariate model that describes the selected phenotype(s) as a function of the genotypes.
3) Replace the observed genotypes for the informative families with their expected genotypes conditional on the parental genotypes.
4) Estimate the effect-size parameters of the model from step 2, using the expected genotypes from step 3.
5) Estimate the power of each selected phenotype-genotype combination using the conditional power approach [11].
6) Apply the standard univariate FBAT approach to the phenotype-genotype combination with the highest estimated power from step 5.
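To make the role of uninformative families concrete, the sketch below computes a simplified additive FBAT statistic for nuclear families in which both parents are genotyped, using the theoretical variance under the null hypothesis of no linkage and no association. It is an illustration, not PBAT's implementation: the family layout, the function name `fbat_additive` and the phenotype `offset` are assumptions, missing parents (which FBAT handles by conditioning on sufficient statistics) are not supported, and Mendelian consistency is not checked. Families with both parents homozygous have zero conditional genotype variance, so they drop out of the statistic entirely.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: (father genotype, mother genotype, [(child genotype, child phenotype), ...]),
# genotypes coded as the number of copies (0, 1 or 2) of the tested allele.
Family = Tuple[int, int, List[Tuple[int, float]]]


def fbat_additive(families: Sequence[Family], offset: float) -> Tuple[float, float, int]:
    """Simplified additive FBAT Z statistic with the theoretical (no-linkage) variance.

    Returns (z, two-sided p-value, number of informative families).
    """
    u = 0.0    # S - E(S): sum of T * (X - E[X | parents])
    var = 0.0  # Var(S | parents), children treated as independent given parents
    n_informative = 0
    for g_father, g_mother, children in families:
        # Probability that each parent transmits the tested allele.
        p_f, p_m = g_father / 2.0, g_mother / 2.0
        expected_x = p_f + p_m
        var_x = p_f * (1.0 - p_f) + p_m * (1.0 - p_m)
        if var_x == 0.0:
            # Both parents homozygous: every child's genotype is fixed by the parents,
            # so X - E[X | parents] = 0 and the family adds nothing to u or var.
            continue
        n_informative += 1
        for x, y in children:
            t = y - offset  # phenotype coding T = Y - offset
            u += t * (x - expected_x)
            var += t * t * var_x
    if var == 0.0:
        return 0.0, 1.0, n_informative
    z = u / math.sqrt(var)
    # Z is approximately N(0, 1) under the null; two-sided normal p-value.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p, n_informative


toy = [
    (1, 1, [(2, 20.0), (1, 15.0)]),  # two heterozygous parents: informative
    (2, 2, [(2, 30.0)]),             # both parents homozygous: uninformative
    (0, 2, [(1, 10.0)]),             # homozygous for different alleles: uninformative
]
print(fbat_additive(toy, offset=15.0))
```

In the toy data only the first family enters the statistic; removing the other two would leave the result unchanged, which is the property PBAT's screening exploits.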
For the present analysis, we used PBAT's screening strategy to select the five most powerful SNPs for testing and Bonferroni-adjusted the resulting FBAT p-values for five tests. This choice was based on simulation studies conducted by Van Steen et al. [2], which suggest that testing the five most powerful SNPs is the optimal strategy for PBAT screening at this scale.
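The screen-then-test idea can be sketched as follows. The screening score here is a deliberately simplified stand-in for PBAT's conditional power, not PBAT's own computation: it regresses child phenotypes on the expected genotype given the parental genotypes and never looks at the observed child genotypes. Because the FBAT statistic conditions on exactly this information (phenotypes and parental genotypes), ranking SNPs on it does not change the null distribution of the FBAT tests applied afterwards. The family layout and function names are hypothetical, and the ordinary least-squares fit treats siblings as independent, which is a simplification.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: (father genotype, mother genotype, [(child genotype, child phenotype), ...]).
Family = Tuple[int, int, List[Tuple[int, float]]]


def screening_score(families: Sequence[Family]) -> float:
    """|slope / SE| from an OLS fit of child phenotype on E[X | parental genotypes].

    Simplified stand-in for PBAT's conditional power; observed child genotypes are never used.
    """
    xs: List[float] = []
    ys: List[float] = []
    for g_father, g_mother, children in families:
        expected_x = (g_father + g_mother) / 2.0
        for _observed_x, y in children:  # observed genotype deliberately ignored
            xs.append(expected_x)
            ys.append(y)
    n = len(xs)
    if n < 3:
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        return 0.0
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    rss = sum((y - mean_y - slope * (x - mean_x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:
        return 0.0 if slope == 0.0 else math.inf
    return abs(slope) / se


def select_top_snps(data: Dict[str, Sequence[Family]], k: int = 5) -> List[str]:
    """Rank SNPs by screening score and keep the k highest (k = 5 in this analysis)."""
    return sorted(data, key=lambda snp: screening_score(data[snp]), reverse=True)[:k]
```

Only the k selected SNPs would then be tested with FBAT on the within-family information, and their p-values multiplied by k (capped at 1) for the Bonferroni adjustment.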
Linkage analysis results
Association analysis results
Table 1. Ten SNPs on chromosome 4 with the smallest p-values. The ten SNPs with the smallest (unadjusted) p-values for association with ECB21 on chromosome 4, as estimated by PBAT, together with the number of uninformative families and the corresponding MERLIN LOD scores.
Table 2. Five most powerful SNPs on chromosome 4. The five SNPs with the highest estimated power in PBAT screening on chromosome 4, with their corresponding p-values, the number of uninformative families, and the corresponding MERLIN LOD scores.
Using Affymetrix SNPs from the COGA dataset, we identified a region linked to the ECB21 phenotype (LOD = 3.55) at approximately 70 cM. This region had previously been identified in the COGA dataset by Reich et al. [3], who reported a maximum LOD score of 2.50 using affected sibling pair methodology (with alcoholism diagnosis as the phenotype) and microsatellite markers. In addition, we replicated the approximate region of linkage found by Porjesz et al. using the same EEG measurement analyzed here. Porjesz et al. reported a higher LOD score (over 5.0) than we report here (3.55); however, unlike Porjesz et al., we did not adjust our analysis for age and sex.
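A LOD score can be placed on a likelihood-ratio and p-value scale. Since LOD = log10(L1/L0), the likelihood-ratio statistic is 2 ln(10) × LOD, so LOD = 3.55 corresponds to a statistic of about 16.35. The sketch below assumes the usual asymptotic null distribution for testing one variance component on the boundary of its parameter space (a 50:50 mixture of a point mass at 0 and a χ² with 1 degree of freedom); the p-values are pointwise, with no genome-wide correction, and the function names are illustrative.

```python
import math


def lod_to_chi2(lod: float) -> float:
    """Likelihood-ratio statistic 2*ln(L1/L0) implied by LOD = log10(L1/L0)."""
    return 2.0 * math.log(10.0) * lod


def lod_pointwise_p(lod: float) -> float:
    """Pointwise p-value, assuming a 50:50 mixture of 0 and chi-square(1) under the null.

    P(chi-square(1) > x) = erfc(sqrt(x / 2)); the mixture halves it for x > 0.
    """
    if lod <= 0.0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(lod_to_chi2(lod) / 2.0))


for lod in (3.0, 3.55):
    print(f"LOD {lod}: chi2 = {lod_to_chi2(lod):.2f}, pointwise p = {lod_pointwise_p(lod):.1e}")
```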
Using PBAT's screening strategy, we also identified two SNPs at approximately 35 cM that are potentially associated with the ECB21 phenotype (p = 0.0081 and 0.0085). As expected, testing each of the 786 SNPs for association created a severe multiple-testing problem: none of the SNPs on chromosome 4 was statistically significant after either Bonferroni correction or FDR control [12]. PBAT's screening strategy, however, allowed us to reduce the number of tests. We tested the five most powerful SNPs identified by PBAT and found two SNPs significantly associated with ECB21 at the 5% significance level after Bonferroni adjustment for five tests. These two SNPs are physically very close to each other (approximately 2 kb apart), and when they were tested jointly using PBAT's haplotype analysis function, the resulting two-SNP haplotype maintained its significance and relative power (data not shown).
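The effect of screening on the multiple-testing burden is simple arithmetic. With five tests, the Bonferroni-adjusted values of the two reported p-values are 5 × 0.0081 = 0.0405 and 5 × 0.0085 = 0.0425, both below 0.05; with all 786 chromosome 4 SNPs tested, the Bonferroni threshold would instead be 0.05/786 ≈ 6.4 × 10⁻⁵. The sketch below also includes the Benjamini-Hochberg step-up adjustment [12], one standard FDR method; the function names are illustrative.

```python
from typing import List, Sequence


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# The two p-values reported for the screened SNPs.
for p in (0.0081, 0.0085):
    print(p, "adjusted for 5 tests:", bonferroni(p, 5), "for 786 tests:", bonferroni(p, 786))
```

Applying `benjamini_hochberg` to this study would require all 786 p-values, which are not reported here.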
Interestingly, the selected SNPs were not located within the linkage region: the significant SNPs lie approximately 30–40 cM from the maximum LOD score, and the LOD scores at the selected SNPs were 0. This discrepancy may be explained in several ways, particularly given that the alternative hypothesis of the FBAT strategy is the presence of both linkage and association. First, it is possible, albeit unlikely given the distance, that the association approach identified SNPs in LD with the linkage region. Second, it has been suggested that association analysis may be more powerful than linkage analysis for detecting genes of relatively small effect [13]; it is therefore conceivable that the association strategy identified a novel region that was not detectable by linkage analysis. Third, the association could be entirely due to chance, resulting in a false positive for that region. It is also notable that PBAT did not find statistically significant SNPs in the linkage region. However, given the FBAT alternative hypothesis (presence of linkage and association), it is possible that the SNPs displaying linkage are in relatively low LD with the underlying causal locus and therefore show no association. It should also be noted that the linkage analysis in the present study was not optimized (for example, it was not adjusted for age and sex), so we did not maximize the linkage signal. However, the intent of the present study was to compare the two strategies by highlighting key similarities and differences, rather than to provide evidence in support of one strategy over the other. We propose that, used together, both strategies may prove useful in high-density genome-wide scans.
We compared linkage analysis and PBAT's approach to association analysis using the same quantitative trait and the same marker set. In this brief exploration, linkage and association analyses did not provide concordant results. Nonetheless, in the context of high-density SNP scans, we believe that new strategies for association testing may provide information not otherwise discovered using linkage analysis alone.
COGA: Collaborative Study on the Genetics of Alcoholism
FDR: False discovery rate
GAW: Genetic Analysis Workshop
MBM and JS are supported by the National Research Service Award, Training Program in Psychiatric Epidemiology and Biostatistics (T32 MH17119).
- Lange C, DeMeo D, Silverman EK, Weiss ST, Laird NM: PBAT: tools for family-based association studies. Am J Hum Genet. 2004, 74: 367-369. 10.1086/381563.
- Van Steen K, McQueen MB, Herbert A, Raby B, Lyon H, Demeo DL, Murphy A, Su J, Datta S, Rosenow C, Christman M, Silverman EK, Laird NM, Weiss ST, Lange C: Genomic screening and replication using the same data set in family-based association testing. Nat Genet. 2005, 37: 683-691. 10.1038/ng1582.
- Reich T, Edenberg HJ, Goate A, Williams JT, Rice JP, Van Eerdewegh P, Foroud T, Hesselbrock V, Schuckit MA, Bucholz K, Porjesz B, Li TK, Conneally PM, Nurnberger JI, Tischfield JA, Crowe RR, Cloninger CR, Wu W, Shears S, Carr K, Crose C, Willig C, Begleiter H: Genome-wide search for genes affecting the risk for alcohol dependence. Am J Med Genet. 1998, 81: 207-215. 10.1002/(SICI)1096-8628(19980508)81:3<207::AID-AJMG1>3.0.CO;2-T.
- Porjesz B, Begleiter H, Wang K, Almasy L, Chorlian DB, Stimus AT, Kuperman S, O'Connor SJ, Rohrbaugh J, Bauer LO, Edenberg HJ, Goate A, Rice JP, Reich T: Linkage and linkage disequilibrium mapping of ERP and EEG phenotypes. Biol Psychol. 2002, 61: 229-248. 10.1016/S0301-0511(02)00060-1.
- Porjesz B, Begleiter H, Wang K, Almasy L, Chorlian DB, Stimus AT, Kuperman S, O'Connor SJ, Rohrbaugh J, Bauer LO, Edenberg HJ, Goate A, Rice JP, Reich T: Linkage disequilibrium between the beta frequency of the human EEG and a GABAA receptor gene locus. Biol Psychol. 2002, 61: 229-248. 10.1016/S0301-0511(02)00060-1.
- Abecasis GR, Cherny SS, Cookson WO, Cardon LR: Merlin-rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002, 30: 97-101. 10.1038/ng786.
- Barrett JC, Fry B, Maller J, Daly MJ: Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005, 21: 263-265. 10.1093/bioinformatics/bth457.
- Laird N, Horvath S, Xu X: Implementing a unified approach to family based tests of association. Genet Epidemiol. 2000, 19 (Suppl 1): S36-S42. 10.1002/1098-2272(2000)19:1+<::AID-GEPI6>3.0.CO;2-M.
- Lange C, DeMeo D, Silverman E, Weiss S, Laird NM: Using the noninformative families in family-based association tests: a powerful new testing strategy. Am J Hum Genet. 2003, 73: 801-811. 10.1086/378591.
- Lange C, Lyon H, DeMeo D, Raby BA, Silverman E, Weiss S: A new powerful non-parametric two-stage approach for testing multiple phenotypes in family-based association studies. Hum Hered. 2003, 56: 10-17. 10.1159/000073728.
- Lange C, Laird NM: On a general class of conditional tests for family-based association studies in genetics: The asymptotic distribution, the conditional power, and optimality considerations. Genet Epidemiol. 2002, 23: 165-180. 10.1002/gepi.209.
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B. 1995, 57: 289-300.
- Risch NJ: Searching for genetic determinants in the new millennium. Nature. 2000, 405: 847-856. 10.1038/35015718.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.