Model-specific tests on variance heterogeneity for detection of potentially interacting genetic loci
- Ludwig A Hothorn^{1, 2},
- Ondrej Libiger^{2} and
- Daniel Gerhard^{1}
DOI: 10.1186/1471-2156-13-59
© Hothorn et al.; licensee BioMed Central Ltd. 2012
Received: 4 August 2011
Accepted: 18 July 2012
Published: 18 July 2012
Abstract
Background
Trait variances among genotype groups at a locus are expected to differ in the presence of an interaction between this locus and another locus or environment. A simple maximum test on variance heterogeneity can thus be used to identify potentially interacting single nucleotide polymorphisms (SNPs).
Results
We propose a multiple contrast test for variance heterogeneity that compares the mean of Levene residuals for each genotype group with their average as an alternative to a global Levene test. We applied this test to a Bogalusa Heart Study dataset to screen for potentially interacting SNPs across the whole genome that influence a number of quantitative traits. A user-friendly implementation of this method is available in the R statistical software package multcomp.
Conclusions
We show that the proposed multiple contrast test of model-specific variance heterogeneity can be used to test for potential interactions between SNPs and unknown alleles, loci or covariates and provide valuable additional information compared with traditional tests. Although the test is statistically valid for severely unbalanced designs, care is needed in interpreting the results at loci with low allele frequencies.
Keywords
Genetic association study; Quantitative traits; Interaction; Variance heterogeneity
Author’s summary
Interactions among alleles at variant sites in the genome or between alleles and the environment likely play an important role in determining complex traits such as blood pressure. However, sets of interacting loci are difficult to identify due to the large number of potential interactions that need to be tested. One approach that circumvents this difficulty is to identify loci that appear to take part in an interaction even though their interaction partners are unknown. A SNP locus containing an allele that interacts with other alleles or the environment can be identified by a statistically significant difference in the variance of quantitative trait values among individuals who possess zero, one or two copies of the allele at the locus. We describe an extension of Levene’s test, which was proposed to test variance heterogeneity. This new test has the advantage of providing information regarding the effect of specific alleles on variance heterogeneity, which can lead to formulating concrete, biologically relevant hypotheses about interacting alleles rather than just loci while controlling the type I error rate.
Background
Statistical association between a biallelic marker and a quantitative trait is usually tested using either a two degree of freedom F-test in the one-way analysis of variance (ANOVA) [1], or a one degree of freedom F-test in a linear regression [2]. These approaches compare the means of quantitative trait values at genotype categories associated with a SNP locus (i.e., homozygous for major allele, heterozygous and homozygous for minor allele). While ANOVA is sensitive to any global heterogeneity, the linear regression test is sensitive to the presence of an additive mode of inheritance. Less attention has been given to comparing the variances in the quantitative trait values associated with different genotype categories. Recently, [3] proposed using a standard Levene test [4] to identify variance heterogeneity due to potential interaction between a given locus and another allele at the same locus, alleles at different loci or the environment. They compared three global tests, namely the Bartlett test, a rank modification of the Bartlett test and the Levene test, particularly for non-normally distributed variables. Differences among the variances of quantitative trait values at each genotype category (denoted ${\sigma}_{j}^{2}$ with j=0,1,2 interacting alleles) may reflect an interaction [5]. In contrast to approaches that explicitly test specific gene-gene interactions, e.g. by Bayesian partition methods [6], or gene-environment interactions, e.g. by multiple regression methods [7], methods that assess variance heterogeneity can be used to uncover loci that were not previously known to interact.
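Levene’s test [8] is based on $n_j$ quantitative trait observations $Y_{ij}$ per genotype $j$. The test statistic ${T}_{\mathrm{Levene}}^{2}$ is F-distributed with $df_1 = J - 1$ and $df_2 = N - J$, where $J$ is the number of genotype groups and $N$ the total number of observations.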
This test is known to be relatively robust when data are not normally distributed. However, the main disadvantage of Levene’s test is that it can only be used to determine whether the group-specific variances differ among each other. In order to obtain a biologically or clinically relevant interpretation of the results, it is often valuable to additionally determine which pairs of genotype categories in particular exhibit statistically significant variance heterogeneity.
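The global statistic described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the mean-centred variant of Levene's test (a median-centred variant would swap the centre), the function name `levene_statistic` is hypothetical, and the toy trait values are made up.

```python
from statistics import fmean


def levene_statistic(groups):
    """Return (F, df1, df2) for a mean-centred Levene-type test.

    `groups` is a list of lists, one list of trait values per genotype.
    Assumes every group has at least two values and the residuals are not
    all identical within every group (otherwise the denominator is zero).
    """
    # Levene residuals: absolute deviation of each value from its group mean.
    residuals = []
    for g in groups:
        centre = fmean(g)
        residuals.append([abs(y - centre) for y in g])

    n_total = sum(len(z) for z in residuals)
    n_groups = len(residuals)
    group_means = [fmean(z) for z in residuals]
    grand_mean = sum(sum(z) for z in residuals) / n_total

    # One-way ANOVA sums of squares, computed on the residuals.
    ss_between = sum(
        len(z) * (m - grand_mean) ** 2 for z, m in zip(residuals, group_means)
    )
    ss_within = sum(
        (v - m) ** 2 for z, m in zip(residuals, group_means) for v in z
    )

    df1, df2 = n_groups - 1, n_total - n_groups
    f_stat = (ss_between / df1) / (ss_within / df2)
    return f_stat, df1, df2


# Toy trait values for three genotype groups (made-up numbers).
f_stat, df1, df2 = levene_statistic([[1, 2, 3], [2, 4, 6], [0, 5, 10]])
print(round(f_stat, 4), df1, df2)
```

A p-value would follow from the upper tail of the F distribution with the returned degrees of freedom; that step is omitted here because it needs a statistics library beyond the Python standard library.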
To this end, [3] considered using three two-sample tests with one degree of freedom each for the comparisons ${\sigma}_{0}^{2}$ vs. ${\sigma}_{12}^{2}$, ${\sigma}_{1}^{2}$ vs. ${\sigma}_{02}^{2}$, and ${\sigma}_{2}^{2}$ vs. ${\sigma}_{01}^{2}$, where ${\sigma}_{j{j}^{\prime\prime}}^{2}$ denotes the variance estimator for the pooled groups $j$ and $j^{\prime\prime}$. However, these multiple tests do not control the family-wise type I error rate α.
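The inflation caused by unadjusted multiple tests can be illustrated with a short calculation. As a simplifying assumption for illustration only, suppose the three tests were independent and each was carried out at level α = 0.05:

```latex
\[
P(\text{at least one false rejection})
  = 1 - (1 - \alpha)^{3}
  = 1 - 0.95^{3}
  \approx 0.143 .
\]
```

The three comparisons considered here share data and are therefore correlated, so the actual family-wise rate lies somewhere between α and the Bonferroni bound 3α = 0.15, but it still exceeds the nominal level unless the tests are adjusted.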
In this paper, we propose a Levene-type multiple contrast test, a novel approach comprising a global test on variance heterogeneity as well as the three specific tests on pairwise variance heterogeneity using a maximum test of linear forms. We apply this test in a genome-wide fashion using a Bogalusa Heart Study dataset [9].
Methods
A Levene-type multiple contrast test
A priori, it is unknown which elementary test is most likely to be under the alternative. Therefore, the maximum of the test statistics $\max\left({T}_{j}\right)$ is used, which controls the family-wise type I error rate α across all three comparisons.
Under the above assumptions, the correlations $\rho_{kk^{\prime\prime}}$ depend only on the contrast coefficients $c_{kj}$ and the sample sizes $n_j$. This approach controls the family-wise error rate α and achieves reasonable power for unbalanced designs as long as the above assumptions hold. It provides a global decision whenever any of the contrasts is under the alternative and, in addition, elementary decisions via multiplicity-adjusted p-values for the three specific comparisons. Although simultaneous confidence intervals for both differences and ratios to the average are available as well [11], they are not recommended here because a genetic interpretation of the transformed variable $Z_{ji}$ is difficult. Recently, simultaneous confidence intervals for the pairwise ratios of variances were proposed using a maximum test on jackknifed $\mathrm{log}\left({s}_{j}^{2}\right)$ [12]. This approach would provide an alternative when modified for arbitrarily unbalanced designs.
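The maximum-contrast idea can be sketched as follows. Several choices here are assumptions made for illustration and are not taken from the paper: the contrast rows compare each group with the unweighted average of the other two, the residuals are mean-centred, and the multiplicity-adjusted p-values come from a max-T permutation of the Levene residuals rather than from the multivariate t distribution used by multcomp. Permuting residuals is only an approximation, since residuals within a group are tied to that group's mean. All function names are hypothetical.

```python
import math
import random
from statistics import fmean

# Each row compares one genotype group with the unweighted average of the
# other two (an assumed choice of contrast coefficients c_kj).
CONTRASTS = [
    (1.0, -0.5, -0.5),
    (-0.5, 1.0, -0.5),
    (-0.5, -0.5, 1.0),
]


def levene_residuals(groups):
    """Absolute deviations from the group means (mean-centred variant)."""
    out = []
    for g in groups:
        centre = fmean(g)
        out.append([abs(y - centre) for y in g])
    return out


def contrast_statistics(residuals, contrasts=CONTRASTS):
    """t-type statistic per contrast, using the pooled residual variance."""
    sizes = [len(z) for z in residuals]
    means = [fmean(z) for z in residuals]
    df = sum(sizes) - len(residuals)
    pooled_var = sum(
        (v - m) ** 2 for z, m in zip(residuals, means) for v in z
    ) / df
    stats = []
    for row in contrasts:
        estimate = sum(c * m for c, m in zip(row, means))
        scale = math.sqrt(pooled_var * sum(c * c / n for c, n in zip(row, sizes)))
        if scale > 0:
            stats.append(estimate / scale)
        else:
            # Degenerate case: no residual variation at all.
            stats.append(0.0 if estimate == 0 else math.copysign(math.inf, estimate))
    return stats


def max_contrast_test(groups, n_perm=2000, seed=1):
    """Observed statistics and max-T permutation-adjusted p-values."""
    rng = random.Random(seed)
    residuals = levene_residuals(groups)
    observed = contrast_statistics(residuals)
    sizes = [len(z) for z in residuals]
    pooled = [v for z in residuals for v in z]

    exceed = [0] * len(observed)
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-split the shuffled residuals into groups of the original sizes.
        start, shuffled = 0, []
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        max_abs = max(abs(t) for t in contrast_statistics(shuffled))
        for k, t in enumerate(observed):
            if max_abs >= abs(t):
                exceed[k] += 1
    adjusted = [(1 + e) / (1 + n_perm) for e in exceed]
    return observed, adjusted


# Simulated toy data: the third group has a larger standard deviation.
rng = random.Random(7)
groups = [
    [rng.gauss(0, 1) for _ in range(40)],
    [rng.gauss(0, 1) for _ in range(40)],
    [rng.gauss(0, 3) for _ in range(20)],
]
stats, p_adj = max_contrast_test(groups)
print([round(t, 2) for t in stats], [round(p, 4) for p in p_adj])
```

Because every observed statistic is compared with the permutation distribution of the maximum, the smallest adjusted p-value serves as the global decision and the remaining ones as the elementary decisions, mirroring the structure of the test described above.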
Results and discussion
Simulation Study
Size and power comparison of the Levene test and two contrast alternatives given a dominant mode of inheritance
p | N | Levene, δ=0 | Levene, δ=0.5 | Levene, δ=1 | MC^{T Pairs}, δ=0 | MC^{T Pairs}, δ=0.5 | MC^{T Pairs}, δ=1 | MC^{T Ave}, δ=0 | MC^{T Ave}, δ=0.5 | MC^{T Ave}, δ=1 |
---|---|---|---|---|---|---|---|---|---|---|
0.5 | 25 | 0.025 | 0.105 | 0.228 | 0.025 | 0.107 | 0.227 | 0.023 | 0.100 | 0.224 |
0.5 | 50 | 0.036 | 0.398 | 0.792 | 0.036 | 0.392 | 0.790 | 0.035 | 0.403 | 0.794 |
0.5 | 75 | 0.040 | 0.690 | 0.978 | 0.041 | 0.692 | 0.978 | 0.040 | 0.697 | 0.979 |
0.5 | 100 | 0.039 | 0.868 | 0.999 | 0.038 | 0.869 | 0.999 | 0.039 | 0.871 | 0.999 |
0.75 | 25 | 0.015 | 0.126 | 0.268 | 0.015 | 0.131 | 0.282 | 0.013 | 0.118 | 0.263 |
0.75 | 50 | 0.026 | 0.544 | 0.803 | 0.025 | 0.564 | 0.807 | 0.023 | 0.533 | 0.800 |
0.75 | 75 | 0.036 | 0.842 | 0.954 | 0.035 | 0.850 | 0.955 | 0.032 | 0.831 | 0.954 |
0.75 | 100 | 0.040 | 0.955 | 0.986 | 0.040 | 0.960 | 0.986 | 0.039 | 0.951 | 0.986 |
Evaluation of a real data example
The boxplots show relatively symmetric distributions of quantitative trait values in all genotype groups, thus ruling out the presence of outliers or extremely skewed distributions of trait values as sources of the observed variance heterogeneity. Furthermore, all three genotype categories contain a relatively large number of observed trait values resulting in reliable variance estimates.
P values for original and multiple contrast Levene-type tests
Trait | SNP | Test | Comparison | p-value |
---|---|---|---|---|
waist circumference | rs3760124 | Levene | global | 3.4·10^{−07} |
waist circumference | rs3760124 | MC^{T Ave} | ${\sigma}_{\mathrm{CC}}^{2}$ vs. ${\sigma}_{\mathrm{CT},\mathrm{TT}}^{2}$ | 2.7·10^{−01} |
waist circumference | rs3760124 | MC^{T Ave} | ${\sigma}_{\mathrm{CT}}^{2}$ vs. ${\sigma}_{\mathrm{CC},\mathrm{TT}}^{2}$ | 1.5·10^{−01} |
waist circumference | rs3760124 | MC^{T Ave} | ${\sigma}_{\mathrm{TT}}^{2}$ vs. ${\sigma}_{\mathrm{CC},\mathrm{CT}}^{2}$ | 9.8·10^{−08} |
waist circumference | rs3760124 | MC^{T Pairs} | ${\sigma}_{\mathrm{CC}}^{2}$ vs. ${\sigma}_{\mathrm{CT}}^{2}$ | 9.6·10^{−01} |
waist circumference | rs3760124 | MC^{T Pairs} | ${\sigma}_{\mathrm{CC}}^{2}$ vs. ${\sigma}_{\mathrm{TT}}^{2}$ | 5.3·10^{−07} |
waist circumference | rs3760124 | MC^{T Pairs} | ${\sigma}_{\mathrm{CT}}^{2}$ vs. ${\sigma}_{\mathrm{TT}}^{2}$ | 4.4·10^{−07} |
diastolic blood pressure | rs12607553 | Levene | global | 3.0·10^{−07} |
diastolic blood pressure | rs12607553 | MC^{T Ave} | ${\sigma}_{\mathrm{AA}}^{2}$ vs. ${\sigma}_{\mathrm{AG},\mathrm{GG}}^{2}$ | 5.9·10^{−01} |
diastolic blood pressure | rs12607553 | MC^{T Ave} | ${\sigma}_{\mathrm{AG}}^{2}$ vs. ${\sigma}_{\mathrm{AA},\mathrm{GG}}^{2}$ | 1.2·10^{−07} |
diastolic blood pressure | rs12607553 | MC^{T Ave} | ${\sigma}_{\mathrm{GG}}^{2}$ vs. ${\sigma}_{\mathrm{AA},\mathrm{AG}}^{2}$ | 1.8·10^{−06} |
diastolic blood pressure | rs12607553 | MC^{T Pairs} | ${\sigma}_{\mathrm{AA}}^{2}$ vs. ${\sigma}_{\mathrm{AG}}^{2}$ | 3.2·10^{−02} |
diastolic blood pressure | rs12607553 | MC^{T Pairs} | ${\sigma}_{\mathrm{AA}}^{2}$ vs. ${\sigma}_{\mathrm{GG}}^{2}$ | 9.9·10^{−01} |
diastolic blood pressure | rs12607553 | MC^{T Pairs} | ${\sigma}_{\mathrm{AG}}^{2}$ vs. ${\sigma}_{\mathrm{GG}}^{2}$ | 1.9·10^{−07} |
Example R code for testing variance heterogeneity at a single SNP is provided in Additional files 1 and 2. The multiplicity-adjusted p-values can be estimated by means of the R package multcomp [14]. Alternatively, the SAS procedure GLIMMIX can be used for a resampling-based estimation of multiplicity-adjusted p-values [15].
Conclusions
The important issue of missing heritability, which refers to the fact that common SNPs identified by genome-wide association studies as associated with a disease collectively explain only a small portion of the prevalence of this disease, may be due, in part, to unknown interactions among alleles at various SNP loci, or between alleles and the environment, that affect the disease. The identification of such interactions is difficult, primarily because of the large number of potentially interacting pairs, trios, etc. of alleles and environmental variables that need to be tested. A feasible alternative, as suggested by [5], is to test individual loci for evidence of their involvement in an interaction with other alleles, loci or covariates. The idea of assessing variance heterogeneity between three genotype groups at a particular SNP locus as evidence of a potential interaction is appealing for its simplicity. The Levene-type maximum contrast test proposed in this paper allows one not only to test for global variance heterogeneity, but also to perform groupwise tests that elucidate the effect of the individual alleles on quantitative trait variance. While this is an advantage over the standard Levene test, the price to pay is increased computing time. However, we were able to perform a genome-wide analysis of variance heterogeneity involving >500 individuals in a matter of minutes on a laptop computer. Parallelization can also be used to substantially decrease computation time. R code implementing this test is available as part of the multcomp package.
Even the analysis of the real data example illustrates the low specificity of the identified potential interactions. Care needs to be exercised in interpreting the results of this test in cases of low-frequency variants or missing trait data, when one or more of the genotype groups contain an extremely small number of observed trait values. The sensitivity and specificity of this approach in these potentially common cases need further work.
Appendix
Size and power comparison of the Levene test and two contrast alternatives given an additive mode of inheritance
p | N | Levene, δ=0 | Levene, δ=0.5 | Levene, δ=1 | MC^{T Pairs}, δ=0 | MC^{T Pairs}, δ=0.5 | MC^{T Pairs}, δ=1 | MC^{T Ave}, δ=0 | MC^{T Ave}, δ=0.5 | MC^{T Ave}, δ=1 |
---|---|---|---|---|---|---|---|---|---|---|
0.5 | 25 | 0.026 | 0.102 | 0.225 | 0.026 | 0.099 | 0.209 | 0.025 | 0.102 | 0.221 |
0.5 | 50 | 0.034 | 0.323 | 0.689 | 0.033 | 0.298 | 0.633 | 0.033 | 0.324 | 0.695 |
0.5 | 75 | 0.037 | 0.554 | 0.926 | 0.038 | 0.511 | 0.897 | 0.037 | 0.561 | 0.928 |
0.5 | 100 | 0.041 | 0.728 | 0.986 | 0.041 | 0.684 | 0.979 | 0.040 | 0.733 | 0.987 |
0.75 | 25 | 0.015 | 0.085 | 0.183 | 0.016 | 0.083 | 0.185 | 0.015 | 0.083 | 0.175 |
0.75 | 50 | 0.029 | 0.283 | 0.637 | 0.028 | 0.286 | 0.641 | 0.026 | 0.265 | 0.613 |
0.75 | 75 | 0.038 | 0.507 | 0.888 | 0.036 | 0.504 | 0.890 | 0.034 | 0.473 | 0.874 |
0.75 | 100 | 0.041 | 0.678 | 0.976 | 0.041 | 0.682 | 0.977 | 0.041 | 0.650 | 0.972 |
Size and power comparison of the Levene test and two contrast alternatives given a recessive mode of inheritance
p | N | Levene, δ=0 | Levene, δ=0.5 | Levene, δ=1 | MC^{T Pairs}, δ=0 | MC^{T Pairs}, δ=0.5 | MC^{T Pairs}, δ=1 | MC^{T Ave}, δ=0 | MC^{T Ave}, δ=0.5 | MC^{T Ave}, δ=1 |
---|---|---|---|---|---|---|---|---|---|---|
0.5 | 25 | 0.022 | 0.212 | 0.496 | 0.022 | 0.217 | 0.518 | 0.020 | 0.206 | 0.479 |
0.5 | 50 | 0.035 | 0.582 | 0.924 | 0.035 | 0.585 | 0.927 | 0.034 | 0.579 | 0.922 |
0.5 | 75 | 0.037 | 0.809 | 0.989 | 0.038 | 0.811 | 0.990 | 0.040 | 0.805 | 0.989 |
0.5 | 100 | 0.039 | 0.918 | 0.999 | 0.041 | 0.920 | 0.999 | 0.040 | 0.917 | 0.999 |
0.75 | 25 | 0.016 | 0.087 | 0.172 | 0.017 | 0.089 | 0.174 | 0.013 | 0.089 | 0.176 |
0.75 | 50 | 0.032 | 0.206 | 0.430 | 0.030 | 0.207 | 0.433 | 0.027 | 0.217 | 0.443 |
0.75 | 75 | 0.036 | 0.336 | 0.633 | 0.036 | 0.337 | 0.632 | 0.031 | 0.348 | 0.645 |
0.75 | 100 | 0.039 | 0.444 | 0.772 | 0.037 | 0.444 | 0.774 | 0.037 | 0.458 | 0.782 |
Declarations
Acknowledgements
We thank the two anonymous reviewers for their insightful and constructive comments. The work of the first author was partly supported by the German Science Foundation grant DfG-HO1687.
References
- Liu YZ, Pei YF, Guo YF, Wang L, Liu XG, Yan H, Xiong DH, Zhang YP, Levy S, Li J, Haddock CK, Papasian CJ, Xu Q, Ma JZ, Payne TJ, Recker RR, Li MD, Deng HW: Genome-wide association analyses suggested a novel mechanism for smoking behavior regulated by IL15. Mol Psychiatry. 2009, 14 (7): 668-680.
- Zhao JH, Li MY, Bradfield JP, Zhang HT, Mentch FD, Wang K, Sleiman PM, Kim CE, Glessner JT, Hou CP, Keating BJ, Thomas KA, Garris ML, Deliard S, Frackelton EC, Otieno FG, Chiavacci RM, Berkowitz RI, Hakonarson H, Grant SFA: The role of height-associated loci identified in genome wide association studies in the determination of pediatric stature. BMC Med Genet. 2010, 11: 96.
- Struchalin MV, Dehghan A, Witteman JCM, van Duijn, Aulchenko YS: Variance heterogeneity analysis for detection of potentially interacting genetic loci: method and its limitations. BMC Genet. 2010, 11: 92.
- Levene H: Robust tests for equality of variances. Contributions to Probability and Statistics. Edited by: Olkin I. 1960, Palo Alto, CA: Stanford University Press, 278-292.
- Pare G, Cook NR, Ridker PM, Chasman DI: On the use of variance per genotype as a tool to identify quantitative trait interaction effects: A report from the women’s genome health study. PLoS Genet. 2010, 6 (6): e1000981.
- Zhang Y, Jiang B, Zhu J, Liu JS: Bayesian models for detecting epistatic interactions from genetic data. Ann Human Genet. 2011, 75 (Part 1): 183-193.
- Erhardt V, Bogdan M, Czado C: Locating multiple interacting quantitative trait loci with the zero-inflated generalized poisson regression. Stat Appl Genet Mol Biol. 2010, 9: 26.
- Gastwirth JL, Gel YR, Miao WW: The impact of Levene’s test of equality of variances on statistical theory and practice. Stat Sci. 2009, 24 (3): 343-360.
- Smith EN, Chen W, Kahonen M, et al: Longitudinal Genome-Wide Association of Cardiovascular Disease Risk Factors in the Bogalusa Heart Study. PLoS Genet. 2010, 6 (9): e1001094.
- Hasler M, Hothorn LA: Multiple Contrast Tests in the Presence of Heteroscedasticity. Biometrical J. 2008, 50 (5): 793-800.
- Djira GD, Hothorn LA: Detecting Relative Changes in Multiple Comparisons with an Overall Mean. J Qual Technol. 2009, 41: 60-65.
- Rublik F: A note on simultaneous confidence intervals for ratio of variances. Commun Stat-Theory Methods. 2010, 39 (6): 1038-1045.
- Hayter AJ, Liu W: The Power Function of the Studentised Range Test. Ann Stat. 1990, 18: 465-468.
- Bretz F, Hothorn T, Westfall P: On multiple comparisons in R. R News. 2002, 2: 14-17.
- SAS Institute Inc. 2008: SAS/STAT® 9.2, User’s Guide. Cary NC: SAS Institute Inc.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.