 Proceedings
 Open Access
 Published:
Smoothing of the bivariate LOD score for nonnormal quantitative traits
BMC Genetics volume 6, Article number: S111 (2005)
Abstract
Variance component analysis provides an efficient method for performing linkage analysis for quantitative traits. However, type I error of variance componentsbased likelihood ratio testing may be affected when phenotypic data are nonnormally distributed (especially with high values of kurtosis). This results in inflated LOD scores when the normality assumption does not hold. Even though different solutions have been proposed to deal with this problem with univariate phenotypes, little work has been done in the multivariate case. We present an empirical approach to adjust the inflated LOD scores obtained from a bivariate phenotype that violates the assumption of normality.
Using the Collaborative Study on the Genetics of Alcoholism data available for the Genetic Analysis Workshop 14, we show how bivariate linkage analysis with leptokurtotic traits gives an inflated type I error. We perform a novel correction that achieves acceptable levels of type I error.
Background
Variance component methods are very well suited to normally distributed phenotypes. However, when the phenotype under study is not normal, these methods tend to increase the type I error [1]. A robust LOD score correction has been developed to solve this problem for univariate phenotypes [2, 3]. This approach takes advantage of a remarkable result: the distribution of the likelihood ratio statistic under model misspecification is equal to a constant times a variate [4]. Therefore, a robust alternative to the likelihood ratio statistic is: Λ_{ R }= k Λ and the analogous robust LOD score is: LOD_{R} = k LOD. That is, the robust LOD score is proportional to the LOD score under model misspecification. So the problem simplifies to the search for the constant of proportionality k.
Blangero et al. [3] proposes a method based on simulation to estimate this constant of proportionality. The idea is to generate a sample of the distribution of LOD scores for the nonnormal trait under the null hypothesis and a sample of the asymptotic expected distribution of LOD scores for a normal trait. Because these two distributions should be proportional, an estimator of the constant of proportionality k can be obtained from a simple regression. To calculate the LOD scores for the nonnormal trait, a set of independent random markers are simulated for every subject in the study. We call these LOD scores the observed LOD scores. On the other hand, the expected LOD scores are sampled from the asymptotic distribution of the test under normality.
The benefit of this approach over direct use of the empirical distribution of the LOD scores is that it results in a LOD statistic whose interpretation remains intact. Also, the calculation of k requires fewer replicates than the empirical LOD score distribution for small pvalues.
Our aim is to find a similar smoothing correction for the bivariate phenotype. The Collaborative Study on the Genetics of Alcoholism (COGA) data contains two quantitative phenotypes that are clearly nonnormally distributed: MXDRNK and CIGPKY. Any bivariate linkage analysis containing either of these phenotypes will provide biased LOD scores.
In this work we investigate four questions: First, can we find a robust LOD score approximately proportional to the LOD score under model misspecification with the bivariate phenotype? Second, will the proportionality be between the two degrees of freedom (2df) LOD scores or between the equivalent one degree of freedom (1df) LOD scores? Third, what is the lower boundary of the number of replicates required to achieve a good approximation of the slope? Lastly, which smoothing constant is most appropriate for use with the phenotypes of the COGA data?
Methods
Data
The COGA is a multicenter research project to detect and map susceptibility genes for alcohol dependence and related phenotypes [5]. We worked with the COGA sample data available for the Genetic Analysis Workshop 14 (GAW14), consisting of 143 extended families with 1,350 family members with clinical and demographic data. To avoid the problems of a mixed population of different ethnicities, we only used the subset of White nonHispanic individuals (1,074 individuals in 119 families). This set of data contains 15 quantitative phenotypes. Two of them are behavioral measures of psychiatric interest: the "maximum number of drinks consumed in a 24 hours period" (MXDRNK) is related with alcoholism diagnosis and provides a quantitative measure to grade alcoholic and nonalcoholic individuals; the "number of packs of cigarettes per day for one year" (CIGPKY), is highly correlated with alcohol consumption. The other 13 quantitative phenotypes are electrophysiological traits that measure the neuroelectric activity generated in response to stimulus. Electrodes attached to the scalp of an individual with conductive gel record the eventrelated potentials (ERP). Various spatial and temporal characteristics differentiate the different ERPs. View Begleiter et al. [5] for a detailed description of these phenotypes. We used sex and age as covariates in all the bivariate models of this study.
Transformation from a 2df LOD score to a 1df LOD score
Every LOD score has a direct match with a pvalue. However, because the log likelihood ratio (LR) test where the LOD score comes from has a different distribution for univariate than for the bivariate phenotypes, these LOD scores will have different interpretations. The LR test in the univariate case follows a 1/2:1/2 mixture of a chi square distribution with 1 degree of freedom and a point mass at zero [6]. On the other hand, the null hypothesis in the bivariate case involves constraining to zero three parameters: the genetic correlation due to the quantitative trait locus (QTL) for both traits, and the genetic correlation at that QTL. Thus, the LR test for the bivariate case follows a 1/4:1/2:1/4 mixture of chi squares with 3 and 1 degrees of freedom and a point mass at zero, respectively [7]. We call the former a 1degree of freedom LOD score (1df LOD) and the later a 2degrees of freedom LOD score (2df LOD). To obtain the 1df LOD equivalent to a given 2df LOD, we follow two steps: first, we calculate the pvalue corresponding to the 2df LOD; and second, we calculate the 1df LOD corresponding to that pvalue.
The LODADJ method
The SOLAR command "lodadj" implements the simulation method described in the "Background" section. We used this command with 27 bivariate phenotypes form the COGA data (all the possible pairs that include MXDRNK or CIGPKY) to generate repeated unlinked markers, calculate their respective LOD scores, and estimate the correction constant. We chose the pair of phenotypes MXDRNK and ttth3 (one of the electrophysiological measures) as the bivariate phenotype to perform the main simulation of linkage with unlinked markers, in which we ran 100,000 replicates. This experiment, with a huge amount of simulated markers, will serve to test the utility of the proposed method. At the same time, we calculated the slope constant for the 1df LOD scores obtained from the transformation of the original 2df LOD scores.
For the 26 remaining pairs of phenotypes, we calculated the smoothing slope from only 1,000 simulated markers, based on the results of the slope sampling experiment described below. Moreover, in these cases we only give the smoothing slope calculated from the 2df LOD score.
Slope sampling
Given the 100,000 observed LOD scores for the MXDRNKttth3 phenotype under the null model, we selected 100 random samples of varying size n (100, 250, 500, 750, 1,000, 2,000, 3,000, 4,000, 5,000 and 10,000) and calculated the regression slope between the observed and the expected LOD scores for each sample. Thus, we obtained a pool of 100 slopes for each value of n, and we calculated the mean and standard deviation of the slopes for each pool.
Software
We used SOLAR [8, 9] for simulation and linkage analyses. We then used the package R [10] to perform statistical analyses and to draw the plots.
Results
The 13 electrophysiological measures are approximately normally distributed, with kurtosis ranging from 0.2 to 1. However, both MXDRNK and CIGPKY present clear nonnormal distributions with high kurtosis: 6.3 and 8.4, respectively.
Figure 1 shows a summary in four plots of the LOD adjustment results for the MXDRNKttth3 phenotype after the simulation of 100,000 unlinked markers. The plots in the first column are derived from the 2df LOD scores; those in the second column are derived from the 1df LOD scores. In the plots in the first row we show the regression of the observed LOD scores on the expected ones. In the plots in the second row we show the empirical cumulative distributions of the observed, expected and adjusted LOD scores.
The slopes are 0.78 and 0.73 for the 2df LOD and 1df LOD score, respectively. Thus, both regressions provide a similar fit. In both cases, we can detect a bias, the high values are over the mean, and the low values are under the mean. The plots of the cumulative distributions show also similar results for the 2df and 1df cases.
Table 1 summarizes the type I error rate for the observed and adjusted LOD scores for both the 1df and the 2df LOD score.
Figure 2 shows the means (on the left) and standard deviations (on the right) of the 100 slopes obtained for each value of n. The means stabilize quickly and the standard deviation decreases as the sample size increases. The standard deviation appears to become asymptotic around a sample size of 3,000 replicates but it is quite small with 1,000 replicates.
We chose this value (n = 1,000) to calculate the slope correction for all 27 bivariate phenotypes pairs of quantitative phenotypes of the COGA data containing MXDRNK or CYGPKY. Table 2 shows the smoothing correction constant for the different phenotype pairs and the corresponding 1df LOD score equivalent to a LOD score of 3 in the case of normality.
Discussion
Variance components linkage analysis is a powerful and flexible approach for the analysis of the genetic components of quantitative traits. Its main drawback is the increase of type I error when the traits of interest are not normally distributed. As in the univariate case, a smoothing correction for the LOD score is necessary in the bivariate case. In this work we have presented an empirical approach showing that the 2df observed LOD score is approximately proportional to a more robust 2df LOD score. It is important to note that the observed proportionality holds also for the equivalent 1df LOD score. This fact is not surprising because the 2df LOD and the equivalent 1df LOD are nearly proportional. Thus, the correction for deviations from normality could be performed before or after the transformation to the 1df LOD score. The smoothing approach presented here appears to work quite well, although it may be too conservative, as shown in Table 1.
Bivariate linkage analysis is computationally intensive and a simulation with 100,000 replicates is sometimes unfeasible. Our slope sampling experiment, however, indicates that 1,000 to 3,000 replicates should be enough to achieve a good estimation of the slope.
We calculated the correction slope for the MXDRNK and the CIGPKY phenotypes combined with the 13 electrophysiological measures of the COGA data. The range of the smoothing constants varies between 0.69 and 0.94.
If we consider a significant 1df LOD score as being at least 3 (2df LOD = 3.99), a slope correction of 0.8 requires a 2df LOD = 4.95, equivalent 1df LOD = 3.87 to be significant. Table 2 shows the 1df LOD score required for each pair of phenotypes to achieve a significance equivalent to a 1df LOD of 3 with two normally distributed phenotypes. This finding must be considered when using bivariate linkage analysis with this data.
Conclusion
Variance component models for bivariate linkage analysis with at least one nonnormally distributed phenotype give inflated type I error and then inflated LOD scores. A smoothing correction similar to the one available for univariate linkage analysis could be used to achieve a more accurate type I error and, thus, more reliable LOD scores.
The smoothing correction gives similar results when performed on the 2df LOD scores or on the equivalent transformed 1df LOD scores.
Compared with the direct use of the empirical distribution of the LOD scores, the calculation of the correction slope k proposed here requires fewer replicates for small pvalues. Between 1,000 and 3,000 replicates are enough to obtain a reliable correction.
More studies are needed to evaluate the impact of this correction on the power to find linkage with a nonnormally distributed bivariate phenotype.
Abbreviations
 CIGPKY:

Number of packs of cigarettes per day for one year
 COGA:

Collaborative Study on the Genetics of Alcoholism
 ERP:

Eventrelated potential
 GAW14:

Genetic Analysis Workshop 14
 LR:

Likelihood ratio
 MXDRNK:

Maximum number of drinks consumed in a 24 hours period
 QTL:

Quantitative trait locus
References
 1.
Allison DB, Neale MC, Zannolli R, Schork NJ, Amos CI, Blangero J: Testing the robustness of the likelihoodratio test in a variancecomponent quantitativetrait locimapping procedure. Am J Hum Genet. 1999, 65: 531544. 10.1086/302487.
 2.
Blangero J, Williams JT, Almasy L: Robust LOD scores for variance componentbased linkage analysis. Genet Epidemiol. 2000, 19 (Suppl 1): S8S14. 10.1002/10982272(2000)19:1+<::AIDGEPI2>3.0.CO;2Y.
 3.
Blangero J, Williams JT, Almasy L: Variance component methods for detecting complex trait loci. Adv Genet. 2001, 42: 151181.
 4.
Foutz RV, Srivastava RC: The performance of the likelihood ratio test when the model is incorrect. Ann Stat. 1977, 5: 11831194.
 5.
Edenberg HJ, Bierut LJ, Boyce P, Cao M, Cawley S, Chiles R, Doheny KF, Hansen M, Hinrichs T, Jones K, Kelleher M, Kennedy GC, Liu G, Marcus G, McBride B, Murray SS, Oliphant A, Prettngill J, Porjesz B, Pugh EW, Rice JP, Rubano T, Shannon S, Steeke R, Tischfield JA, Tsai YY, Zhang C, Begleiter H: Description of the data from the Collaborative Study on the Genetics of Alcoholism (COGA) and singlenucleotide polymorphism genotyping for Genetic Analysis Workshop 14. BMC Genet.
 6.
Self SG, Liang KY: Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J Am Stat Assoc. 1987, 82: 605610. 10.2307/2289471.
 7.
Amos C, de Andrade M, Zhu D: Comparison of multivariate tests for genetic linkage. Hum Hered. 2001, 51: 133144. 10.1159/000053334.
 8.
Almasy L, Blangero J: Multipoint quantitativetrait linkage analysis in general pedigrees. Am J Hum Genet. 1998, 62: 11981211. 10.1086/301844.
 9.
SOLAR: Sequential Oligogenic Linkage Analysis Routines. [http://www.sfbr.org/solar]
 10.
The R Project for Statistical Computing. [http://www.rproject.org]
Acknowledgements
This work was supported in part by NIH grants MH59490 and HL70751. A. Buil was supported by the FIS 01/A046 from the Fondo de Investigación Sanitaria (Spanish Ministry of Health). The authors thank the organizers of GAW14 for a travel scholarship to attend the conference for A. Buil.
Author information
Additional information
Authors' contributions
AB carried out the simulations and statistical analyses and drafted the manuscript. TDD participated in the design and implementations of the simulations. LA participated in the design of the study and performed a critical revision of the manuscript. JB conceived the study and participated in its design and coordination. All authors read and approved the final manuscript.
Rights and permissions
About this article
Cite this article
Buil, A., Dyer, T.D., Almasy, L. et al. Smoothing of the bivariate LOD score for nonnormal quantitative traits. BMC Genet 6, S111 (2005) doi:10.1186/147121566S1S111
Published
DOI
Keywords
 Genetic Analysis Workshop
 Quantitative Phenotype
 Unlinked Marker
 Variance Component Linkage Analysis
 Bivariate Linkage Analysis