 Proceedings
 Open Access
 Published:
Mixedeffects Cox models of alcohol dependence in extended families
BMC Genetics volume 6, Article number: S127 (2005)
Abstract
The presence of disease is commonly used in genetic studies; however, the time to onset often provides additional information. To apply the popular Cox model for such data, it is desirable to consider the familial correlation, which involves kinship or identity by descent (IBD) information between family members. Recently, such a framework has been developed and implemented in a UNIXbased SPLUS package called kinship, extending the Cox model with mixed effects and familial relationship. The model is of great potential in joint analysis of family data with genetic and environmental factors. We apply this framework to data from the Collaborative Study on the Genetics of Alcoholism data as part of Genetic Analysis Workshop 14. We use the SPLUS package, ported into the R environment http://www.rproject.org, for the analysis of microsatellite data on chromosomes 4 and 7. In these analyses, IBD information at those markers is used in addition to the basic Cox model with mixed effects, which provides estimates of the relative contribution of specific genetic markers. D4S1645 had the largest variance and contribution to the loglikelihood on chromosome 4, but the significance of this finding requires further investigation.
Background
While most genetic data analyses focus on disease events, ages of disease onset contain valuable information. Software for such analyses has been limited, and include the AGEON module in SAGE, the LINKAGE package that allows age classes with different penetrances in these classes, and the computer program LIPED that allows a lognormal/straightline distribution. The Cox model for survival analysis is well established and has been extended to include random effects. When applied to family data, it is necessary to account for familial correlation and to include the identitybydescent (IBD) information. Among models recently proposed with these component(s) [1–4], the framework by Therneau [2] appears to be the most comprehensive. Building on the established module survival in SPLUS, a new UNIX package kinship has been developed, which contains, among others, a function called coxme that includes components of the Cox model, random effects, and familial relationship.
Earlier reports on candidate genes for alcohol dependence [5] showed loci on several chromosomes including 4 and 7, but were limited in the use of ageofonset information and confounding factors. Here, the new modelling framework is explored with Genetic Analysis Workshop 14 (GAW14) problem 1 data using microsatellite markers on chromosomes 4 and 7.
Methods
The mathematical model
The mathematical model can be sketched as
Cox model + random effects = kinship/IBD information,
which is an extension of the standard Cox model for eventtimes to allow for random effects and the kinship or IBD information in families.
Following Ripatti and Palmgren [6], let T_{ i }denote the event time for unit i, i = 1, ..., n, C_{ i }the censoring time, U_{ i }= min(T_{ i },C_{ i }), and δ_{ i }= I(T_{ i }≤ C_{ i }), the basic Cox model with vector of explanatory variables X_{ i }is specified via a hazard function λ_{ i }(t) = λ_{0}(t)exp(X_{ i }β).
The model can be extended to include random effects or frailties Z_{ i }, such that λ_{ i }(tb) = λ_{0}(t)exp(X_{ i }β + Z_{ i }b), b ~ p(b;D(θ)), with θ being a vector of unknown parameters. The likelihood function is similar in form to that of the partial likelihood of the standard Cox model. If the censoring is independent and noninformative of b, the likelihood function L in terms of (λ_{ 0 }(t),β,θ) can be obtained by integrating over b. Assuming b ~ MVN(b,D(θ)) and defining as usual, we have
Next, a Laplace approximation is applied to obtain an approximate marginal loglikelihood that can be maximized by a penalized fixedeffects partial likelihood for parameters (β(θ),b(θ)) and in turn used in a profile likelihood function involving only θ [6]. Here the parameters of interest θ are the variances involving kinship () and IBD () matrices, such that D(θ) is a linear combination of θ and these matrices. This is reminiscent of a similar approach found in SAS PROC MIXED, and the specific relationship between the NewtonRaphson iteration of a Cox model and a linear mixedeffects model has been elaborated elsewhere [7]. Further mathematical details, including the integrated and penalized likelihood and their degree(s) of freedom, together with an SPLUS package for UNIX called kinship by Terry Therneau are available from http://www.mayo.edu/hsr/Sfunc.html. The specific function coxme generalizes coxph in the widely used SPLUS package survival by the same author. Due to some difficulties with the SPLUS package, it was ported into the R system http://www.rproject.org. It takes advantage of the recent developments in S3/S4 classes [8] and is freely available.
The data
The GAW14 problem 1 contains data from the Collaborative Study on the Genetics of Alcoholism (COGA), including a large number of pedigrees, microsatellite markers, and SNPs. A number of phenotypes and covariates are also available. The alcoholism diagnoses were based on DSMIIIR, Feighner, and DSMIV criteria with information on ages of onset.
All 143 families in the GAW14 COGA data were used. These families had an average of 11 members (range 5 to 32) and the total sample size was 1,614 (826 men, 788 women). Two variables represented definitions of alcoholism: ALDX1 according to DSMIIIR + Feighner definition, and ALDX2 based on DSMIV. ALDX1 and ALDX2 largely agree but ALDX1 is less stringent, with 103 "unaffected with some symptoms" under ALDX2 classified as affected under ALDX1. According to ALDX1, there were an average of 3 affected members (range 1 to 14) in these families. In the following analysis, ALDX1 will be used. Possible confounding variables include sex, ethnicity, and smoking. We did not adjust for ethnicity in the analysis. The raw data were extracted and analyzed using C programs, SAS, and STATA, while IBD information was generated from SOLAR. Allele frequencies were rescaled when they did not sum to 1 exactly.
The analysis
The analysis was limited to microsatellite markers on chromosomes 4 and 7 based on prior publications [5]. For comparison, we also used STATA, which allows for frailty with a gamma distribution and twolevel analysis. Often IBD matrices from SOLAR were taken by kinship to be nonpositive definite (possibly due to rounding errors); therefore, we perturbed the IBD matrices with an identity matrix with small variance (0.01). Terry Therneau has indicated that this was not the case with SIMWALK2. However, SOLAR was unable to read outputs from SIMWALK2.89. Because kinship uses SOLAR output by default, the SIMWALK 2.89 results were not used. GENEHUNTER was used to obtain marker informativeness. Because each analysis can be viewed as multiple tests to identify a susceptibility gene on the two chromosomes, we computed falsediscovery rates using SAS PROC MULTTEST.
Results
There were 266 individuals with no information and 643 (436 men, 207 women) with alcohol dependence, the mean (SD) ages of onset being 22.8 (9.2) and 21.6 (8.9). More men than women were alcohol dependent and smokers were more likely to be alcohol dependent. The default gamma frailty model of mean one in STATA with sex gave variance (SE) of 0.063 (0.034) and χ^{2} = 5.11, p = 0.012 according to a 50:50 mixture of and , comparable to a similar model from coxph with a variance estimate of 0.065, p < 0.0001. Simple random effects model of family membership by coxme gave a variance estimate of 0.078.
However, it is more appropriate to use a correlated frailty model with the kinship matrix. This led to a variance estimate of 0.22 and integrated = 5.49 with p = 0.019 compared to the loglikelihood 3534.66 without using kinship information. D4S1645 (near GABRB1) gave the largest contribution to the loglikelihood and highest variance (0.221) on chromosome 4. In comparison, D7S509 had the largest variance (0.238) and largest contribution to the loglikelihood on chromosome 7. The falsediscovery rates were 0.044 and 0.030, respectively. When fixing the variance associated with the kinship matrix at the value of 0.22, D4S1645 showed the biggest difference in likelihoods. Due possibly to the similarity between structures of the kinship and IBD for random effects, most microsatellite markers and kinship relationship seemed to be strongly correlated, except markers D4S1645, D7S629, and D7S1830. There did not appear to be a link with marker informativeness (information contents of 58.9%, 53.6%, 61.6% according to GENEHUNTER). These results are shown in Table 1, where the χ^{2} for the integrated loglikelihood is listed. Similar results were obtained when including sex as a covariate (data not shown), with the regression coefficient (SE) across loci in the range of 0.21~0.23 (0.09~0.10) and pvalue in the range of 0.017~0.038; however, the modelfitting was greatly improved, indicating the importance of covariates in characterizing age of onset. The falsediscovery rates were 0.065 ( = 0.22) and 0.093 ( and are free), respectively.
Discussion
The framework of Therneau [2] can integrate information from several sources: the affection status and age of onset, the familial relationship, genetic information, and covariates. The Cox model is familiar to researchers and accommodates counting process [start, stop] notation and timedependent covariates, alternative time scales, and multiple events/subject data. Gaussian random effects allow for efficient analysis of large genetic correlations. Genetic markers can also be used via a function called lmekin (linear mixed models with kinship) for the analysis of quantitative traits in a way similar to SOLAR. The extended Cox model and the linear mixed model are flexible in assessing the relative magnitude of genetic and environmental influences, including multiple genes. Additional functions include kinship calculation, use of external IBD matrices and sparse matrices to analyze large, extended families, and pedigreedrawing for a single or a set of pedigrees. The package is more comprehensive than SAS and STATA in dealing with frailty, and the port to R makes it freely available to many platforms and represents a growing trend of integration of statistical genetics into the mainstream statistical computing. For instance, some data preparations done by C programs turned out to be much easier with the function read.fortran as available from R 2.0.0. The model is a natural generalization of the longitudinal model incorporating subjectspecific random effects as given in [9]. It is also a hybrid between the model for quantitative trait locus linkage using times, and the model for discrete trait linkage using events. Unlike the parametric (LOD score) and nonparametric linkage analysis, which are well established [10], the Cox model approach is relatively new and requires further development.
D4S1645 appears to be promising but requires further investigation, e.g., the use of IBD information from SIMWALK2 and the exploration of the rapid analysis of many markers. It is possible to use information on the initiation of alcohol drinking and environmental factors such as smoking or social class when available. Ideally, tightly linked markers can be used in the models for association analysis and comparison made with other models [1, 3, 4].
Abbreviations
 COGA:

Collaborative Study on the Genetics of Alcoholism
 GAW14:

Genetic Analysis Workshop 14
 IBD:

Identity by descent
References
 1.
Li H, Zhong X: Multivariate survival models induced by genetic frailties, with application to linkage analysis. Biostatistics. 2002, 3: 5775. 10.1093/biostatistics/3.1.57.
 2.
Therneau TM: On mixedeffect Cox models, sparse matrices, and modeling data from large pedigrees. [http://mayoresearch.mayo.edu/mayo/research/biostat/upload/kinship.pdf]
 3.
Zhong X, Li H: Score tests of genetic association in the presence of linkage based on the additive genetic gamma frailty model. Biostatistics. 2004, 5: 307327. 10.1093/biostatistics/5.2.307.
 4.
Li YJ, Martin ER, Zhang L, Allen AS: Application of a rankbased genetic association test to ageatonset data from the Collaborative Study on the Genetics of Alcoholism study. BMC Genet. 2005, 6 (Suppl 1): S5310.1186/147121566S1S53.
 5.
Reich T, Hinrichs A, Culverhouse R, Bierut L: Psychiatric genetics '99: genetic studies of alcoholism and substance dependence. Am J Hum Genet. 1999, 65: 599605. 10.1086/302561.
 6.
Ripatti S, Palmgren J: Estimation of multivariate frailty models using penalized partial likelihood. Biometrics. 2000, 56: 10161022. 10.1111/j.0006341X.2000.01016.x.
 7.
Yau KKW, McGilchrist CA: Use of generalised linear mixed models for the analysis of clustered survival data. Biometrical J. 1997, 39: 311. 10.1002/bimj.4710390102.
 8.
Chamers J: Programming with Data. 1998, New York: Springer
 9.
Guo X, Carlin BP: Separate and joint modeling of longitudinal and event time data using standard computer packages. Am Stat. 2004, 58: 1624.
 10.
Curtis D, Zhao JH, Sham PC: Comparison of GENEHUNTER and MFLINK for analysis of COGA linkage data. Genet Epidemiol. 1999, 17 (Suppl 1): S115S120.
Acknowledgements
The author wishes to thank Drs. Thomas Dyer and Terry Therneau for their help and guidance on SOLAR and kinship. Thanks also to Prof. John Rice, two anonymous reviewers, and Dr. Qihua Tan for helpful comments. This work is partly supported by the NIA grant AG13196.
Author information
Rights and permissions
About this article
Cite this article
Zhao, J.h. Mixedeffects Cox models of alcohol dependence in extended families. BMC Genet 6, S127 (2005) doi:10.1186/147121566S1S127
Published
DOI
Keywords
 Frailty Model
 Genetic Analysis Workshop
 Familial Correlation
 Quantitative Trait Locus Linkage
 Gamma Frailty Model