Mixed-effects Cox models of alcohol dependence in extended families

The presence of disease is commonly used in genetic studies; however, the time to onset often provides additional information. To apply the popular Cox model for such data, it is desirable to consider the familial correlation, which involves kinship or identity by descent (IBD) information between family members. Recently, such a framework has been developed and implemented in a UNIX-based S-PLUS package called kinship, extending the Cox model with mixed effects and familial relationship. The model is of great potential in joint analysis of family data with genetic and environmental factors. We apply this framework to data from the Collaborative Study on the Genetics of Alcoholism data as part of Genetic Analysis Workshop 14. We use the S-PLUS package, ported into the R environment , for the analysis of microsatellite data on chromosomes 4 and 7. In these analyses, IBD information at those markers is used in addition to the basic Cox model with mixed effects, which provides estimates of the relative contribution of specific genetic markers. D4S1645 had the largest variance and contribution to the log-likelihood on chromosome 4, but the significance of this finding requires further investigation.


Background
While most genetic data analyses focus on disease events, ages of disease onset contain valuable information. Software for such analyses has been limited, and include the AGEON module in SAGE, the LINKAGE package that allows age classes with different penetrances in these classes, and the computer program LIPED that allows a log-normal/straight-line distribution. The Cox model for survival analysis is well established and has been extended to include random effects. When applied to family data, it is necessary to account for familial correlation and to include the identity-by-descent (IBD) information. Among models recently proposed with these component(s) [1][2][3][4], the framework by Therneau [2] appears to be the most comprehensive. Building on the established module survival in S-PLUS, a new UNIX package kinship has been developed, which contains, among others, a function called coxme that includes components of the Cox model, random effects, and familial relationship.
Earlier reports on candidate genes for alcohol dependence [5] showed loci on several chromosomes including 4 and 7, but were limited in the use of age-of-onset information and confounding factors. Here, the new modelling framework is explored with Genetic Analysis Workshop 14 (GAW14) problem 1 data using microsatellite markers on chromosomes 4 and 7. which is an extension of the standard Cox model for event-times to allow for random effects and the kinship or IBD information in families.

The mathematical model
Following Ripatti and Palmgren [6], let T i denote the event time for unit i, i = 1, ..., n, C i the censoring time, U i = min(T i ,C i ), and δ i = I(T i ≤ C i ), the basic Cox model with vector of explanatory variables X i is specified via a hazard function λ i (t) = λ 0 (t)exp(X i β).
The model can be extended to include random effects or , with θ being a vector of unknown parameters.
The likelihood function is similar in form to that of the partial likelihood of the standard Cox model. If the censoring is independent and non-informative of b, the likelihood function L in terms of (λ 0 (t),β,θ) can be obtained by integrating over b. Assuming b ~ MVN(b,D(θ)) and defining as usual, we have Next, a Laplace approximation is applied to obtain an approximate marginal log-likelihood that can be maximized by a penalized fixed-effects partial likelihood for parameters (β(θ),b(θ)) and in turn used in a profile likelihood function involving only θ [6]. Here the parameters of interest θ are the variances involving kinship ( ) and IBD ( ) matrices, such that D(θ) is a linear combination of θ and these matrices. This is reminiscent of a similar approach found in SAS PROC MIXED, and the specific relationship between the Newton-Raphson iteration of a Cox model and a linear mixed-effects model has been elaborated elsewhere [7]. Further mathematical details, including the integrated and penalized likelihood and their degree(s) of freedom, together with an S-PLUS package for UNIX called kinship by Terry Therneau are available from http://www.mayo.edu/hsr/Sfunc.html. The specific function coxme generalizes coxph in the widely used S-PLUS package survival by the same author. Due to some difficulties with the S-PLUS package, it was ported into the R system http://www.r-project.org. It takes advantage of the recent developments in S3/S4 classes [8] and is freely available.

The data
The GAW14 problem 1 contains data from the Collaborative Study on the Genetics of Alcoholism (COGA), including a large number of pedigrees, microsatellite markers, and SNPs. A number of phenotypes and covariates are also available. The alcoholism diagnoses were based on DSM-III-R, Feighner, and DSM-IV criteria with information on ages of onset.
All 143 families in the GAW14 COGA data were used. These families had an average of 11 members (range 5 to 32) and the total sample size was 1,614 (826 men, 788 women). Two variables represented definitions of alcoholism: ALDX1 according to DSM-III-R + Feighner definition, and ALDX2 based on DSM-IV. ALDX1 and ALDX2 largely agree but ALDX1 is less stringent, with 103 "unaffected with some symptoms" under ALDX2 classified as affected under ALDX1. According to ALDX1, there were an average of 3 affected members (range 1 to 14) in these families. In the following analysis, ALDX1 will be used. Possible confounding variables include sex, ethnicity, and smoking. We did not adjust for ethnicity in the analysis. The raw data were extracted and analyzed using C programs, SAS, and STATA, while IBD information was generated from SOLAR. Allele frequencies were rescaled when they did not sum to 1 exactly.

The analysis
The analysis was limited to microsatellite markers on chromosomes 4 and 7 based on prior publications [5]. For comparison, we also used STATA, which allows for frailty with a gamma distribution and two-level analysis. Often IBD matrices from SOLAR were taken by kinship to be non-positive definite (possibly due to rounding errors); therefore, we perturbed the IBD matrices with an identity matrix with small variance (0.01). Terry Therneau has indicated that this was not the case with SIMWALK2. However, SOLAR was unable to read outputs from SIMWALK2.89. Because kinship uses SOLAR output by default, the SIMWALK 2.89 results were not used. GENE-HUNTER was used to obtain marker informativeness. Because each analysis can be viewed as multiple tests to identify a susceptibility gene on the two chromosomes, we computed false-discovery rates using SAS PROC MULTTEST.

Results
There were 266 individuals with no information and 643 (436 men, 207 women) with alcohol dependence, the mean (SD) ages of onset being 22.8 (9.2) and 21.6 (8.9). More men than women were alcohol dependent and smokers were more likely to be alcohol dependent. The default gamma frailty model of mean one in STATA with sex gave variance (SE) of 0.063 (0.034) and χ 2 = 5.11, p =

Discussion
The framework of Therneau [2] can integrate information from several sources: the affection status and age of onset, the familial relationship, genetic information, and covariates. The Cox model is familiar to researchers and accommodates counting process [start, stop] notation and timedependent covariates, alternative time scales, and multiple events/subject data. Gaussian random effects allow for efficient analysis of large genetic correlations. Genetic markers can also be used via a function called lmekin (linear mixed models with kinship) for the analysis of quantitative traits in a way similar to SOLAR. The extended Cox model and the linear mixed model are flexible in assessing the relative magnitude of genetic and environmental influences, including multiple genes. Additional functions include kinship calculation, use of external IBD matrices and sparse matrices to analyze large, extended families, and pedigree-drawing for a single or a set of pedigrees. The package is more comprehensive than SAS and STATA in dealing with frailty, and the port to R makes it freely available to many platforms and represents a growing trend of integration of statistical genetics into the mainstream statistical computing. For instance, some data preparations done by C programs turned out to be much easier with the function read.fortran as available from R 2.0.0. The model is a natural generalization of the longitudinal model incorporating subject-specific random effects as given in [9]. It is also a hybrid between the model for quantitative trait locus linkage using times, and the model for discrete trait linkage using events. Unlike the parametric (LOD score) and nonparametric linkage analysis, which are well established [10], the Cox model approach is relatively new and requires further development.
D4S1645 appears to be promising but requires further investigation, e.g., the use of IBD information from SIMWALK2 and the exploration of the rapid analysis of many markers. It is possible to use information on the initiation of alcohol drinking and environmental factors such as smoking or social class when available. Ideally, tightly linked markers can be used in the models for association analysis and comparison made with other models [1,3,4].

Abbreviations
COGA: Collaborative Study on the Genetics of Alcoholism GAW14: Genetic Analysis Workshop 14 IBD: Identity by descent