Linkage analysis of longitudinal data
BMC Genetics volume 4, Article number: S27 (2003)
We propose a statistical model for linkage analysis of longitudinal data. The proposed model is a mixed model based on the new Haseman and Elston model and allows several random effects. Specifically, it includes one random effect for the correlation among sib pairs that have one sibling in common and another for the correlation among siblings from the same parents.
The proposed model was applied to the Genetic Analysis Workshop 13 simulated data set for the quantitative trait systolic blood pressure. A simple independence model and two kinds of random effects models all yielded good power for detecting linkage in these data sets, with the random effects models performing slightly better than the independence model. The two random effects models performed similarly.
The proposed models seem not only quite useful in detecting linkage with longitudinal data for the trait but also quite flexible: they can handle a wide class of correlation structures. Models with a more general class of covariance structures are desirable.
We explore the Genetic Analysis Workshop 13 (GAW13) simulated data set, which contains longitudinal data for two cohorts drawn from 330 pedigrees containing 4692 individuals, with data collection on each cohort starting about 30 years apart. The first cohort was examined 21 times at two-year intervals. The second cohort was examined five times at four-year intervals, with eight years between the first two examinations. With knowledge of the answers, we test for linkage to identify the markers linked to genes for the quantitative trait blood pressure (BP). We found that the trait systolic blood pressure (SBP) is affected by several quantitative trait loci and by nongenetic factors such as gender, age, total cholesterol, smoking, fasting glucose, hypertension treatment, and weight.
For detecting linkage, Haseman and Elston proposed a nonparametric linkage method for a quantitative trait. This procedure involves a simple regression of the squared sib pair trait difference on the proportion of alleles shared IBD (identical by descent) at a genetic marker. In a method developed later by Elston et al., the mean-corrected cross-product of the sib pair trait values replaces the squared difference. This formulation was proposed as a way to deal with possible correlation between observations when a family in the sample has more than two offspring. For better understanding and better power, we require a statistical analysis that allows us to examine multiple genes at the same time. To this end, the method extends to multiple regression for detecting linkage at several loci that determine the trait.
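As a compact reminder, the two regressions described above can be written as follows. The symbols are assumptions introduced here for illustration: $X_{j1}$ and $X_{j2}$ are the trait values of the two sibs in pair $j$, $\hat{\pi}_j$ is the estimated proportion of alleles shared IBD at the marker, and $m$ is the overall trait mean.

```latex
\begin{align*}
% Original Haseman-Elston regression: squared trait difference
(X_{j1} - X_{j2})^2 &= \alpha + \beta\,\hat{\pi}_j + e_j,
  & \beta &< 0 \text{ under linkage}, \\
% Revised regression: mean-corrected cross-product
(X_{j1} - m)(X_{j2} - m) &= \alpha' + \beta'\,\hat{\pi}_j + e'_j,
  & \beta' &> 0 \text{ under linkage}.
\end{align*}
```

The sign change explains why the tests in this paper are one-sided in the positive direction: sib pairs that share more alleles IBD at a linked marker tend to have more similar trait values, hence a larger cross-product.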
Longitudinal data arise when an outcome variable of interest is measured repeatedly over time on the same subject. Repeated observations from the same individual are usually correlated. To account for this correlation, mixed models are commonly used to analyze longitudinal data. Linear mixed models with random subject effects were proposed by Laird and Ware. Jennrich and Schluchter proposed a more general class of models with structured covariances. Liang and Zeger proposed a model based on the generalized estimating equation (GEE) that can handle both normally and non-normally distributed outcomes. Though the GEE approach can be used for normally distributed outcomes, it has been shown to be less efficient than the maximum likelihood approach. Mixed models usually assume a special form of covariance structure and use maximum likelihood or restricted maximum likelihood estimation to obtain the estimators of the model parameters. Iterative algorithms for parameter estimation are generally required.
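For reference, the random-effects model of Laird and Ware is commonly written in the form below. The notation is the conventional one and is not taken from this paper: $y_i$ is the vector of repeated measurements on subject $i$, $X_i$ and $Z_i$ are the fixed- and random-effects design matrices, and $D$ and $R_i$ are covariance matrices.

```latex
\begin{align*}
y_i &= X_i \beta + Z_i b_i + e_i, \qquad b_i \sim N(0, D), \quad e_i \sim N(0, R_i), \\
\operatorname{Var}(y_i) &= Z_i D Z_i^{\top} + R_i .
\end{align*}
```

The "special form of covariance structure" mentioned above corresponds to the particular choices made for $Z_i$, $D$, and $R_i$.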
In this study, we propose a mixed model for linkage analysis of longitudinal data. Our model has essentially the same form as the new Haseman and Elston model. To incorporate the interrelation among correlated observations, it uses the same correlation structures as ordinary mixed models. Specifically, we consider one random effect for the correlation among sib pairs that have one sib in common and another for the correlation among siblings from the same parents. We believe that the proposed model is easy to apply and can handle a wide class of correlation structures. To identify linkage with the proposed model, we consider the markers closest to b34, b35, b36, s10, s11, and s12 as candidate marker loci, since we know that SBP is affected by the genes b34, b35, b36, s10, s11, and s12. We also select five markers of b5, b14, b16, b18, and b21, which are taken from different chromosomes.
We performed linkage analysis on the quantitative trait SBP* (SBP adjusted for gender, age, total cholesterol, smoking, fasting glucose, hypertension treatment, weight, and high blood pressure) from Cohorts 1 and 2. SBP* was determined in part by b34, b35, b36, s10, s11, and s12. We obtained results for the mean-corrected cross-product of SBP*, henceforth referred to as C(SBP*) (see equation (1) in Methods), using three different mixed models. We tested $H_0: \beta_k$ (or $\gamma_l$) $\le 0$ vs. $H_A: \beta_k$ (or $\gamma_l$) $> 0$ for the linkage data set. If $T \ge 2.14$ (i.e., lod score $\ge 1.0$), $\beta_k$ (or $\gamma_l$) was considered to be in the model, where $k = 1, \ldots, 6$ and $l = 1, \ldots, 5$.
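The correspondence between T = 2.14 and a lod score of about 1.0 can be checked with a short sketch. It assumes the usual large-sample conversion for a one-sided test, lod = T^2 / (2 ln 10); the text gives only the two numbers, so the formula and the function name `t_to_lod` are assumptions for illustration.

```python
import math


def t_to_lod(t):
    """Convert a one-sided test statistic to a lod score.

    Assumes lod = t^2 / (2 ln 10); statistics in the wrong
    direction (t <= 0) carry no evidence for linkage.
    """
    if t <= 0:
        return 0.0
    return t * t / (2.0 * math.log(10.0))


# Threshold used in the text.
lod_at_threshold = t_to_lod(2.14)
print(round(lod_at_threshold, 3))
```

Under this conversion the threshold T = 2.14 gives a lod score just under 1.0, which agrees with the rounding used in the text.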
First, we selected at random one replicate (replicate 43, consisting of 99,714 observations from n = 2772 sib pairs) out of the 100 replicates and examined linkage. To obtain better outcomes, we also analyzed a larger sample created by combining two randomly selected replicates (replicates 43 and 47), comprising 199,536 observations from n = 5512 sib pairs. In Table 1, we report the results of the independence model (Model 1) and the random effects models (Models 2 and 3). We found that the three approaches performed similarly in detecting linkage on a single sample. Most of the variables $I_k$ ($k = 1, \ldots, 6$), which denote the number of alleles IBD at the marker locus closest to each gene determining SBP, were detected as significant by the independence model (Model 1) using the two replicates combined. For $U_l$ ($l = 1, \ldots, 5$), which is the number of alleles IBD at genes closest to five unlinked markers, none of the variables was significant under the random effects models (Models 2 and 3) with the two replicates combined.
We then performed linkage analysis on each of the 100 replicates. Each sample consisted of around 99,300 observations from about n = 2747 sib pairs. As shown in Table 2, we estimated the power for C(SBP*) under each of the three models. The power was generally high for most of the variables $I_k$ ($k = 1, \ldots, 6$) and tended to increase as random effects were added to the model. Under Model 3, the power was the highest among the three models for 50% of the variables $I_k$ ($k = 1, \ldots, 6$).
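The text does not spell out how power was computed from the replicates. A natural reading, assumed here, is the fraction of replicates in which a marker met the threshold T ≥ 2.14. The function name and the example values below are hypothetical.

```python
def empirical_power(t_values, threshold=2.14):
    """Fraction of replicates whose test statistic meets the threshold."""
    if not t_values:
        raise ValueError("need at least one replicate")
    hits = sum(1 for t in t_values if t >= threshold)
    return hits / len(t_values)


# Hypothetical statistics for one marker across five replicates.
example_t = [2.5, 1.9, 3.1, 2.14, 0.7]
power = empirical_power(example_t)
print(power)
```

In the analysis described above, the same calculation would be repeated for each marker and each of the three models across the 100 replicates.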
For the GAW13 simulated data on SBP*, we conclude that the random effects models (Models 2 and 3) seem to work slightly better than the independence model (Model 1) in identifying linkage when all candidate markers are considered at the same time. The two random effects models performed similarly in detecting linkage for these data.
Models for longitudinal data mainly focus on how to handle the correlations among the repeated measurements. Appropriate random effects can summarize these correlations effectively, and the time effect can easily be treated as a covariate of interest in the model. The main focus of the proposed model is to allow appropriate random effects for the correlated sib pairs in the Haseman-Elston model. The correlation may be caused by a common sibling or by a common parent. It can also be caused by repeated observations of the same sib pair at different observation times. The proposed model can easily include the corresponding random effects, and it can handle a wide class of correlation structures.
If we were interested in the inference for the time effect, then the first-stage model need not include the time effect but the second-stage model should. Since we worked with a simulated data set, we mainly focused on comparing the independence model with random-effects models.
In our analysis, we used SAS to analyze the mixed model for longitudinal data. For a sib pair linkage analysis, a C program was implemented. We have not applied any standard quantitative trait loci (QTL) software yet because we are not sure whether it can handle the proposed model. Certainly, it might be interesting to investigate further.
We are planning to do linkage analysis by combining more replicates. We expect the proposed models to perform much better in detecting linkage in larger samples with more replicates.
At the first stage of model fitting, we adjusted SBP for the known nongenetic factors gender, age, total cholesterol, smoking, fasting glucose, hypertension treatment, weight, and high blood pressure in Cohorts 1 and 2. We regressed SBP on all of these covariates and took the residual, referred to as SBP*. The adjustment was first done separately on each of the 100 replicates, each consisting of around 99,300 observations from about n = 2747 sib pairs. Additionally, we adjusted a larger sample formed by pooling two randomly selected replicates (replicates 43 and 47), comprising 199,536 observations from n = 5512 sib pairs.
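The first-stage adjustment can be sketched as an ordinary least-squares regression followed by taking residuals. This is a minimal illustration under assumptions: the analysis in this paper was run in SAS, the function name `adjust_trait` is hypothetical, and the simulated covariates below stand in for the real ones.

```python
import numpy as np


def adjust_trait(y, covariates):
    """Residuals of y after least-squares regression on covariates plus an intercept."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


# Hypothetical data: 50 observations with two covariates (say, age and weight).
rng = np.random.default_rng(0)
covs = rng.normal(size=(50, 2))
sbp = 120.0 + 5.0 * covs[:, 0] - 3.0 * covs[:, 1] + rng.normal(size=50)
sbp_star = adjust_trait(sbp, covs)  # plays the role of SBP*
```

By construction the residuals have mean zero and are uncorrelated with each covariate, so any remaining sib pair similarity in SBP* is not explained by the adjusted factors.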
Sib pair linkage analysis
In linkage analysis, we investigated the revised Haseman and Elston linkage statistic. For the second stage of the model, the mean-corrected cross-product of SBP* was used as the dependent variable, defined by
$$C(\mathrm{SBP}^*_j) = (\mathrm{SBP}^*_{j1} - m)(\mathrm{SBP}^*_{j2} - m), \qquad (1)$$
where $\mathrm{SBP}^*_{j1}$ and $\mathrm{SBP}^*_{j2}$ are the residuals of the observed SBPs for the first and second sibs, respectively, in the $j$th pair, and $m$ is the mean of $\mathrm{SBP}^*_{ji}$ over all $i$ and $j$. We considered as independent variables the number of alleles IBD at each locus in the sib pair. As similarly described in Suh et al., we denote by $I_k$ for $k = 1, 2, \ldots, 6$ the number of alleles IBD at the six markers closest to b34, b35, b36, s10, s11, and s12, which determine SBP. We also denote by $U_l$ for $l = 1, 2, \ldots, 5$ the number of alleles IBD at five genes closest to b5, b14, b16, b18, and b21, which are unrelated to any of these loci.
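The cross-product in equation (1) is simple to compute once the residuals are available. The sketch below assumes that m is taken over every sib entry of every pair, so a sibling who belongs to several pairs is counted once per pair; the function name is hypothetical.

```python
def mean_corrected_cross_products(pairs):
    """Return C(SBP*_j) for each pair (sbp_star_sib1, sbp_star_sib2)."""
    values = [v for pair in pairs for v in pair]
    m = sum(values) / len(values)  # mean over all sibs in all pairs
    return [(a - m) * (b - m) for a, b in pairs]


# Hypothetical residuals for two sib pairs; their overall mean is 0.
pairs = [(4.0, 2.0), (-1.0, -5.0)]
c = mean_corrected_cross_products(pairs)
print(c)  # [8.0, 5.0]
```

Both pairs have residuals on the same side of the mean, so both cross-products are positive.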
The mixed model
We considered three different models to analyze the longitudinal data. First, we fitted an independence model (Model 1), defined as
$$C(\mathrm{SBP}^*_j) = \alpha + \sum_{k=1}^{6} \beta_k I_{jk} + \sum_{l=1}^{5} \gamma_l U_{jl} + \varepsilon_j,$$
where $\beta_k$ for $k = 1, 2, \ldots, 6$ and $\gamma_l$ for $l = 1, 2, \ldots, 5$ are parameters to be estimated.
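A minimal sketch of fitting the independence model by ordinary least squares is given below, with each marker screened by the one-sided rule T ≥ 2.14. The function name, the simulated data, and the reduction to two markers (instead of the eleven used in the paper) are assumptions for illustration; the original analysis used SAS and a C program.

```python
import numpy as np


def fit_independence_model(c, ibd):
    """OLS of cross-products c on IBD counts; returns (coefficients, t statistics)."""
    c = np.asarray(c, dtype=float)
    X = np.column_stack([np.ones(len(c)), np.asarray(ibd, dtype=float)])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ c
    resid = c - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof          # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta, beta / se


# Hypothetical data: 200 sib pairs, two markers, only the first one linked.
rng = np.random.default_rng(1)
ibd = rng.integers(0, 3, size=(200, 2))   # 0, 1, or 2 alleles IBD
c = 1.0 + 2.0 * ibd[:, 0] + rng.normal(size=200)
beta, t = fit_independence_model(c, ibd)
linked = t[1:] >= 2.14                    # skip the intercept
```

Because the errors are treated as independent, this fit ignores the correlation between sib pairs that Models 2 and 3 are designed to capture.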
Our second approach was a random effects model (Model 2). We considered the correlation between sib pairs in the model, assuming random effects to account for the correlation between two sib pairs that share a common sibling:
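$$C(\mathrm{SBP}^*_j) = \alpha + \sum_{k=1}^{6} \beta_k I_{jk} + \sum_{l=1}^{5} \gamma_l U_{jl} + \sum_{m=1}^{2} \delta_m R_{jm} + \varepsilon_j, \qquad (2)$$
where $E(\delta_m) = 0$ and $\mathrm{Var}(\delta_m) = \sigma^2_{\delta_m}$, for which the $m$th ($m = 1, 2$) sibling is in common. If the $m$th sibling is in common, then $R_{jm} = 1$; otherwise $R_{jm} = 0$, for each of $m = 1, 2$.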
Third, we considered one more random effect for the case in which different sib pairs come from the same parents (Model 3). We added a term with $m = 0$ to equation (2) for sib pairs that have the same parents.
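One way to picture the correlation structure implied by Models 2 and 3 is to build the marginal covariance matrix of the sib pair observations. The sketch below rests on an assumed reading of the text: one random effect attached to the identity of the first sib, one to the identity of the second sib, and (for Model 3) one to the family. The function names, labels, and variance values are all hypothetical, and this is not the authors' SAS specification.

```python
import numpy as np


def indicator(labels):
    """One-hot design matrix: one row per observation, one column per distinct label."""
    levels = sorted(set(labels))
    Z = np.zeros((len(labels), len(levels)))
    for row, lab in enumerate(labels):
        Z[row, levels.index(lab)] = 1.0
    return Z


def marginal_covariance(family, sib1, sib2, var_fam, var_sib1, var_sib2, var_err):
    """Covariance of the sib pair observations under the assumed random effects."""
    Z0, Z1, Z2 = indicator(family), indicator(sib1), indicator(sib2)
    n = len(family)
    return (var_fam * (Z0 @ Z0.T) + var_sib1 * (Z1 @ Z1.T)
            + var_sib2 * (Z2 @ Z2.T) + var_err * np.eye(n))


# Hypothetical sibship A, B, C in family F1 and an unrelated pair D, E in F2.
family = ["F1", "F1", "F1", "F2"]
sib1 = ["A", "A", "B", "D"]
sib2 = ["B", "C", "C", "E"]
V = marginal_covariance(family, sib1, sib2, 0.5, 1.0, 1.0, 2.0)
```

Setting `var_fam` to zero gives the Model 2 reading, and setting all three random-effect variances to zero recovers the independence model. Under this reading, pairs A-B and B-C share sibling B in different positions and are therefore linked only through the family term, which is one reason the sketch should be treated as an interpretation, not a specification.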
Haseman JK, Elston RC: The investigation of linkage between a quantitative trait and a marker locus. Behav Genet. 1972, 2: 3-19. 10.1007/BF01066731.
Elston RC, Buxbaum S, Jacobs KB, Olson JM: Haseman and Elston revisited. Genet Epidemiol. 2000, 19: 1-17. 10.1002/1098-2272(200007)19:1<1::AID-GEPI1>3.0.CO;2-E.
Laird NM, Ware JH: Random-effects models for longitudinal data. Biometrics. 1982, 38: 963-974. 10.2307/2529876.
Jennrich RI, Schluchter MD: Unbalanced repeated-measures models with structured covariance matrices. Biometrics. 1986, 42: 805-820. 10.2307/2530695.
Liang KY, Zeger SL: Longitudinal data analysis using generalized linear models. Biometrika. 1986, 73: 13-22. 10.2307/2336267.
Park T: A comparison of the generalized estimating equation approach with the maximum likelihood approach for repeated measurements. Stat Med. 1993, 12: 1723-1732.
Suh YJ, Finch SJ, Mendell NR: Application of a Bayesian method for optimal subset regression to linkage analysis of Q1 and Q2. Genet Epidemiol. 2001, 21 (suppl 1): S706-S711.
This work was supported by the BK21 project from the Korea Research Foundation.