- Proceedings
- Open Access

# Linkage analysis of longitudinal data

*BMC Genetics*
**volume 4**, Article number: S27 (2003)

## Abstract

### Background

We propose a statistical model for linkage analysis of longitudinal data. The proposed model is a mixed model based on the new Haseman-Elston model and accommodates several random effects. Specifically, it includes a random effect for the correlation among sib pairs that have one sibling in common, and another for the correlation among siblings from the same parents.

### Results

The proposed model was applied to the Genetic Analysis Workshop 13 simulated data set for the quantitative trait systolic blood pressure. A simple independence model and two kinds of random effects models all yielded good power to detect linkage in these data, and the random effects models performed slightly better than the independence model. The two random effects models performed similarly.

### Conclusions

The proposed models appear to be both useful for detecting linkage with longitudinal trait data and flexible, since they can handle a wide class of correlation structures. Models with a more general class of covariance structures would be desirable.

## Background

We analyze the Genetic Analysis Workshop 13 (GAW13) simulated data set, which contains longitudinal data for two cohorts drawn from 330 pedigrees containing 4692 individuals, with data collection for the two cohorts starting about 30 years apart. The first cohort was examined 21 times at two-year intervals. The second cohort was examined five times at four-year intervals, with eight years between the first two examinations. With knowledge of the simulation answers, we test linkage to identify the markers linked to genes for the quantitative trait blood pressure (*BP*). We found that the trait systolic blood pressure (*SBP*) is affected by several quantitative trait loci and by nongenetic factors such as gender, age, total cholesterol, smoking, fasting glucose, hypertension treatment, and weight.

For detecting linkage to a quantitative trait, Haseman and Elston [1] proposed a nonparametric sib pair linkage method. This procedure regresses the squared difference between the trait values of a sib pair on the proportion of alleles the pair shares identical by descent (IBD) at a genetic marker. In a method developed later by Elston et al. [2], the mean-corrected cross-product of the sibs' trait values replaces the squared difference. This modification was proposed as a way to remove possible correlation between observations when a family in the sample has more than two offspring. For better understanding and better power, we need an analysis that examines multiple genes at the same time; to this end, the method extends to multiple regression for detecting linkage at the several loci that determine the trait.
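A minimal sketch of the original Haseman-Elston regression described above, written in Python with NumPy (the authors used SAS and C; this is not their code). The function name `haseman_elston_slope` and the toy data are hypothetical, and `pi_hat` stands for the estimated proportion of alleles shared IBD by each sib pair.

```python
import numpy as np


def haseman_elston_slope(y1, y2, pi_hat):
    """Original Haseman-Elston regression: squared sib-pair trait difference
    regressed on the estimated proportion of alleles shared IBD.

    Returns the fitted slope. Under linkage the slope is expected to be negative,
    because pairs sharing more alleles IBD tend to have more similar trait values.
    """
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    pi_hat = np.asarray(pi_hat, dtype=float)
    sq_diff = (y1 - y2) ** 2
    # Design matrix: intercept column plus the IBD proportion.
    X = np.column_stack([np.ones_like(pi_hat), pi_hat])
    coef, *_ = np.linalg.lstsq(X, sq_diff, rcond=None)
    return coef[1]


# Toy data: squared differences 4, 2, 0 at IBD proportions 0, 0.5, 1.
slope = haseman_elston_slope([2.0, 2.0 ** 0.5, 1.0], [0.0, 0.0, 1.0], [0.0, 0.5, 1.0])
print(round(slope, 6))  # -4.0
```

The revised statistic of Elston et al. replaces `sq_diff` with the mean-corrected cross-product, for which a positive slope indicates linkage.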

Longitudinal data arise when an outcome variable of interest is measured repeatedly over time on the same subject. Repeated observations from the same individual are usually correlated, and mixed models are commonly used to account for this correlation. Laird and Ware [3] proposed linear mixed models with random subject effects, and Jennrich and Schluchter [4] proposed a more general class of models with structured covariances. Liang and Zeger [5] proposed the generalized estimating equation (GEE) approach, which can handle both normally and non-normally distributed outcomes. Although the GEE approach can be used for normally distributed outcomes, it has been shown to be less efficient than the maximum likelihood approach [6]. Mixed models usually assume a particular form of covariance structure and use maximum likelihood or restricted maximum likelihood to estimate the model parameters, which generally requires iterative algorithms.

In this study, we propose a mixed model for linkage analysis of longitudinal data. Our model has essentially the same form as the new Haseman-Elston model [2]. To account for the dependence among correlated observations, it uses the same correlation structures as ordinary mixed models. Specifically, the model includes a random effect for the correlation among sib pairs that have one sib in common, and one for the correlation among siblings from the same parents. We believe that the proposed model is easy to apply and can handle a wide class of correlation structures. To identify linkage with the proposed model, we take the markers closest to the genes b34, b35, b36, s10, s11, and s12 as candidate marker loci, since we know that *SBP* is affected by these genes. We also select five markers, those closest to b5, b14, b16, b18, and b21, which are taken from different chromosomes and are unlinked to these genes.

## Results

We performed linkage analysis on the quantitative trait *SBP** (*SBP* adjusted for gender, age, total cholesterol, smoking, fasting glucose, hypertension treatment, weight, and high blood pressure) from Cohorts 1 and 2. *SBP** was determined in part by b34, b35, b36, s10, s11, and s12. We obtained results for the mean-corrected cross-product of *SBP**, henceforth referred to as *C*(*SBP**) (see equation (1) in Methods), using three different models. For each variable we tested $H_0: \beta_k \le 0$ (or $\gamma_l \le 0$) versus $H_A: \beta_k > 0$ (or $\gamma_l > 0$), where $k = 1, \ldots, 6$ and $l = 1, \ldots, 5$. If $T \ge 2.14$ (i.e., lod score $\ge 1.0$), $\beta_k$ (or $\gamma_l$) was considered significant and retained in the model.
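The cut-off T ≥ 2.14 can be related to a lod score of 1.0 through the usual asymptotic relation χ² = 2 ln(10) × LOD with T = √χ²; that this is how the two thresholds were matched is our assumption. A minimal Python sketch (`lod_to_t` is a hypothetical helper name):

```python
import math


def lod_to_t(lod: float) -> float:
    """Convert a lod score to the equivalent one-sided normal test statistic.

    Assumes the asymptotic relation chi-square = 2 * ln(10) * LOD (about 4.605 * LOD)
    and T = sqrt(chi-square).
    """
    if lod < 0:
        raise ValueError("lod score must be non-negative")
    return math.sqrt(2.0 * math.log(10.0) * lod)


print(round(lod_to_t(1.0), 3))  # 2.146
```

The result agrees with the cut-off of 2.14 used here to within 0.01.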

First, we randomly selected one of the 100 replicates (replicate 43, consisting of 99,714 observations from 2772 sib pairs) and examined linkage. To increase the sample size, we also analyzed a larger sample created by combining two randomly selected replicates (replicates 43 and 47), comprising 199,536 observations from 5512 sib pairs. Table 1 reports the results for the independence model (Model 1) and the random effects models (Models 2 and 3). On the single replicate, the three approaches performed similarly in detecting linkage. With the two replicates combined, most of the variables $I_k$ ($k = 1, \ldots, 6$), the numbers of alleles IBD at the marker loci closest to the genes determining *SBP*, were significant under the independence model (Model 1). None of the variables $U_l$ ($l = 1, \ldots, 5$), the numbers of alleles IBD at the five unlinked marker loci, was significant under the random effects models (Models 2 and 3) with the two replicates combined.

We then performed linkage analysis on each of the 100 replicates separately. Each replicate contained about 99,300 observations from about 2747 sib pairs. Table 2 shows the power for *C*(*SBP**) under each of the three models. The power was generally high for most of the variables $I_k$ ($k = 1, \ldots, 6$) and tended to increase as random effects were added to the model. Of the three models, Model 3 gave the highest power for half (three of six) of the variables $I_k$.
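The text does not spell out how power was computed; a natural reading, assumed here, is the proportion of the 100 replicates in which the test statistic for a variable reaches T ≥ 2.14. A minimal Python sketch with made-up statistics (`empirical_power` and the example values are hypothetical, not results from Table 2):

```python
from typing import Sequence


def empirical_power(t_stats: Sequence[float], threshold: float = 2.14) -> float:
    """Fraction of replicates whose test statistic is at or above the threshold."""
    if len(t_stats) == 0:
        raise ValueError("need at least one replicate")
    hits = sum(1 for t in t_stats if t >= threshold)
    return hits / len(t_stats)


# Hypothetical statistics for one variable across four replicates.
print(empirical_power([2.5, 1.9, 3.1, 2.14]))  # 0.75
```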

For the GAW13 simulated data on *SBP**, we conclude that the random effects models (Models 2 and 3) seem to work slightly better than the independence model (Model 1) for identifying linkage when all candidate markers are considered at the same time. The two random effects models performed similarly in detecting linkage in these data.

## Discussion

Models for longitudinal data focus mainly on how to handle the correlations among repeated measurements, and appropriate random effects can summarize these correlations effectively. Time effects can easily be treated as a covariate in the model. The main focus of the proposed model is to allow appropriate random effects for correlated sib pairs in the Haseman-Elston model [2]. The correlation may be caused by a common sibling or by common parents. It can also arise from repeated observations of the same sib pair at different times. The proposed model can easily include the corresponding random effects, so it can handle a wide class of correlation structures.

If inference on the time effect were of interest, the first-stage model would not need to include the time effect, but the second-stage model should. Because we worked with a simulated data set, we focused mainly on comparing the independence model with the random effects models.

In our analysis, we used SAS to fit the mixed models for the longitudinal data, and we implemented a C program for the sib pair linkage analysis. We have not yet applied any standard quantitative trait locus (QTL) software because we are not sure whether such software can handle the proposed model; this would be worth investigating further.

We plan to perform linkage analysis on combinations of more replicates, and we expect the proposed models to perform considerably better in detecting linkage with these larger samples.

## Methods

### Preliminary study

In the first stage of model fitting, we adjusted *SBP* for the known nongenetic factors gender, age, total cholesterol, smoking, fasting glucose, hypertension treatment, weight, and high blood pressure, using data from Cohorts 1 and 2. We regressed *SBP* on all of these covariates and took the residual as the adjusted trait, *SBP**. The adjustment was first performed separately on each of the 100 replicates, each consisting of about 99,300 observations from about 2747 sib pairs. Additionally, we performed the adjustment on a larger sample pooling two randomly selected replicates (replicates 43 and 47), comprising 199,536 observations from 5512 sib pairs.
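The first-stage adjustment amounts to taking ordinary least squares residuals. A minimal NumPy sketch, assuming a linear adjustment with an intercept and numeric covariates; the function name, variable names, and simulated data are hypothetical and do not reproduce the GAW13 covariates:

```python
import numpy as np


def adjust_trait(sbp, covariates):
    """Return SBP*: residuals from an OLS regression of SBP on an intercept
    plus the given covariates (one column per covariate)."""
    sbp = np.asarray(sbp, dtype=float)
    X = np.column_stack([np.ones(len(sbp)), np.asarray(covariates, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, sbp, rcond=None)
    return sbp - X @ coef


# Simulated example with two covariates (age and weight only, for brevity).
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(30, 70, n)
weight = rng.normal(75, 10, n)
sbp = 90 + 0.6 * age + 0.3 * weight + rng.normal(0, 8, n)
sbp_star = adjust_trait(sbp, np.column_stack([age, weight]))
```

By construction, `sbp_star` has mean zero and is uncorrelated in the sample with each covariate included in the regression.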

### Sib pair linkage analysis

For the linkage analysis, we used the revised Haseman-Elston linkage statistic [2]. In the second stage of the model, the mean-corrected cross-product of *SBP** was used as the dependent variable, defined by

$$C(SBP_j^*) = (SBP_{j1}^* - m)(SBP_{j2}^* - m), \qquad (1)$$

where $SBP_{j1}^*$ and $SBP_{j2}^*$ are the residuals of the observed *SBP* values for the first and second sibs, respectively, in the $j$th pair, and $m$ is the mean of $SBP_{ji}^*$ over all $i$ and $j$. As independent variables, we used the number of alleles shared IBD by the sib pair at each marker locus. Following Suh et al. [7], we denote by $I_k$ ($k = 1, 2, \ldots, 6$) the number of alleles IBD at the six markers closest to b34, b35, b36, s10, s11, and s12, which determine *SBP*. We also denote by $U_l$ ($l = 1, 2, \ldots, 5$) the number of alleles IBD at the five markers closest to b5, b14, b16, b18, and b21, which are unlinked to any of these loci.
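Equation (1) can be computed directly from the adjusted residuals of the two sibs in each pair. A minimal NumPy sketch, assuming the residuals are held in two aligned arrays (for example, one entry per sib pair and examination) and that m pools both sibs of every pair, as the definition of m reads; the function name and toy values are hypothetical:

```python
import numpy as np


def mean_corrected_cross_product(sib1, sib2):
    """C(SBP*_j) = (SBP*_j1 - m)(SBP*_j2 - m), equation (1).

    sib1[j] and sib2[j] hold the residuals SBP*_j1 and SBP*_j2 of pair j;
    m is the mean of all SBP*_ji values over i = 1, 2 and all pairs j.
    """
    sib1 = np.asarray(sib1, dtype=float)
    sib2 = np.asarray(sib2, dtype=float)
    m = np.concatenate([sib1, sib2]).mean()
    return (sib1 - m) * (sib2 - m)


print(mean_corrected_cross_product([1.0, 4.0], [3.0, 0.0]))  # [-1. -4.]
```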

### The mixed model

We considered three different models to analyze the longitudinal data. First, we fitted an independence model (Model 1), defined as

$$C(SBP_j^*) = \alpha + \sum_{k=1}^{6} \beta_k I_{jk} + \sum_{l=1}^{5} \gamma_l U_{jl} + \varepsilon_j,$$

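where $I_{jk}$ and $U_{jl}$ are the values of $I_k$ and $U_l$ for the $j$th pair, $\varepsilon_j$ is a random error, and $\alpha$, $\beta_k$ ($k = 1, 2, \ldots, 6$), and $\gamma_l$ ($l = 1, 2, \ldots, 5$) are parameters to be estimated.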

Our second approach was a random effects model (Model 2), which adds random effects to account for the correlation between two sib pairs that share a common sibling:

$$C(SBP_j^*) = \alpha + \sum_{k=1}^{6} \beta_k I_{jk} + \sum_{l=1}^{5} \gamma_l U_{jl} + \sum_{m=1}^{2} \delta_m R_{jm} + \varepsilon_j, \qquad (2)$$

where the random effects $\delta_m$ satisfy $E(\delta_m) = 0$ and $Var(\delta_m) = \sigma^2_{\delta_m}$, and $R_{jm}$ indicates whether the $m$th sibling ($m = 1, 2$) of the $j$th pair is in common with another sib pair: $R_{jm} = 1$ if it is, and $R_{jm} = 0$ otherwise.
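One way to see what the random effects in equation (2) do is to write down the covariance they induce among the sib pairs of a single sibship at one examination. The sketch below assumes an interpretation that the text does not state explicitly: each sibling carries one random effect with a common variance `sigma2_delta` (that is, σ²_δ1 = σ²_δ2), shared by every pair that contains that sibling, and these effects are present only when a sibling actually appears in more than one pair (sibships of three or more). The function name and parameter values are hypothetical.

```python
from itertools import combinations

import numpy as np


def model2_pair_covariance(sibship_size, sigma2_delta, sigma2_eps):
    """Covariance of C(SBP*_j) over all sib pairs of one sibship at one exam.

    Hypothetical reading of Model 2: a sibling shared by two pairs contributes
    sigma2_delta to their covariance; each pair's variance also adds sigma2_eps.
    """
    pairs = list(combinations(range(sibship_size), 2))
    # With only two sibs there is one pair and no sibling is in common (R_jm = 0).
    has_common = sibship_size >= 3
    cov = np.zeros((len(pairs), len(pairs)))
    for a, pair_a in enumerate(pairs):
        for b, pair_b in enumerate(pairs):
            shared = len(set(pair_a) & set(pair_b))  # 2 on the diagonal, else 0 or 1
            if has_common:
                cov[a, b] = shared * sigma2_delta
        cov[a, a] += sigma2_eps
    return pairs, cov


# Three sibs: every pair shares exactly one sib with each other pair,
# so off-diagonal entries are 1.0 and diagonal entries are 2 * 1.0 + 2.0.
pairs, cov = model2_pair_covariance(3, sigma2_delta=1.0, sigma2_eps=2.0)
print(cov)
```

In a sibship of four, pairs such as (0, 1) and (2, 3) share no sibling, so their covariance is zero under this reading of Model 2; a family-level effect as in Model 3 would add a common term to every entry.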

Third, we considered one more random effect for different sib pairs obtained from the same parents (Model 3). To do so, we added a term with $m = 0$ to the sum in equation (2), with $R_{j0} = 1$ when the sib pairs have the same parents.
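Written out, Model 3 takes the form below. This is a sketch of our reading of the text: the sum in equation (2) starts at $m = 0$, $R_{j0}$ is the same-parents indicator, and the notation $\sigma^2_{\delta_0}$ for the variance of the family-level effect is ours.

$$C(SBP_j^*) = \alpha + \sum_{k=1}^{6} \beta_k I_{jk} + \sum_{l=1}^{5} \gamma_l U_{jl} + \sum_{m=0}^{2} \delta_m R_{jm} + \varepsilon_j, \qquad E(\delta_0) = 0, \quad Var(\delta_0) = \sigma^2_{\delta_0}.$$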

## References

1. Haseman JK, Elston RC: The investigation of linkage between a quantitative trait and a marker locus. Behav Genet. 1972, 2: 3-19. doi:10.1007/BF01066731.
2. Elston RC, Buxbaum S, Jacobs KB, Olson JM: Haseman and Elston revisited. Genet Epidemiol. 2000, 19: 1-17. doi:10.1002/1098-2272(200007)19:1<1::AID-GEPI1>3.0.CO;2-E.
3. Laird NM, Ware JH: Random-effects models for longitudinal data. Biometrics. 1982, 38: 963-974. doi:10.2307/2529876.
4. Jennrich RI, Schluchter MD: Unbalanced repeated-measures models with structured covariance matrices. Biometrics. 1986, 42: 805-820. doi:10.2307/2530695.
5. Liang KY, Zeger SL: Longitudinal data analysis using generalized linear models. Biometrika. 1986, 73: 13-22. doi:10.2307/2336267.
6. Park T: A comparison of the generalized estimating equation approach with the maximum likelihood approach for repeated measurements. Stat Med. 1993, 12: 1723-1732.
7. Suh YJ, Finch SJ, Mendell NR: Application of a Bayesian method for optimal subset regression to linkage analysis of Q1 and Q2. Genet Epidemiol. 2001, 21 (suppl 1): S706-S711.

## Acknowledgments

This work was supported by the BK21 project from the Korea Research Foundation.

## About this article

### Cite this article

Suh, Y.J., Park, T. & Cheong, S.Y. Linkage analysis of longitudinal data. *BMC Genet* **4**, S27 (2003). doi:10.1186/1471-2156-4-S1-S27

### Keywords

- Systolic Blood Pressure
- Linkage Analysis
- Longitudinal Data
- Random Effect Model
- Generalized Estimating Equation