Volume 4 Supplement 1

## Genetic Analysis Workshop 13: Analysis of Longitudinal Family Data for Complex Diseases and Related Risk Factors

# Nonparametric longitudinal allele-sharing model

- Bettina Kulle
^{1}Email author, - Karola Köhler
^{1}, - Albert Rosenberger
^{1}, - Sabine Loesgen
^{1}and - Heike Bickeböller
^{1}

**4(Suppl 1)**:S85

**DOI: **10.1186/1471-2156-4-S1-S85

© Kulle et al; licensee BioMed Central Ltd 2003

**Published: **31 December 2003

## Abstract

Basically no methods are available for the analysis of quantitative traits in longitudinal genetic epidemiological studies. We introduce a nonparametric factorial design for longitudinal data on independent sib pairs, modelling the phenotypic quadratic differences as the dependent variable. Factors are the number of alleles shared identically by descent (IBD) and the age categories at which the dependent variable is measured, allowing for dependence due to age. To identify a linked marker a rank statistic tests the influence of IBD group on phenotypic quadratic differences. No assumptions are made on normality or variances of the dependent variable. We apply our method to 71 sib pairs from the Framingham Heart Study data provided at the Genetic Analysis Workshop 13. For all 15 available markers on chromosome 17 we analyzed the influence on systolic blood pressure. In addition, different selection strategies to sample from the whole data are discussed.

## Background

Long-term cohorts like the Framingham Heart Study (FHS) with regular follow-up examinations yield high-quality longitudinal data. Using phenotypic information at only one examination or an aggregate measure like the mean over time would lead to a substantial loss of information. However, for the analysis of quantitative genetic traits basically no methods for longitudinal data are available. For such data we propose a nonparametric factorial design, originally developed for clinical studies [1]. We utilize principles of the Haseman-Elston method [2].

The Genetic Analysis Workshop 13 (GAW13) data are based on the Framingham Heart Study. FHS selection criteria and study design have been previously described. Starting in 1948, 5209 subjects between the ages 28 and 62 were enrolled in the original cohort study [3], and starting in 1971, 5124 cohort offspring with spouses were enrolled in the offspring study [4]. Our interest focuses on the longitudinal measurements of systolic blood pressure (SBP) on sib pairs from the offspring study of the original FHS data. Follow-up examinations took place first after 8 years then at 4-year intervals. For some individuals, measurements were not taken at all times.

We considered all 15 markers from 0.63 cM to 138.03 cM on chromosome 17, where previous linkages to the region covering the angiotensin converting enzyme (ACE) gene located at 84.2 cM to 90.2 cM [5] and adjacent regions at 67 cM and 94 cM [6] have been reported. Nuclear families (siblings and parents) should be genotyped to determine the number of alleles shared identically by descent (IBD) as unambiguously as possible. Seventy-one pedigrees with nuclear families were available. From these we selected independent sib pairs with parents. In three pedigrees one of several nuclear families was chosen, and in 34 families, one sib pair within a larger sibship.

## Methods

The nonparametric longitudinal allele-sharing model introduced here considers the relation between the quantitative trait and the number of marker alleles shared IBD in an independent sib-pair sample, considering the trait values at different ages of the sib pairs. In particular, *n* independent sib pairs are grouped in three *IBD groups i* (*i* = 0,1,2), each consisting of *n*_{
i
}pairs. For each marker and each sib pair the IBD probability distribution was determined by multipoint analysis in the complete pedigree of the original FHS data. Some extended pedigrees had to be truncated without loss of information. IBD probabilities were calculated using MERLIN [7] and grouped into IBD groups. These are defined by IBD = 0 for [0,0.5), IBD = 1 for [0.5,1.5), and IBD = 2 for [1.5,2).

The *phenotypic quadratic differences* [8] for sib pair *k* of IBD group *i* (*k* = 1,...,*n*_{
i
}) are denoted by ϖ_{
ikt
}= (*Y*_{ikt,1 }- *Y*_{ikt,2})^{2}, where *Y*_{ikt,1 }and *Y*_{ikt,2 }are SBP-measurements at times *t* (*t* = 1,...,6). There are several problems when defining time: 1) measurements are in general at 4-year interval, but the first interval is 8 years; 2) there are individuals with missing measurements; 3) the probands' ages at the first examination vary drastically, ranging from 13 to 48 years; 4) some individuals received treatment for high blood pressure. SBP under hypertensive treatment is generally lower than without treatment. Since treatment was rare in the sibships, we neglected it. In the 200 individuals of the 71 sibships, considering the measurements at the oldest ages, only 12 individuals received treatment and even fewer were treated at younger ages.

For comparability of SBP measurements within a time point, we considered age at examination rather than examination number as time and chose the age classes for sib pairs as follows: *t* = 1: [26,30), *t* = 2: [30,34), *t* = 3: [34,38), *t* = 4: [38,42), *t* = 5: [42,46), and *t* = 6: [46,50). For a sib pair *k* with IBD group *i* the phenotypic quadratic difference ϖ_{
ikt
}is accepted for a particular age group *t* if the pairs' mean age at the time of measurement is in the corresponding age interval.

Since for some families several sib pairs are available, we consider three selection strategies to choose one pair per family, yielding an independent sib-pair sample. For strategy *S*_{
LONGITUDINAL
}pairs are primarily chosen to minimize the amount of missing SBP measurements in the age groups and secondarily for small age differences within pairs. Random selection followed if necessary. This longitudinally driven strategy results in a sample independent of the considered marker. The other two genotype-driven strategies select sib pairs using IBD probabilities, thus yielding a different sample for each marker. *S*_{
MAXPROB
}selects those sib pairs in a family who have the maximum probability for an IBD value of all pairs yielding surest classification in an IBD group. Should more than one pair be selected, the subsequent ordered selection criteria are the three criteria used for the first strategy. *S*_{
EQUAL
}tries to optimize the factorial design by equalizing the number of pairs in the IBD groups. The expected IBD distribution is P(IBD = 1) = 0.5 and P(IBD = 0) = P(IBD = 2) = 0.25. Within a family, pairs with IBD Group 1 are deleted whenever pairs with another IBD group are available. The subsequent selection steps are as above.

### Design

Originally the design for the described model was an experimental design for clinical studies [1]. It assumes independence of the phenotypic quadratic differences, ϖ_{
ikt
}, for different sib pairs. The longitudinal observations for pair *k* in IBD group *i*, denoted by **ϖ**_{
ik
} = (ϖ_{ik1},...,ϖ_{ik6})^{T}, can be arbitrarily dependent.

Denote the distribution function of ϖ_{
ikt
}by *F*_{
it
}(*i* = 0,1,2, *t* = 1,...,6), where *F*_{
it
}= *P*(*X* <*x*) + 0.5 *P*(*X* = *x*). No assumptions on *F*_{
it
}are made except exclusion of one-point distributions. There are a total number of *6n* possible observations where *n* = Σ*n*_{
i
}. The method allows for missing values [3].

### Relative effect

In this model no distributional parameters, such as the mean, are specified. A nonparametric effect is defined by a contrast of the distribution functions

*p*_{
it
}= ∫ *GdF*_{
it
},

where *G* is the average of all marginal distributions over IBD groups *i* and age groups *t*. This relative effect quantifies the relation of the marginal distribution *F*_{
it
}with respect to *G*. If *F*_{
it
}tends to the left of *G* (at a specific position *G* has smaller values than *F*_{
it
}) then *p*_{
it
}< 0.5, and likewise for *p*_{
it
}> 0.5. *p*_{
it
}= 0.5 indicates no such tendency. The relationship *p*_{
it
}<*p*_{i't' }indicates that *F*_{
it
}tends to smaller values than *F*_{i't' }with respect to *G*. The consistent estimator of the relative effect is based on ranks.

### Hypothesis for gene effect

Let *F*_{i. }= Σ *F*_{
it
}. The null hypothesis is H_{0}: *F*_{0.} = *F*_{1.} = *F*_{2.} Under H_{0} there are no differences between IBD groups and thus no influence of the marker's IBD on the phenotypic quadratic SBP differences for sib pairs taking all longitudinal measurements into account. Rejection of H_{0} implies differences between IBD groups, and thus supports an influence of the marker on SBP assuming that phenotypic similarity increases with higher IBD group.

### Test statistic

To test the null hypothesis given above of no differences between IBD groups an ANOVA-like test statistic based on standardized squared differences of rank-means can be employed. This test statistic *Q* is asymptotically F-distributed with appropriate degrees of freedom *f*_{1} and *f*_{2} (*R*_{
ikt
}denotes the rank of ϖ_{
ikt
}):

## Results

*S*

_{ MAXPROB }and

*S*

_{ EQUAL }. For 71 sib pairs and six age groups there are 426 possible observations. The percentage missing values is approximately 40% and thus very high for all strategies. It is smallest for

*S*

_{ LONGITUDINAL }. This strategy also has the highest number of randomly selected pairs, using fewer selection criteria. The IBD probability, i.e., the certainty for a pair assigned to a particular IBD group for the corresponding IBD values 0, 1, or 2, is approximately 90% and thus sufficiently high for all strategies.

*S*

_{ MAXPROB }, using this, yields probabilities above 90% for all IBD values. For

*S*

_{ LONGITUDINAL }and

*S*

_{ MAXPROB }the numbers of pairs within IBD groups are approximately those expected. For

*S*

_{ EQUAL }more equal group sizes are reached with an overrepresentation of IBD group 2.

Properties of the analysis samples yielded by the selection strategies

Strategy | Missing Observations (%) | Randomly selected sib pairs | IBD probability (%) | Size IBD groups (%) |
---|---|---|---|---|

LONGITUDINAL | 38.0 | 15 | (89.7, 89.2, 90.6) | (22.6, 53.2, 24.2) |

MAXPROB | 40.4 | 8.9 | (90.9, 91.9, 93.1) | (21.7, 51.5, 26.8) |

EQUAL | 41.1 | 4.8 | (90.7, 86.9, 91.4) | (33.3, 27.6, 39.1) |

*p*-value curves are similar across strategies,

*S*

_{ LONGITUDINAL }tends to smaller and

*S*

_{ EQUAL }to higher

*p*-values.

Figure 2 shows box plots for the SBP quadratic differences of sib pairs for each age and IBD group at 34.56 cM. In four age groups median and maximum value are highest in IBD group 0. In five age groups the median of IBD group 2 is smallest. Thus more phenotypic similarity corresponds to higher IBD groups indicating linkage. The distributions within groups are highly skewed. Variances are not equal. A nonparametric approach not assuming normality and variance homoscedasticity is warranted.

## Discussion

On chromosome 17 markers with significant influence on SBP could not be identified. Previously linkage to the ACE gene [8] and to adjacent areas of chromosome 17 [6] using the Framingham study have been reported. The linkage to ACE [6] was based on a subgroup analysis for men only. The analysis of O'Donnell et al. [5] differed in the final data set of the FHS used, in the definition of the dependent variable, as well as of course in the method applied. We used the complete pedigree information for the calculation of multipoint IBD probabilities. Then we demonstrated our newly introduced method on a largely reduced subset of the data using well characterized independent sib pairs only. Currently this is a major limitation of this method if the data are not ascertained observing this design. We did not yet investigate whether and how the assumption of independence of the sib pairs can be relaxed in order to include larger sibships or different types of relative pairs into the analysis. In our study the *p*-values (see Figure 1) at 34.56 cM and 108.27 cM possibly indicate linkage. At 34.56 cM sib pairs tend to be more similar in SBP with increasing number of alleles shared IBD (see Figure 2), supporting possible linkage. This is not the case at 108.27 cM, where linkage is not indicated although *p*-values are small. Several other GAW13 contributions focussing on SBP as well (GAW13 group 9) reported linkage on chromosome 17, but not to the ACE gene region (for a summary, see [9]).

The sample selection strategies, driven by phenotype or genotype, lead to approximately 40% missing observations. This is very high and could effect the results in general. In contrast to other GAW13 groups we do not impute missing values. The described approach can handle missing data.

Also, IBD probabilities do not differ much between strategies. Thus this should not be the main cause for differences in *p*-value. The size of IBD groups varies drastically. *S*_{
LONGITUDINAL
}and *S*_{
MAXPROB
}place approximately half of the pairs in IBD group 1; *S*_{
EQUAL
}emphasizes IBD groups 0 and 2. This results in less significant *p*-values for *S*_{
EQUAL
}than for the other strategies.

As seen in Figure 2, the quadratic SBP differences are not normally distributed and variances between IBD groups are not equal. Therefore a nonparametric approach was necessary. Our approach can also be applied to a diallelic marker with three genotypes in an association study instead of IBD groups in a linkage study. In this context the dependent variable can also be ordinal, such as a score. We can also test for an interaction effect between age group and IBD group, appropriate for a marker influence on SBP with age of onset. In the future, we will further investigate these approaches and their properties.

## Conclusion

Our main aim was to introduce a new approach for longitudinal data, which explains the underlying example, the data of the Framingham study. We required independence of sib pairs, full genotyping in nuclear families including parents, and longitudinality SBP observations on each sib pair for at least two age categories. Thus, we could only use a small subset of the data, resulting in a loss of power. If planning a new study the design can explicitly be incorporated to make optimal use of the data. Also the effects of relaxing the requirements above can be examined, such as full genotyping also in parents for multipoint IBD determination.

## Authors’ Affiliations

## References

- Brunner E, Munzel U, Puri ML: Rank-score tests in factorial designs with repeated measures. J Multivariate Anal. 1999, 70: 286-317. 10.1006/jmva.1999.1821.View ArticleGoogle Scholar
- Haseman JK, Elston RC: The investigation of linkage between a quantitative trait and a marker locus. Behav Genet. 1972, 2: 3-19. 10.1007/BF01066731.View ArticlePubMedGoogle Scholar
- Dawber DR, Meadors GF, Moore FEJ: Epidemiological approaches to heart disease: the Framingham Study. Am J Public Health. 1951, 41: 279-293.View ArticleGoogle Scholar
- Kannel WB, Feinleib M, McNamara PM, Garrison RJ, Castelli WP: An investigation of coronary heart disease in families: the Framingham Offspring Study. Am J Epidem. 1979, 110: 281-290.Google Scholar
- O'Donnell CJ, Lindpainter K, Larson MG, Rao SV, Ordovas JM, Schaefer EJ, Myers RH, Levy D: Evidence for association and genetic linkage of the angiotensin-converting enzyme locus with hypertension and blood pressure in men but not in women in the Framingham Heart Study. Circulation. 1998, 97: 1766-1772.View ArticlePubMedGoogle Scholar
- Levy D, DeStefano AL, Larson MG, O'Donnell CJ, Lifton RP, Haralambos G, Cupples LA, Myers RH: Evidence for a gene influencing blood pressure on chromosome 17. Hypertension. 2000, 36: 477-483.View ArticlePubMedGoogle Scholar
- Abecasis GR, Cherny SS, Cookson WOC, Cardon LR: MERLIN – Rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002, 30: 97-101. 10.1038/ng786.View ArticlePubMedGoogle Scholar
- Domhof S, Brunner E, Osgood W: Rank procedures for repeated measures with missing values. Soc Meth Res. 2002, 30: 367-393. 10.1177/0049124102030003004.View ArticleGoogle Scholar
- Bickeböller H, Barrett JH, Jacobs K, Rosenberger A: Modeling and dissection of longitudinal blood pressure and hypertension phenotypes in genetic epidemiological studies. Genet Epidemiol.Google Scholar

## Copyright

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.