Nonparametric longitudinal allele-sharing model
BMC Geneticsvolume 4, Article number: S85 (2003)
Basically no methods are available for the analysis of quantitative traits in longitudinal genetic epidemiological studies. We introduce a nonparametric factorial design for longitudinal data on independent sib pairs, modelling the phenotypic quadratic differences as the dependent variable. Factors are the number of alleles shared identically by descent (IBD) and the age categories at which the dependent variable is measured, allowing for dependence due to age. To identify a linked marker a rank statistic tests the influence of IBD group on phenotypic quadratic differences. No assumptions are made on normality or variances of the dependent variable. We apply our method to 71 sib pairs from the Framingham Heart Study data provided at the Genetic Analysis Workshop 13. For all 15 available markers on chromosome 17 we analyzed the influence on systolic blood pressure. In addition, different selection strategies to sample from the whole data are discussed.
Long-term cohorts like the Framingham Heart Study (FHS) with regular follow-up examinations yield high-quality longitudinal data. Using phenotypic information at only one examination or an aggregate measure like the mean over time would lead to a substantial loss of information. However, for the analysis of quantitative genetic traits basically no methods for longitudinal data are available. For such data we propose a nonparametric factorial design, originally developed for clinical studies . We utilize principles of the Haseman-Elston method .
The Genetic Analysis Workshop 13 (GAW13) data are based on the Framingham Heart Study. FHS selection criteria and study design have been previously described. Starting in 1948, 5209 subjects between the ages 28 and 62 were enrolled in the original cohort study , and starting in 1971, 5124 cohort offspring with spouses were enrolled in the offspring study . Our interest focuses on the longitudinal measurements of systolic blood pressure (SBP) on sib pairs from the offspring study of the original FHS data. Follow-up examinations took place first after 8 years then at 4-year intervals. For some individuals, measurements were not taken at all times.
We considered all 15 markers from 0.63 cM to 138.03 cM on chromosome 17, where previous linkages to the region covering the angiotensin converting enzyme (ACE) gene located at 84.2 cM to 90.2 cM  and adjacent regions at 67 cM and 94 cM  have been reported. Nuclear families (siblings and parents) should be genotyped to determine the number of alleles shared identically by descent (IBD) as unambiguously as possible. Seventy-one pedigrees with nuclear families were available. From these we selected independent sib pairs with parents. In three pedigrees one of several nuclear families was chosen, and in 34 families, one sib pair within a larger sibship.
The nonparametric longitudinal allele-sharing model introduced here considers the relation between the quantitative trait and the number of marker alleles shared IBD in an independent sib-pair sample, considering the trait values at different ages of the sib pairs. In particular, n independent sib pairs are grouped in three IBD groups i (i = 0,1,2), each consisting of n i pairs. For each marker and each sib pair the IBD probability distribution was determined by multipoint analysis in the complete pedigree of the original FHS data. Some extended pedigrees had to be truncated without loss of information. IBD probabilities were calculated using MERLIN  and grouped into IBD groups. These are defined by IBD = 0 for [0,0.5), IBD = 1 for [0.5,1.5), and IBD = 2 for [1.5,2).
The phenotypic quadratic differences  for sib pair k of IBD group i (k = 1,...,n i ) are denoted by ϖ ikt = (Yikt,1 - Yikt,2)2, where Yikt,1 and Yikt,2 are SBP-measurements at times t (t = 1,...,6). There are several problems when defining time: 1) measurements are in general at 4-year interval, but the first interval is 8 years; 2) there are individuals with missing measurements; 3) the probands' ages at the first examination vary drastically, ranging from 13 to 48 years; 4) some individuals received treatment for high blood pressure. SBP under hypertensive treatment is generally lower than without treatment. Since treatment was rare in the sibships, we neglected it. In the 200 individuals of the 71 sibships, considering the measurements at the oldest ages, only 12 individuals received treatment and even fewer were treated at younger ages.
For comparability of SBP measurements within a time point, we considered age at examination rather than examination number as time and chose the age classes for sib pairs as follows: t = 1: [26,30), t = 2: [30,34), t = 3: [34,38), t = 4: [38,42), t = 5: [42,46), and t = 6: [46,50). For a sib pair k with IBD group i the phenotypic quadratic difference ϖ ikt is accepted for a particular age group t if the pairs' mean age at the time of measurement is in the corresponding age interval.
Since for some families several sib pairs are available, we consider three selection strategies to choose one pair per family, yielding an independent sib-pair sample. For strategy S LONGITUDINAL pairs are primarily chosen to minimize the amount of missing SBP measurements in the age groups and secondarily for small age differences within pairs. Random selection followed if necessary. This longitudinally driven strategy results in a sample independent of the considered marker. The other two genotype-driven strategies select sib pairs using IBD probabilities, thus yielding a different sample for each marker. S MAXPROB selects those sib pairs in a family who have the maximum probability for an IBD value of all pairs yielding surest classification in an IBD group. Should more than one pair be selected, the subsequent ordered selection criteria are the three criteria used for the first strategy. S EQUAL tries to optimize the factorial design by equalizing the number of pairs in the IBD groups. The expected IBD distribution is P(IBD = 1) = 0.5 and P(IBD = 0) = P(IBD = 2) = 0.25. Within a family, pairs with IBD Group 1 are deleted whenever pairs with another IBD group are available. The subsequent selection steps are as above.
Originally the design for the described model was an experimental design for clinical studies . It assumes independence of the phenotypic quadratic differences, ϖ ikt , for different sib pairs. The longitudinal observations for pair k in IBD group i, denoted by ϖ ik = (ϖik1,...,ϖik6)T, can be arbitrarily dependent.
Denote the distribution function of ϖ ikt by F it (i = 0,1,2, t = 1,...,6), where F it = P(X <x) + 0.5 P(X = x). No assumptions on F it are made except exclusion of one-point distributions. There are a total number of 6n possible observations where n = Σn i . The method allows for missing values .
In this model no distributional parameters, such as the mean, are specified. A nonparametric effect is defined by a contrast of the distribution functions
p it = ∫ GdF it ,
where G is the average of all marginal distributions over IBD groups i and age groups t. This relative effect quantifies the relation of the marginal distribution F it with respect to G. If F it tends to the left of G (at a specific position G has smaller values than F it ) then p it < 0.5, and likewise for p it > 0.5. p it = 0.5 indicates no such tendency. The relationship p it <pi't' indicates that F it tends to smaller values than Fi't' with respect to G. The consistent estimator of the relative effect is based on ranks.
Hypothesis for gene effect
Let Fi. = Σ F it . The null hypothesis is H0: F0. = F1. = F2. Under H0 there are no differences between IBD groups and thus no influence of the marker's IBD on the phenotypic quadratic SBP differences for sib pairs taking all longitudinal measurements into account. Rejection of H0 implies differences between IBD groups, and thus supports an influence of the marker on SBP assuming that phenotypic similarity increases with higher IBD group.
To test the null hypothesis given above of no differences between IBD groups an ANOVA-like test statistic based on standardized squared differences of rank-means can be employed. This test statistic Q is asymptotically F-distributed with appropriate degrees of freedom f1 and f2 (R ikt denotes the rank of ϖ ikt ):
, where with mean over all ranks R ikt at the six time points, the mean of the ranks and the mean over all ranks . The estimated degrees of freedom are and .
The samples resulting from the three selection strategies differ by the amount of missing observations and random selected sib pairs differ with respect to IBD information and the size of each IBD group. Table 1 displays these properties, averaging across marker for S MAXPROB and S EQUAL . For 71 sib pairs and six age groups there are 426 possible observations. The percentage missing values is approximately 40% and thus very high for all strategies. It is smallest for S LONGITUDINAL . This strategy also has the highest number of randomly selected pairs, using fewer selection criteria. The IBD probability, i.e., the certainty for a pair assigned to a particular IBD group for the corresponding IBD values 0, 1, or 2, is approximately 90% and thus sufficiently high for all strategies. S MAXPROB , using this, yields probabilities above 90% for all IBD values. For S LONGITUDINAL and S MAXPROB the numbers of pairs within IBD groups are approximately those expected. For S EQUAL more equal group sizes are reached with an overrepresentation of IBD group 2.
For each strategy and for all markers on chromosome 17 we tested for phenotypic differences between IBD groups. In order to reduce the number of missing values, pairs with less than three measurements in time were neglected. At 34.56 cM and 108.27 cM significance at 5% was reached for two strategies, with the highest peak at 34.56 cM (Figure 1). With multiple testing corrections no significant results are found. Although the p-value curves are similar across strategies, S LONGITUDINAL tends to smaller and S EQUAL to higher p-values.
Figure 2 shows box plots for the SBP quadratic differences of sib pairs for each age and IBD group at 34.56 cM. In four age groups median and maximum value are highest in IBD group 0. In five age groups the median of IBD group 2 is smallest. Thus more phenotypic similarity corresponds to higher IBD groups indicating linkage. The distributions within groups are highly skewed. Variances are not equal. A nonparametric approach not assuming normality and variance homoscedasticity is warranted.
On chromosome 17 markers with significant influence on SBP could not be identified. Previously linkage to the ACE gene  and to adjacent areas of chromosome 17  using the Framingham study have been reported. The linkage to ACE  was based on a subgroup analysis for men only. The analysis of O'Donnell et al.  differed in the final data set of the FHS used, in the definition of the dependent variable, as well as of course in the method applied. We used the complete pedigree information for the calculation of multipoint IBD probabilities. Then we demonstrated our newly introduced method on a largely reduced subset of the data using well characterized independent sib pairs only. Currently this is a major limitation of this method if the data are not ascertained observing this design. We did not yet investigate whether and how the assumption of independence of the sib pairs can be relaxed in order to include larger sibships or different types of relative pairs into the analysis. In our study the p-values (see Figure 1) at 34.56 cM and 108.27 cM possibly indicate linkage. At 34.56 cM sib pairs tend to be more similar in SBP with increasing number of alleles shared IBD (see Figure 2), supporting possible linkage. This is not the case at 108.27 cM, where linkage is not indicated although p-values are small. Several other GAW13 contributions focussing on SBP as well (GAW13 group 9) reported linkage on chromosome 17, but not to the ACE gene region (for a summary, see ).
The sample selection strategies, driven by phenotype or genotype, lead to approximately 40% missing observations. This is very high and could effect the results in general. In contrast to other GAW13 groups we do not impute missing values. The described approach can handle missing data.
Also, IBD probabilities do not differ much between strategies. Thus this should not be the main cause for differences in p-value. The size of IBD groups varies drastically. S LONGITUDINAL and S MAXPROB place approximately half of the pairs in IBD group 1; S EQUAL emphasizes IBD groups 0 and 2. This results in less significant p-values for S EQUAL than for the other strategies.
As seen in Figure 2, the quadratic SBP differences are not normally distributed and variances between IBD groups are not equal. Therefore a nonparametric approach was necessary. Our approach can also be applied to a diallelic marker with three genotypes in an association study instead of IBD groups in a linkage study. In this context the dependent variable can also be ordinal, such as a score. We can also test for an interaction effect between age group and IBD group, appropriate for a marker influence on SBP with age of onset. In the future, we will further investigate these approaches and their properties.
Our main aim was to introduce a new approach for longitudinal data, which explains the underlying example, the data of the Framingham study. We required independence of sib pairs, full genotyping in nuclear families including parents, and longitudinality SBP observations on each sib pair for at least two age categories. Thus, we could only use a small subset of the data, resulting in a loss of power. If planning a new study the design can explicitly be incorporated to make optimal use of the data. Also the effects of relaxing the requirements above can be examined, such as full genotyping also in parents for multipoint IBD determination.
Brunner E, Munzel U, Puri ML: Rank-score tests in factorial designs with repeated measures. J Multivariate Anal. 1999, 70: 286-317. 10.1006/jmva.1999.1821.
Haseman JK, Elston RC: The investigation of linkage between a quantitative trait and a marker locus. Behav Genet. 1972, 2: 3-19. 10.1007/BF01066731.
Dawber DR, Meadors GF, Moore FEJ: Epidemiological approaches to heart disease: the Framingham Study. Am J Public Health. 1951, 41: 279-293.
Kannel WB, Feinleib M, McNamara PM, Garrison RJ, Castelli WP: An investigation of coronary heart disease in families: the Framingham Offspring Study. Am J Epidem. 1979, 110: 281-290.
O'Donnell CJ, Lindpainter K, Larson MG, Rao SV, Ordovas JM, Schaefer EJ, Myers RH, Levy D: Evidence for association and genetic linkage of the angiotensin-converting enzyme locus with hypertension and blood pressure in men but not in women in the Framingham Heart Study. Circulation. 1998, 97: 1766-1772.
Levy D, DeStefano AL, Larson MG, O'Donnell CJ, Lifton RP, Haralambos G, Cupples LA, Myers RH: Evidence for a gene influencing blood pressure on chromosome 17. Hypertension. 2000, 36: 477-483.
Abecasis GR, Cherny SS, Cookson WOC, Cardon LR: MERLIN – Rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002, 30: 97-101. 10.1038/ng786.
Domhof S, Brunner E, Osgood W: Rank procedures for repeated measures with missing values. Soc Meth Res. 2002, 30: 367-393. 10.1177/0049124102030003004.
Bickeböller H, Barrett JH, Jacobs K, Rosenberger A: Modeling and dissection of longitudinal blood pressure and hypertension phenotypes in genetic epidemiological studies. Genet Epidemiol.
About this article
- Systolic Blood Pressure
- Framingham Heart Study
- Genetic Analysis Workshop
- Angiotensin Converting Enzyme Gene
- Systolic Blood Pressure Measurement