Nonparametric longitudinal allele-sharing model

Hardly any methods are available for the analysis of quantitative traits in longitudinal genetic epidemiological studies. We introduce a nonparametric factorial design for longitudinal data on independent sib pairs, modelling the phenotypic quadratic differences as the dependent variable. The factors are the number of alleles shared identically by descent (IBD) and the age categories at which the dependent variable is measured, allowing for dependence due to age. To identify a linked marker, a rank statistic tests the influence of IBD group on the phenotypic quadratic differences. No assumptions are made about normality or the variances of the dependent variable. We apply our method to 71 sib pairs from the Framingham Heart Study data provided at the Genetic Analysis Workshop 13. For all 15 available markers on chromosome 17, we analyzed the influence on systolic blood pressure. In addition, different strategies for selecting a sample from the whole data set are discussed.


Background
Long-term cohorts like the Framingham Heart Study (FHS) with regular follow-up examinations yield high-quality longitudinal data. Using phenotypic information from only one examination, or an aggregate measure like the mean over time, would lead to a substantial loss of information. However, hardly any methods for longitudinal data are available for the analysis of quantitative genetic traits. For such data we propose a nonparametric factorial design, originally developed for clinical studies [1]. We utilize principles of the Haseman-Elston method [2].
The Genetic Analysis Workshop 13 (GAW13) data are based on the Framingham Heart Study. FHS selection criteria and study design have been previously described. Starting in 1948, 5209 subjects between the ages of 28 and 62 were enrolled in the original cohort study [3], and starting in 1971, 5124 cohort offspring with spouses were enrolled in the offspring study [4]. Our interest focuses on the longitudinal measurements of systolic blood pressure (SBP) on sib pairs from the offspring study of the original FHS data. Follow-up examinations took place first after 8 years and then at 4-year intervals. For some individuals, measurements were not taken at all examinations.
We considered all 15 markers from 0.63 cM to 138.03 cM on chromosome 17, where previous linkages to the region covering the angiotensin converting enzyme (ACE) gene located at 84.2 cM to 90.2 cM [5] and adjacent regions at 67 cM and 94 cM [6] have been reported. Nuclear families (siblings and parents) should be genotyped to determine the number of alleles shared identically by descent (IBD) as unambiguously as possible. Seventy-one pedigrees with nuclear families were available. From these we selected independent sib pairs with parents. In three pedigrees one of several nuclear families was chosen, and in 34 families, one sib pair within a larger sibship.

Methods
The nonparametric longitudinal allele-sharing model introduced here considers the relation between the quantitative trait and the number of marker alleles shared IBD in an independent sib-pair sample, taking into account the trait values at different ages of the sib pairs. In particular, n independent sib pairs are grouped into three IBD groups i (i = 0,1,2), each consisting of $n_i$ pairs. For each marker and each sib pair, the IBD probability distribution was determined by multipoint analysis in the complete pedigree of the original FHS data. Some extended pedigrees had to be truncated, without loss of information. IBD probabilities were calculated using MERLIN [7] and grouped into IBD groups. These are defined by IBD = 0 for [0,0.5), IBD = 1 for [0.5,1.5), and IBD = 2 for [1.5,2].
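The grouping step can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which summary of the IBD probabilities is compared with the intervals, so the sketch assumes it is the expected number of alleles shared IBD, P(IBD = 1) + 2 P(IBD = 2), and it treats the value 2 as belonging to group 2. The function names are hypothetical and are not part of MERLIN.

```python
def expected_ibd(p0, p1, p2):
    """Expected number of alleles shared IBD, given P(IBD=0), P(IBD=1), P(IBD=2)."""
    if abs(p0 + p1 + p2 - 1.0) > 1e-6:
        raise ValueError("IBD probabilities must sum to 1")
    return p1 + 2.0 * p2


def ibd_group(p0, p1, p2):
    """Assign IBD group 0, 1, or 2 using the intervals [0,0.5), [0.5,1.5), [1.5,2].

    Assumption: the intervals are applied to the expected IBD count.
    """
    e = expected_ibd(p0, p1, p2)
    if e < 0.5:
        return 0
    if e < 1.5:
        return 1
    return 2


# Hypothetical probabilities for three sib pairs at one marker.
groups = [ibd_group(0.80, 0.15, 0.05),   # expected count 0.25
          ibd_group(0.10, 0.70, 0.20),   # expected count 1.10
          ibd_group(0.00, 0.20, 0.80)]   # expected count 1.80
```

With these illustrative inputs the three pairs fall into groups 0, 1, and 2, respectively.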
The phenotypic quadratic differences [8] for sib pair k of IBD group i ($k = 1,\ldots,n_i$) are denoted by $\varpi_{ikt} = (Y_{ikt,1} - Y_{ikt,2})^2$, where $Y_{ikt,1}$ and $Y_{ikt,2}$ are the SBP measurements of the two sibs at times t ($t = 1,\ldots,6$). There are several problems when defining time: 1) measurements are in general taken at 4-year intervals, but the first interval is 8 years; 2) there are individuals with missing measurements; 3) the probands' ages at the first examination vary drastically, ranging from 13 to 48 years; 4) some individuals received treatment for high blood pressure. SBP under hypertensive treatment is generally lower than without treatment. Since treatment was rare in the sibships, we neglected it. Among the 200 individuals of the 71 sibships, considering the measurements at the oldest ages, only 12 individuals received treatment, and even fewer were treated at younger ages. For a sib pair k in IBD group i, the phenotypic quadratic difference $\varpi_{ikt}$ is accepted for a particular age group t if the pair's mean age at the time of measurement lies in the corresponding age interval.
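A minimal sketch of how a pair's longitudinal profile might be assembled is given below. It takes the phenotypic quadratic difference to be the squared difference of the two sibs' SBP values, in the spirit of the Haseman-Elston method. The age-interval boundaries are hypothetical, since the text does not list them, and keeping only the first examination that falls into an age group is likewise an assumption of this sketch.

```python
def squared_difference(sbp_1, sbp_2):
    """Squared difference of the two sibs' SBP values."""
    return (sbp_1 - sbp_2) ** 2


def age_group(mean_age, bounds):
    """Return the 1-based age group whose interval [lower, upper) contains mean_age."""
    for t, (lower, upper) in enumerate(bounds, start=1):
        if lower <= mean_age < upper:
            return t
    return None


def pair_profile(exams, bounds):
    """Build {age group: squared difference or None} for one sib pair.

    exams: list of (age_1, sbp_1, age_2, sbp_2); a missing SBP is None.
    Assumption: if several examinations fall into one age group,
    only the first is kept.
    """
    profile = {t: None for t in range(1, len(bounds) + 1)}
    for age_1, sbp_1, age_2, sbp_2 in exams:
        if sbp_1 is None or sbp_2 is None:
            continue  # the value stays missing; nothing is imputed
        t = age_group((age_1 + age_2) / 2.0, bounds)
        if t is not None and profile[t] is None:
            profile[t] = squared_difference(sbp_1, sbp_2)
    return profile


# Hypothetical age intervals and examinations, for illustration only.
bounds = [(20, 30), (30, 40), (40, 50)]
exams = [(28, 120, 30, 130), (36, 125, 38, None), (40, 140, 42, 128)]
profile = pair_profile(exams, bounds)
```

In this toy example the second age group stays missing because one sib has no SBP value at that examination.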
Since several sib pairs are available for some families, we consider three selection strategies to choose one pair per family, yielding an independent sib-pair sample. For strategy $S_{\mathrm{LONGITUDINAL}}$, pairs are chosen primarily to minimize the number of missing SBP measurements in the age groups and secondarily for small age differences within pairs. If necessary, a random selection follows. This longitudinally driven strategy results in a sample that is independent of the marker considered. The other two strategies are genotype-driven: they select sib pairs using IBD probabilities and thus yield a different sample for each marker. $S_{\mathrm{MAXPROB}}$ selects, among all pairs in a family, the sib pair with the maximum probability for an IBD value, yielding the surest classification into an IBD group. Should more than one pair qualify, the subsequent ordered selection criteria are the three criteria used for the first strategy. $S_{\mathrm{EQUAL}}$ tries to optimize the factorial design by equalizing the number of pairs in the IBD groups. The expected IBD distribution is P(IBD = 1) = 0.5 and P(IBD = 0) = P(IBD = 2) = 0.25. Within a family, pairs in IBD group 1 are therefore discarded whenever pairs in another IBD group are available. The subsequent selection steps are as above.
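The ordered selection criteria of the first two strategies can be illustrated with a small sketch. The record layout (keys `probs`, `n_missing`, `age_diff`) and the function names are hypothetical, and the final random tie-break is omitted, so remaining ties are resolved by list order here. It is an interpretation of the description above, not the authors' implementation.

```python
def select_longitudinal(pairs):
    """Pick one pair per family: fewest missing SBP values, then smallest age difference."""
    return min(pairs, key=lambda p: (p["n_missing"], p["age_diff"]))


def select_maxprob(pairs):
    """Pick the pair whose largest IBD probability is highest.

    Ties fall back to the longitudinal criteria (missing values, age difference).
    """
    return min(pairs, key=lambda p: (-max(p["probs"]), p["n_missing"], p["age_diff"]))


# Hypothetical family with three candidate sib pairs; probs = (P0, P1, P2).
family = [
    {"id": "A", "probs": (0.10, 0.60, 0.30), "n_missing": 0, "age_diff": 2},
    {"id": "B", "probs": (0.05, 0.05, 0.90), "n_missing": 2, "age_diff": 1},
    {"id": "C", "probs": (0.20, 0.50, 0.30), "n_missing": 0, "age_diff": 5},
]
chosen_longitudinal = select_longitudinal(family)["id"]
chosen_maxprob = select_maxprob(family)["id"]
```

In this toy family the two strategies pick different pairs (A and B, respectively), which shows why the genotype-driven sample can change from marker to marker while the longitudinal one does not.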

Design
The design for the described model was originally an experimental design for clinical studies [1]. It assumes independence of the phenotypic quadratic differences, $\varpi_{ikt}$, for different sib pairs. The longitudinal observations for pair k in IBD group i, denoted by $\varpi_{ik} = (\varpi_{ik1},\ldots,\varpi_{ik6})^T$, can be arbitrarily dependent.
There are a total of $6n$ possible observations, where $n = \sum_i n_i$. The method allows for missing values [3].

Relative effect
In this model no distributional parameters, such as the mean, are specified. A nonparametric effect is defined by a contrast of the distribution functions, $p_{it} = \int G \, dF_{it}$, where $F_{it}$ is the marginal distribution of the phenotypic quadratic differences in IBD group i and age group t, and G is the average of all marginal distributions over IBD groups i and age groups t. This relative effect quantifies the position of the marginal distribution $F_{it}$ relative to G. If $F_{it}$ tends to lie to the left of G (at a given position G takes smaller values than $F_{it}$), then $p_{it} < 0.5$; if it tends to lie to the right, then $p_{it} > 0.5$. A value of $p_{it} = 0.5$ indicates no such tendency. The relationship $p_{it} < p_{i't'}$ indicates that $F_{it}$ tends to smaller values than $F_{i't'}$ with respect to G. The consistent estimator of the relative effect is based on ranks.
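As a hedged illustration of a rank-based estimate, the sketch below pools all observed values, assigns midranks, and computes (mean rank in the cell - 1/2) / N for each combination of IBD group and age group, where N is the number of observed values. This is the pooled-sample form commonly used for relative effects; whether the authors weight the average distribution in exactly this way is an assumption. The function names are hypothetical.

```python
def midranks(values):
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0  # mean of positions pos+1 .. end+1
        for j in order[pos:end + 1]:
            ranks[j] = shared
        pos = end + 1
    return ranks


def relative_effects(cells):
    """Estimate the relative effect for every non-empty cell.

    cells: dict mapping (ibd_group, age_group) to the list of observed
    squared differences; missing values are simply left out of the list.
    """
    keys = [key for key, obs in cells.items() if obs]
    pooled = [x for key in keys for x in cells[key]]
    ranks = midranks(pooled)
    n_total = len(pooled)
    effects = {}
    start = 0
    for key in keys:
        n_cell = len(cells[key])
        mean_rank = sum(ranks[start:start + n_cell]) / n_cell
        effects[key] = (mean_rank - 0.5) / n_total
        start += n_cell
    return effects


# Toy data: IBD group 0 shows larger squared differences than IBD group 2.
toy = {(0, 1): [10.0, 20.0], (2, 1): [1.0, 2.0]}
toy_effects = relative_effects(toy)
```

For the toy data the estimate is 0.75 for IBD group 0 and 0.25 for IBD group 2, that is, the squared differences in group 0 tend to be larger than those in group 2.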

Hypothesis for gene effect
Under $H_0$ there are no differences between IBD groups and thus no influence of the marker's IBD status on the phenotypic quadratic SBP differences of sib pairs, taking all longitudinal measurements into account. Rejection of $H_0$ implies differences between IBD groups and thus supports an influence of the marker on SBP, assuming that phenotypic similarity increases with higher IBD group.

Test statistic
To test the above null hypothesis of no differences between IBD groups, an ANOVA-like test statistic based on standardized squared differences of rank means can be employed. This test statistic Q is asymptotically F-distributed with appropriate degrees of freedom $f_1$

Results
The samples resulting from the three selection strategies differ in the number of missing observations and of randomly selected sib pairs, as well as in the IBD information and the size of each IBD group. For each strategy and for all markers on chromosome 17, we tested for phenotypic differences between IBD groups. To reduce the number of missing values, pairs with fewer than three measurements over time were excluded. At 34.56 cM and 108.27 cM, significance at the 5% level was reached for two strategies, with the highest peak at 34.56 cM (Figure 1). After correction for multiple testing, no result remains significant. Although the p-value curves are similar across strategies, $S_{\mathrm{LONGITUDINAL}}$ tends to yield smaller and $S_{\mathrm{EQUAL}}$ higher p-values. Figure 2 shows box plots of the quadratic SBP differences of sib pairs for each age and IBD group at 34.56 cM. In four age groups, the median and the maximum value are highest in IBD group 0. In five age groups, the median of IBD group 2 is smallest. Thus, greater phenotypic similarity corresponds to higher IBD groups, indicating linkage. The distributions within groups are highly skewed, and the variances are not equal. A nonparametric approach that assumes neither normality nor homoscedasticity is therefore warranted.
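The text does not state which multiple-testing correction was applied. As one common possibility, a Bonferroni correction over the 15 markers would compare each p-value with 0.05/15, which is about 0.0033. The sketch below assumes that choice; the p-values in it are invented for illustration and are not results of this study.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values stay significant after Bonferroni correction."""
    threshold = bonferroni_threshold(len(p_values), alpha)
    return [p <= threshold for p in p_values]


# Invented p-values for three markers (not taken from the study).
flags = bonferroni_significant([0.03, 0.001, 0.2])
```

A nominally significant p-value such as 0.03 no longer passes once the threshold is divided by the number of tests, which is the pattern described for the two peaks above.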

Discussion
On chromosome 17, markers with a significant influence on SBP could not be identified. Previously, linkage to the ACE gene [5] and to adjacent areas of chromosome 17 [6] has been reported using the Framingham study. The linkage to ACE [5] was based on a subgroup analysis for men only. The analysis of O'Donnell et al. [5] differed in the final data set of the FHS used, in the definition of the dependent variable, and of course in the method applied. We used the complete pedigree information for the calculation of multipoint IBD probabilities. We then demonstrated our newly introduced method on a greatly reduced subset of the data, using well-characterized independent sib pairs only. Currently this is a major limitation of the method if the data were not ascertained according to this design. We have not yet investigated whether and how the assumption of independence of the sib pairs can be relaxed. Other analyses reported linkage on chromosome 17, but not to the ACE gene region (for a summary, see [9]).
The sample selection strategies, driven by phenotype or genotype, lead to approximately 40% missing observations. This is very high and could affect the results in general. In contrast to other GAW13 groups, we do not impute missing values. The described approach can handle missing data.
Also, IBD probabilities do not differ much between strategies, so this should not be the main cause of the differences in p-values. The size of the IBD groups, however, varies drastically. $S_{\mathrm{LONGITUDINAL}}$ and $S_{\mathrm{MAXPROB}}$ place approximately half of the pairs in IBD group 1; $S_{\mathrm{EQUAL}}$ emphasizes IBD groups 0 and 2. This results in less significant p-values for $S_{\mathrm{EQUAL}}$ than for the other strategies.
As seen in Figure 2, the quadratic SBP differences are not normally distributed and variances between IBD groups are not equal. Therefore a nonparametric approach was necessary. Our approach can also be applied to a diallelic marker with three genotypes in an association study instead of IBD groups in a linkage study. In this context the dependent variable can also be ordinal, such as a score. We can also test for an interaction effect between age group and IBD group, appropriate for a marker influence on SBP with age of onset. In the future, we will further investigate these approaches and their properties.

Conclusion
Our main aim was to introduce a new approach for longitudinal data, which explains the choice of the underlying example, the data of the Framingham study. We required independence of sib pairs, full genotyping in nuclear families including parents, and longitudinal SBP observations on each sib pair for at least two age categories. Thus, we could use only a small subset of the data, resulting in a loss of power. When planning a new study, the design can be incorporated explicitly to make optimal use of the data. The effects of relaxing the requirements above, such as full genotyping also in parents for multipoint IBD determination, can also be examined.