We present an alternative approach to deriving a phenotype from longitudinal data. It is based on the GEE methodology, which accounts for the repeated measurements on each subject. We hypothesized that our approach would provide more linkage information than that of Levy et al. because it uses all of the data to estimate the parameters of the regression model. The approach of Levy et al. first averages the longitudinal data, thereby reducing their variability, and then models these summary statistics in a regression analysis. We observed, however, that the two approaches provided essentially the same amount of genetic information, as judged by the similarity of their LOD scores across the genome and by the correlation between the two derived phenotypes. The two regions with LOD scores of about 1.5 or higher were on chromosomes 5 and 17. In both regions, the LOD score from Method 1 was about 0.55 units higher than that from Method 2. Elsewhere, the results were either similar or inconclusive. On chromosome 22, for example, the maximum LOD score was only about 1, so no conclusion can be drawn about the difference between the methods in this region: at such a low LOD score, the potential for a false-positive signal is high. The overall similarity between the methods indicates that little or no information is lost by reducing the multiple measurements on each person to a single measurement before adjusting for other covariates.
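The contrast between the two derivations can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not our analysis code: the covariates (age and sex), the toy data, and all function names are hypothetical, and Method 2 is simplified to GEE with an identity link and an independence working correlation, for which the GEE point estimates coincide with pooled least squares on all visits. Reducing the visit-level residuals to one value per subject by taking their mean is likewise an assumption. The last lines compute the correlation between the two derived phenotypes, one of the criteria used above to compare the methods.

```python
from statistics import mean


def dot(beta, x):
    """Linear predictor x'beta."""
    return sum(b * xi for b, xi in zip(beta, x))


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [ar - f * ac for ar, ac in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def method1_phenotype(subjects):
    """Method 1 style: average each subject's visits, then regress the means."""
    ids = sorted(subjects)
    # Hypothetical covariates: intercept, mean age, sex (constant per subject).
    X = [[1.0, mean(v[1] for v in subjects[s]), subjects[s][0][2]] for s in ids]
    y = [mean(v[0] for v in subjects[s]) for s in ids]
    beta = ols(X, y)
    return {s: yi - dot(beta, xi) for s, xi, yi in zip(ids, X, y)}


def method2_phenotype(subjects):
    """GEE-style sketch: fit all visits jointly, then average each subject's residuals.

    Assumes an identity link and an independence working correlation, in which
    case the GEE point estimates equal pooled least squares on the stacked visits.
    """
    rows, ys, owner = [], [], []
    for s, visits in subjects.items():
        for y, age, sex in visits:
            rows.append([1.0, age, sex])
            ys.append(y)
            owner.append(s)
    beta = ols(rows, ys)
    resid = {}
    for s, xi, yi in zip(owner, rows, ys):
        resid.setdefault(s, []).append(yi - dot(beta, xi))
    return {s: mean(r) for s, r in resid.items()}


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


# Hypothetical toy data: subject -> [(systolic BP, age, sex), ...], one tuple per exam.
toy = {
    "s1": [(120, 40, 0), (124, 44, 0), (130, 48, 0)],
    "s2": [(135, 50, 1), (138, 54, 1)],
    "s3": [(118, 35, 1), (121, 39, 1), (125, 43, 1)],
    "s4": [(142, 60, 0), (147, 64, 0)],
    "s5": [(128, 45, 1), (131, 49, 1), (133, 53, 1)],
}
p1, p2 = method1_phenotype(toy), method2_phenotype(toy)
ids = sorted(toy)
r12 = pearson([p1[s] for s in ids], [p2[s] for s in ids])
print("corr(Method 1, Method 2) =", round(r12, 3))
```

A fuller GEE fit (for example, an exchangeable working correlation with robust sandwich standard errors) would shift the estimates somewhat; the sketch is meant only to show where the averaging happens relative to the covariate adjustment in each method.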

A limitation of our study is that we used the real data to compare the two statistical methods. A more accurate comparison would use simulated data, in which the true gene locations are known *a priori*. With simulated data, empirical type I error rates and power could be estimated for both methods; however, such a simulation study requires at least 1000 replicates to test appropriately at the 5% significance level, and the simulated data from GAW had only 100 replicates. We suspect that conclusions drawn from the simulated data (with 100 replicates) would not have been much more accurate than those we obtained from the real data. Another limitation is that our approach (Method 2) did not account for familial relationships. GEE assumes independence among subjects, and a violation of this assumption may bias the parameter estimates. We did observe a difference in parameter estimates between Cohorts 1 and 2. This difference could arise because Cohort 2 consists of related subjects, because more data are available for Cohort 1, or even because of ascertainment differences between the cohorts. However, based on Figure 1, a violation of the GEE independence assumption does not appear to be a problem, because Methods 1 and 2 had essentially the same LOD scores across the genome.
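The replicate-count argument can be made concrete with the Monte Carlo standard error of an empirical rejection rate, sqrt(α(1 − α)/R) for a true rate α and R replicates. The sketch below relies on the normal approximation to the binomial (itself rough at R = 100); the function names are illustrative only.

```python
import math


def mc_standard_error(alpha: float, replicates: int) -> float:
    """Binomial standard error of an empirical rejection rate whose true value is alpha."""
    return math.sqrt(alpha * (1.0 - alpha) / replicates)


def mc_interval(alpha: float, replicates: int, z: float = 1.96) -> tuple:
    """Approximate 95% range of empirical rates consistent with a true rate alpha."""
    half = z * mc_standard_error(alpha, replicates)
    return (max(0.0, alpha - half), min(1.0, alpha + half))


for r in (100, 1000):
    lo, hi = mc_interval(0.05, r)
    print(f"{r} replicates: SE = {mc_standard_error(0.05, r):.4f}, "
          f"95% range = ({lo:.3f}, {hi:.3f})")
```

With 100 replicates, an observed type I error rate anywhere from roughly 0.7% to 9.3% would be consistent with a true rate of 5%, whereas 1000 replicates narrow this range to roughly 3.6% to 6.4%.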

We note that our results from Method 1 differed from the published results of Levy et al. Although our Method 1 was similar to theirs, our analysis was less extensive: for example, Levy et al. accounted for the effects of hypertension treatment and applied inclusion criteria specifying which subjects to analyze, whereas we included all subjects who had both phenotype and genotype information. Moreover, the analysis of Levy et al. included two additional pedigrees. We therefore expected the two sets of results to differ. We emphasize, however, that our primary goal was to compare two analytical approaches rather than to replicate the findings of Levy et al. In summary, because the two approaches provided similar results, we conclude that Method 1 is the more parsimonious choice, since it requires fewer assumptions in the data analysis.