Consistency of linkage results across exams and methods in the Framingham Heart Study

Background: The repeated measures in the Framingham Heart Study in the Genetic Analysis Workshop 13 data set allow us to test for consistency of linkage results within a study across time. We compared regression-based linkage to variance components linkage across time for six quantitative traits in the real data.
Results: The variance components approach found 11 significant linkages, whereas the regression-based approach found 4. Only one region overlapped between the two methods. Consistency between exams generally decreased as the time interval between exams increased. The regression-based approach showed higher consistency in linkage results across exams.
Conclusion: The low consistency between exams and between methods may help explain the lack of replication between studies in this field.


Background
The general lack of replication of genome scan results across data sets is an ongoing concern in statistical genetics [1]. Consistency (or lack thereof) within a single study with longitudinal data could provide insight into the problem of replication. The repeated measures in the Framingham Heart Study in the Genetic Analysis Workshop 13 (GAW13) data set allow us to test for consistency of linkage results within a study across time. We believe that simulation methods have not yet advanced to the point at which they adequately capture the underlying complexity that contributes strongly to consistency issues. We therefore chose to use the real data.
A new regression-based method of linkage analysis that works on arbitrary pedigrees has recently been published [2] and implemented in the publicly available MERLIN software package [3]. This method promises to be robust to departures from normality, a problem that has plagued variance components linkage analysis [4]. We used MERLIN to perform both variance components (VC) and regression-based linkage analysis on seven quantitative traits in the real data from the Framingham Heart Study. For six of these traits, multiple observations (three or four for each trait) were available across time for all family participants, and we performed linkage analysis using both methods at each of these time points. In total, we performed 23 genome scans with each method (46 in all) in MERLIN, which allowed us to compare results across methods and across exams. To validate the MERLIN VC implementation, we also performed genome scans on the same 23 quantitative traits using GENEHUNTER [5].

Methods
The traits we analyzed were body mass index (BMI), total cholesterol (CHOL), glucose (GLU), high-density lipoprotein cholesterol (HDL), height (HGT), systolic blood pressure (SBP), and log-transformed triglycerides (LGTG). The effects of sex, age, age², and cohort status were included as covariates in all genome scans. In order to test the robustness of regression-based linkage, we did not transform any of the variables to achieve normality (except triglycerides). To maintain as much consistency across data sets as possible, we did not correct for hypertension, diabetes, or any medication.
The 330 families were trimmed using the PEDSYS PEDTRIM program. This program removes individuals from a pedigree who do not contribute any phenotypic or genotypic information. Using the criterion of at least one marker on chromosome 22, PEDTRIM reduced the family set from 4692 individuals in 330 families to 2604 individuals in 334 families (trimming caused four families to each be split into two).
The complexity of a family is measured by the formula 2N-F, where N is the number of nonfounders and F is the number of founders. Our prior experience with GENEHUNTER indicates that a practical upper limit for the complexity of a family is 21. After trimming, there were still nine families with complexity greater than 21 (range 23-46). These nine families were cut into 19 separate pedigrees, resulting in 2588 individuals in 344 families. All analyses described in this report used these trimmed and cut family structures.
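As an illustration of the complexity criterion described above, the following is a minimal Python sketch. The function names and the example founder/nonfounder counts are hypothetical and are not taken from the study; only the formula 2N-F and the limit of 21 come from the text.

```python
# Practical upper limit on pedigree complexity reported in the text.
MAX_COMPLEXITY = 21


def pedigree_complexity(n_nonfounders: int, n_founders: int) -> int:
    """Return the complexity 2N - F of a pedigree."""
    return 2 * n_nonfounders - n_founders


def needs_cutting(n_nonfounders: int, n_founders: int,
                  limit: int = MAX_COMPLEXITY) -> bool:
    """True when a pedigree exceeds the complexity limit and would be cut."""
    return pedigree_complexity(n_nonfounders, n_founders) > limit


# Hypothetical pedigrees (counts invented for illustration only).
small = pedigree_complexity(4, 2)    # nuclear family: 2 parents, 4 offspring
large = pedigree_complexity(18, 8)   # larger extended pedigree
print(small, needs_cutting(4, 2))
print(large, needs_cutting(18, 8))
```

Under this measure, a nuclear family with two founders and four offspring is well below the limit, whereas a pedigree with many nonfounders relative to founders quickly exceeds it.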
We constructed four data sets (denoted A-D) by populating the family structures with data from the first four Cohort 2 exams and from the Cohort 1 exams that overlapped them in time, thus yielding four data sets, each with paired contemporaneous data for Cohorts 1 and 2. These four "cross-sectional" data sets represent four time points spanning approximately 16 years. For data set A, the variables were taken from Cohort 2 Exam 1 or Cohort 1 Exam 12, depending on the cohort of the individual. These two exams overlap in time (1970-1971) and are thus the natural choice for a cross-sectional data set of these families. Similarly, for data sets B through D, we combined Cohort 2 Exam 2 and Cohort 1 Exam 16 (1978-1979), Cohort 2 Exam 3 and Cohort 1 Exam 18 (1982-1983), and Cohort 2 Exam 4 and Cohort 1 Exam 20 (1986-1987), respectively. There were several exceptions to this scheme. Cholesterol and HDL were not available from Cohort 1 Exams 16 or 18, so data set C was not constructed for either trait and data set B used data from Cohort 1 Exam 15 rather than 16. Similarly, height was not available from Cohort 1 Exam 12, so data set A for both height and BMI was constructed using data from Cohort 1 Exam 10.
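The pairing scheme and its exceptions can be written down as a small lookup, which may make the construction easier to follow. This is only a sketch: the function name and return convention are hypothetical, and the trait codes are the abbreviations used in this report.

```python
from typing import Optional, Tuple

# Default pairing of exams for data sets A-D, as described in the text.
COHORT1_EXAM = {"A": 12, "B": 16, "C": 18, "D": 20}
COHORT2_EXAM = {"A": 1, "B": 2, "C": 3, "D": 4}


def exam_numbers(trait: str, data_set: str) -> Optional[Tuple[int, int]]:
    """Return (Cohort 1 exam, Cohort 2 exam) for a trait and data set.

    Returns None when the data set was not constructed for that trait.
    """
    if trait in ("CHOL", "HDL"):
        if data_set == "C":
            return None                      # data set C not constructed
        if data_set == "B":
            return (15, COHORT2_EXAM["B"])   # Exam 15 used instead of 16
    if trait in ("HGT", "BMI") and data_set == "A":
        return (10, COHORT2_EXAM["A"])       # Exam 10 used instead of 12
    return (COHORT1_EXAM[data_set], COHORT2_EXAM[data_set])


for trait in ("SBP", "CHOL", "HGT"):
    print(trait, {ds: exam_numbers(trait, ds) for ds in "ABCD"})
```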
We performed VC linkage analysis on all 23 traits using GENEHUNTER and MERLIN. Both programs computed multipoint identity-by-descent (IBD) probabilities at each marker and at four equally spaced loci between each pair of markers. We also performed regression-based linkage analysis [2] on all 23 traits using MERLIN.
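The analysis grid described above (every marker plus four equally spaced loci between each pair of adjacent markers) can be sketched as follows. The function name and the example marker positions are hypothetical; the sketch does not reproduce how either program builds its grid internally.

```python
from typing import List, Sequence


def analysis_positions(marker_cm: Sequence[float],
                       n_between: int = 4) -> List[float]:
    """Return marker positions plus n_between equally spaced loci
    between each pair of adjacent markers (positions in cM, sorted)."""
    if not marker_cm:
        return []
    positions: List[float] = []
    for left, right in zip(marker_cm, marker_cm[1:]):
        positions.append(left)
        step = (right - left) / (n_between + 1)
        positions.extend(left + step * k for k in range(1, n_between + 1))
    positions.append(marker_cm[-1])
    return positions


# Hypothetical marker map for one chromosome (cM).
grid = analysis_positions([0.0, 10.0, 25.0])
print(grid)
print(len(grid))  # 5 * (number of markers - 1) + 1 positions
```

With m markers on a chromosome, this scheme yields 5(m - 1) + 1 analysis positions on that chromosome.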

Results
MERLIN computed IBD probabilities much faster than GENEHUNTER (approximately 4.5 times faster on our Pentium 4 computer). A full genome scan (including IBD calculations) on MERLIN took approximately 4.5 hours for VC linkage and 41 hours for regression-based linkage. In addition, MERLIN was limited to markers with fewer than 32 alleles (on our 32-bit system). One marker had 39 alleles; it was downcoded to 30 alleles.
Table 1 shows basic descriptive statistics for all the quantitative traits over all available data sets. The last letter in each trait name refers to the particular data set (A-D). The Framingham Heart Study can be considered a population-based sample, so the mean, variance (SD²), and heritability in this table were used as input to the regression-based linkage analyses. Heritability was computed by MERLIN as the ratio of additive polygenic variance to the total variance.
Table 2 shows all maximum LOD scores greater than 1.44 (p < 0.005) for VC and regression. For every LOD score greater than 1.44, the LOD score at the same point using the other method is included for comparison. For each trait, the LOD scores are ordered by chromosome, centimorgan location, and data set, in that order. There were 15 distinct regions for which one or the other method found a LOD score greater than 3.0 in at least one exam. VC analysis found 11 regions, regression found 4 regions, and there was one region in which both methods found significant linkage (chromosome 7 for cholesterol). Of the 11 significant regions from VC, 8 had some supporting evidence for linkage (LOD score > 1.0) from regression. Of the 4 significant regions from regression, 2 showed supporting evidence from VC. Perhaps surprisingly, of the 5 significant regions for which the other method did not provide some support, 4 were for height. The last line of Table 2 shows, for each trait, the LOD score correlation between the two methods. The correlation calculation used all LOD scores over all loci on every chromosome and all available data sets (there were 1917 LOD scores in each genome scan).
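The correspondence between the LOD thresholds used here and their nominal p-values can be checked with a short calculation. The sketch below assumes the standard asymptotic conversion, in which 2 ln(10) × LOD is referred to a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 degree of freedom; the text does not state which conversion was used, so this is an assumption, and the function name is hypothetical.

```python
import math


def lod_to_p(lod: float) -> float:
    """Nominal one-sided p-value for a LOD score.

    Assumes 2*ln(10)*LOD follows a 50:50 mixture of 0 and chi-square(1 df)
    under the null hypothesis (an assumption of this sketch).
    """
    lod = max(lod, 0.0)
    chi2 = 2.0 * math.log(10.0) * lod
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)); halve for the mixture.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


for lod in (1.0, 1.44, 3.0):
    print(f"LOD {lod:.2f}: p = {lod_to_p(lod):.5f}")
```

Under this assumption, a LOD score of 1.44 corresponds to a nominal p-value of approximately 0.005, matching the threshold used for Table 2, and a LOD score of 3.0 corresponds to approximately 0.0001.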
VC LOD scores from GENEHUNTER were compared to VC results from MERLIN. For all traits examined, the correlation between LOD scores from the two programs was at least 0.993. For 18 of the 23 traits, the correlation was 0.9999. All correlations were highly significant (p < 0.0001).
The glucose A data set, which had the highest kurtosis, was a special problem for regression-based linkage. The program gave no LOD score at all between 180 and 190 cM on chromosome 4, and it gave a LOD score of 12.5 at 1 cM on chromosome 3. In both cases, it gave heritabilities greater than 1. For these reasons, we exclude the glucose A data set from further consideration. In addition, results from MERLIN for weight were obviously incorrect (heritabilities of 1.0 and all LOD scores equal to 0.0), so we do not report any results for weight. The reasons for this anomaly are currently unknown to us.
Table 3 shows the trait correlations between pairs of data sets for all traits analyzed. The results conform to prior expectation: the highest correlations are for the traits that have the smallest measurement error (height and BMI), and the lowest correlations are between the two data sets (A and D) that are the farthest apart in time.
Table 4 shows the MERLIN VC LOD score correlations between all possible pairs of data sets. For cholesterol and HDL, data set C was missing, so correlations are not available for pairs involving data set C. In general, the highest correlations were for BMI and height and the lowest correlations were for glucose and SBP. In addition, there was a roughly inverse relationship between time interval and correlation; i.e., higher correlations were observed when the time interval between exams was smaller (B-C and C-D) and lower correlations were observed when the time interval was larger (A-D).
Table 5 shows the regression-based LOD score correlations between all possible pairs of data sets. The overall pattern among traits is very similar to that for VC, i.e., higher correlations for BMI and height and lower correlations for glucose and SBP. However, the exam-to-exam consistency is much higher for regression-based linkage, often twice as high.
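The pairwise LOD score correlations reported in Tables 4 and 5 can be sketched as follows. This assumes a Pearson correlation computed over LOD scores evaluated on a common grid of positions; the function names and the toy LOD values are hypothetical and do not come from the study. Data sets that were not constructed (such as C for cholesterol and HDL) are simply absent from the input.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Undefined (division by zero) if either sequence is constant.
    return sxy / math.sqrt(sxx * syy)


def pairwise_lod_correlations(
        scans: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Correlate LOD scores for every pair of available data sets."""
    return {(a, b): pearson(scans[a], scans[b])
            for a, b in combinations(sorted(scans), 2)}


# Toy LOD scores on a shared grid; data set C is deliberately absent.
toy = {"A": [0.0, 1.0, 2.0, 1.0],
       "B": [0.1, 1.1, 2.1, 1.1],
       "D": [2.0, 1.0, 0.0, 1.0]}
for pair, r in pairwise_lod_correlations(toy).items():
    print(pair, round(r, 3))
```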

Discussion
Two obvious differences between the two methods are the number of significant linkage regions and the improved consistency in LOD scores between data sets provided by the regression-based approach as opposed to the variance components approach. VC analysis found almost three times as many significant regions as regression-based linkage. This may be due to the well-known inflation of LOD scores due to non-normality, and indeed two of the significant VC linkages were detected in glucose, which has high kurtosis. However, the other eight significant VC linkages are in traits with low kurtosis (see Table 1). This may indicate that regression-based linkage is conservative. The higher consistency between data sets from the regression-based approach supports the previously reported robustness of this method [2]. It should be noted, however, that the consistency for some traits (e.g., SBP) remains low.
Glucose, which has high kurtosis, is an especially interesting test of the regression method. The VC LOD scores do not appear to be excessive; however, the regression LOD scores are generally lower overall. The much weaker correlations between exams from VC analysis reconfirm that VC is sensitive to non-normality. Indeed, given the low trait correlations for glucose (see Table 3), the consistency of the regression method is quite good.
The lack of consistency between exams is somewhat disappointing. Table 2 shows only four regions in which at least three of the four exams had LOD scores greater than 1.44. With the possible exception of height, Tables 4 and 5 confirm the poor consistency between exams. However, the consistency between exams may also be affected by the varying sample size from exam to exam. In addition, the individuals dropping out of the study due to death are more likely to be older. The effects of sample size and dropout bias need to be investigated further.
The consistency measure used here, correlation between LOD scores, is problematic. The joint distribution of LOD scores from any two exams is clearly not bivariate normal. However, the pattern observed across time (lower correlations across longer time periods), which corresponds to expectations, somewhat ameliorates this concern. Clearly, more work is needed to develop a valid measure of consistency.
The height results are puzzling. Each method found two significant linkages, but neither method gave any support to the significant linkages found by the other method. A closer examination of the height results reveals that this inverse relationship exists for the suggestive linkages as well. The LOD score correlation between methods for height (0.21) is less than half the next lowest (0.51 for SBP). Height is unique in this study in at least two ways: its heritability is much higher than that of the other traits, and humans mate assortatively for height. However, it is not clear why either of the two methods should be sensitive to these unique aspects of height. In addition, kurtosis for height is negligible and the highest LOD scores from both methods have some support from previously reported genome scans (e.g., Hirschhorn et al. [6]); thus, there seems to be little reason to prefer one method to the other.