Strategy and model building in the fourth dimension: a null model for genotype × age interaction as a Gaussian stationary stochastic process

Background Using univariate and multivariate variance components linkage analysis methods, we studied possible genotype × age interaction in cardiovascular phenotypes related to the aging process from the Framingham Heart Study. Results We found evidence for genotype × age interaction for fasting glucose and systolic blood pressure. Conclusions There is polygenic genotype × age interaction for fasting glucose and systolic blood pressure and quantitative trait locus × age interaction for a linkage signal for systolic blood pressure phenotypes located on chromosome 17 at 67 cM.


Background
The Framingham Heart Study (FHS) [1] offers a unique opportunity to assess possible genotype × age (G × age) interaction in the genetic architecture of complex traits. To study G × age interaction in the FHS, we built on the variance components model [2], which, assuming negligible dominance, may be given as: where Σ is the variance-covariance matrix of a pedigree, is a matrix of estimated elements, , giving the proportion of alleles identical by descent (IBD) in individuals i and j at a quantitative trait locus (QTL), Φ is the kinship matrix of the pedigree, I is the identity matrix, and , , and are variances of the additive QTL, polygenic and environmental factors, respectively.
Our study was carried out in two phases. In Phase 1, we made inferences on polygenic G × age interaction using what we call the G × age model [3]. Also, we experiment with correlation functions in the G × age model. In Phase 2, we implement the QTL version of this model. Multivariate variance components linkage analysis [4] can also be used to detect G × age interaction. We show how the bivariate variance components model can be used to detect QTL × age interaction. Given that the G × age and bivariate

Models and Methods
Using the bivariate mixed model [5], Blangero [6] derived the following expression for the genetic variance in phenotype response to two different environments: where and are the additive genetic variances of the trait in Environments 1 and 2, in this case two ages, is the additive genetic variance of the phenotypic difference between environments (i.e., trait response), and ρ G is the genetic correlation between environments, in this case ages. The term environment, denoted by E, is used in accordance with standard statistical genetic theory [6].
There is no G × E interaction (i.e., = 0) if = = and ρ G = 1 [6]. To model = 0, and ρ G are parameterized as functions of age, which is the environment of interest in our analyses: where is parameterized as an exponential function [7] and ρ G as exponential decay [8], and α, β, and λ are parameters to be estimated, and m and n are two ages. Thus, the null hypotheses are β = 0 and λ = 0, which reflect a stationary covariance function.
In our analyses, we used the expected covariance matrix for a given pedigree [3], which has elements giving the covariance in trait value y for any two relatives: where x and z index individuals at ages m and n, respectively, Φ xz gives their kinship status, π is the probability that a random allele is IBD at the QTL, δ xz is 1 for x = z and 0 otherwise, subscripts g, q, and e denote the polygenic, QTL, and environmental components, respectively, and average age is taken over the sample population, defined as all individuals measured for the trait of interest. Some points of clarification are needed. This is a cross-sectional model that applies generally to three types of pair-wise comparisons of individuals. In one type, let x = z such that m = n. Equation (5) gives the variances in this situation, in accord with the standard variance components model. In a second type, it may be such that x ≠ z while m ≠ n, and, in a third type, it may be such that x ≠ z while m ≠ n. Note that none of these types are longitudinal comparisons, which would be the case where x = z while m ≠ n (i.e., the same individual is measured at different ages). The goal of this approach lies in estimation of the change parameters, namely β and λ, so that we can test the null hypotheses stated above.
For the bivariate model, a trait measured at two different time points is treated as a bivariate phenotype. The pedigree covariance matrix model for a bivariate phenotype K, with constituent phenotypes I and J, may be written as [4]: where the new matrices G, Q, and E convey the polygenic, QTL, and environmental variance components, respectively, and is the Kronecker-product operator. The variance components for this model are those for the univariate model for traits I and J, given by Σ I and Σ J (cf. equation (1)), and the trait cross-covariances, which may be parameterized as ρ gij σ gi σ gj , ρ qij σ qi σ qj , and ρ eij σ ei σ ej for the polygenic, QTL, and environmental components, respectively [4]. The null hypotheses of no polygenic genotype or QTL × age interaction are expressed as ρ gij = 1 and ρ qij = 1, respectively.
Scatter plots were examined for traits showing increasing or decreasing variance in trait values with age, suggesting potential for G × age interaction. Traits meeting these criteria, namely systolic blood pressure (SBP) and fasting glucose (GLUC), were analyzed using SOLAR [2], with age, sex, hypertension medication, and body mass index (BMI) as covariates. Phase 1 analyses were conducted on an augmented data set. Exams 12 (1970), 16 (1978), 18 (1982), and 20 (1986) in Cohort 1 were combined with Exams 1 (1971), 2 (1979), 3 (1983), and 4 (1987) in Cohort 2, respectively. To reduce kurtosis after combining, a few outliers were removed for SBP (1-6 per exam, all with values > the mean) but many for GLUC (51-101 per exam, all but one > the mean). No data transformation was made. Since diabetic status was not available, we could not control for it. This gave combined measurement periods 1-4.
Based on genome scans that we conducted for both SBP and GLUC, we decided to focus on SBP. For Phase 2, data σ σ σ ρ σ σ σ σ σ σ ρ σ   for SBP values corrected for hypertension treatment for Cohorts 1 and 2, imputed following Levy et al. [9] (see Soler and Blangero [10]), were analyzed. Bivariate analyses were performed on residuals from multiple regressions with corrected SBP values as the dependent variable and age, sex and BMI as independent variables. For the bivariate analyses, analysis of residuals was necessary due to the unbalanced and incomplete structure of the FHS longitudinal data, which made covariate specification unrepresentative across measurements. For this SBP data set (corrected values and residuals), we experimented with combined exams from Cohorts 1 and 2 and with exams from Cohort 2 taken separately.

Phase 1
We implemented equation (5) using three different correlation functions, specifically the standard normal, Cauchy, and exponential. These gave consistent results (not shown). Only selected results for the exponential function given by equation (4) are reported. For SBP, only combined measurement period 3 gave strong evidence for polygenic G × age interaction (Fig. 1). For GLUC, combined measurement periods 2-4 all gave strong evidence for polygenic G × age interaction and we believe combined measurement period 3 to be representative (Fig. 1).
The p-values reported in Figure 1 and Tables 1,2 are for the likelihood ratio statistic: Λ = -2 [ln L(θ N )-ln L(θ A )], where the null, H N (parameter constrained to 0 or 1 as appropriate), is compared to the alternative, H A (parameter estimated). In general, Λ is distributed as a χ 2 random variable with degrees of freedom (d.f.) equal to the difference in the number of parameters under the null and alternative hypotheses [11]. If parameter values are constrained to a boundary under the null hypothesis, the asymptotic distribution of Λ is given by a mixture of random variables, where n denotes the d.f. and where the mixture may include n = 0 (point mass at 0) [11]. However, the traditional criterion (i.e., difference in parameters) is conservative [3].
Genetic parameters, combined measurement period 3

Phase 2
We recovered the linkage signal for SBP on chromosome 17 at 67 cM reported by the FHS group [9] (Fig. 2; all maximum LOD scores > 3). We then applied the G × age and bivariate models in search of QTL × age interaction. Table  1 contains model selection criteria [12] and selected likelihood ratio tests for univariate analyses of corrected SBP values using the G × age model for combined Cohorts 1 and 2, Exams 20 and 4, respectively. Table 2 contains similar information for bivariate analyses of corrected SBP residuals using the bivariate model for Cohort 2, Exams 4 and 5.

Conclusion
Our results suggest that there is polygenic G × age interaction for both GLUC and SBP ( Fig. 1; Tables 1,2) and QTL × age interaction for the putative QTL for SBP phenotypes on chromosome 17 at 67 cM (linkage: Fig. 2; interaction: Table 1). To our knowledge, this is the first demonstration of QTL × age interaction in humans using linkage analysis methods.
The cross-sectional and longitudinal analyses do not give the same results. The difference may mean that cross-sec-tional data are better than longitudinal data at capturing G × age interaction. However, based on the received wisdom regarding the relative utility of cross-sectional and longitudinal analyses [13], this explanation does not seem tenable. Alternatively, that the bivariate model did not detect QTL × age interaction (Table 2) may simply be due to the loss in power with an overly parameterized model [12]. Another explanation is that the bivariate model is really operating on time rather than on age and so perhaps the brief time span between Cohort 2, Exams 4 and 5, was too short to capture an interaction effect. Yet another explanation of the difference between the crosssectional and longitudinal analyses is that the latter was carried out on Cohort 2 only, which, taken by itself, yielded a smaller sample size. Lastly, there is the possibility of some mixture of the last three problems.
Given that equations (5) and (6) derive from the same underlying modeling framework -namely variance components -they can be conceptualized as a strategy for testing the null hypothesis that a given phenotype "translated along the time axis" is a Gaussian covariance stationary stochastic process [14]. This is now an established approach in statistical genetics [14], and is readily Akaike's Information Criterion = -2 Ln L (θ| data) + 2K, where θ is a parameter vector and K is the number of parameters. C Evidence Ratio: w min /w i = exp(∆ i /2), where w min is set to 1, w i = [exp(-∆ i /2)]/Σ r = 1 exp(-∆ r /2) and ∆ i = AIC i -AIC min . D Models compared. extended to the genetics of aging. It is our hope that the analyses herein contribute to a better understanding of the genetic architecture of the complex traits associated with the aging process and associated complex diseases (e.g., cardiovascular disease). We conclude that the G × age and bivariate models offer a feasible system for model building in the fourth dimension.