A quantitative linkage score for an association study following a linkage analysis
- Tao Wang^{1} and
- Robert C Elston^{1}
https://doi.org/10.1186/1471-2156-7-5
© Wang and Elston; licensee BioMed Central Ltd. 2006
Received: 04 August 2005
Accepted: 20 January 2006
Published: 20 January 2006
Abstract
Background:
Currently, a commonly used strategy for mapping complex quantitative traits is to use a genome-wide linkage analysis to narrow suspected genes to regions on a scale of centiMorgans (cM), followed by an association analysis to fine map the genetic variation in regions showing linkage. Two important questions arise in the design and the resulting inference at the association stage of this sequential procedure: (1) how should we design an efficient association study given the information provided by the previous linkage study? and (2) can an association in a linkage region explain, in part, the detected linkage signal?
Results:
We derive a quantitative linkage score (QLS) based on Haseman-Elston regression (Haseman and Elston 1972) and make use of this score to address both questions. In designing an association study, the selection of a subsample from the linkage study sample can be guided by the linkage information summarized in the QLS. When heterogeneity exists, we show that selection based on the QLS can increase the proportion of sample individuals from the subpopulation affected by a disease allele and therefore greatly improve the power of the association study. For the resulting inference, we frame as a hypothesis test the question of whether a linkage signal in a region can be in part explained by a marker allele. A simple one-sided paired t-statistic is defined by comparing the two sets of QLSs obtained with/without modeling a marker association: a significant difference indicates that the marker can at least partly account for the detected linkage. We also show that this statistic can be used to detect a spurious association.
Conclusion:
All our results suggest that a careful examination of QLSs should be helpful for understanding the results of both association and linkage studies.
Background
Identifying genes underlying complex quantitative traits, which are often heterogeneous and multifactorial, is still a great challenge in genetic epidemiology studies. Currently, a commonly used strategy for mapping complex traits is to use a genome-wide linkage analysis to narrow suspected genes to regions on a scale of centiMorgans (cM), followed by an association analysis to fine map the genetic variation in regions showing linkage. At the association stage of this sequential process, we are often interested in two questions: (1) how should we design a powerful and efficient association study given the information provided by the previous linkage study? and (2) can an association in a linkage region explain, in part, the detected linkage signal? Although these questions that arise respectively at the design and inference stages are two quite different aspects of an association study, they are related because both questions essentially rely on the interdependence of linkage and association. Here, we derive a quantitative linkage score (QLS) from Haseman-Elston linkage regression [1] and make use of this score to address both questions in the scenario of analyzing a complex quantitative trait.
The loci predisposing to a complex quantitative trait are usually expected to have small effects. One important reason for this, among others, is heterogeneity of the phenotype, where an allele of interest may have no effect on some individuals because they have different genetic and environmental backgrounds. If these individuals are included in the sample used in the association study, the effect of the examined allele is "diluted" and this leads to great difficulty in detecting association. Careful selection of individuals from the sample to exclude such possible "dilution" should presumably provide greater power. Ideally, we should like to find a variable, such as age, sex or ethnicity, that indicates heterogeneous persons. Unfortunately, such an indicator variable is often unclear or unavailable for a complex trait. Nevertheless, if an association study follows a linkage study, selection of the sample for the association study may be guided by the linkage information already obtained, using the linkage signal as a natural heterogeneity indicator. This idea has long been recognized and implemented in practice [2–4]. Fingerlin et al. (2004) systematically examined the selection of cases for a case-control association study based on allele-sharing information provided by affected members of a family [5]. We focus here on sample selection for an association study of a quantitative trait and show the usefulness of the QLS when heterogeneity exists.
After an association has been detected between the trait and a marker allele in the region of linkage, the question of whether this association accounts, in part, for the previously found linkage signal is not trivial. If the allele statistically associated with the trait is partly responsible for the linkage, we may be more confident that this allele is itself functional or in linkage disequilibrium with the true functional variant, rather than a false discovery resulting from other causes. On the other hand, if the associated allele cannot explain any linkage signal, we may consider adding more association markers to the region in order to avoid missing a possible genetic variant affecting the trait of interest. In the case of affected sibs (or other affected relatives) used for linkage analysis, one approach is to examine the difference in the allele sharing identical by descent (IBD) between members of families selected on the basis of the associated marker [2, 6]. We address this question for a quantitative trait by testing whether there is a significant difference between the QLS with and without including this marker in the model. We show that this test is essentially the same as examining the interaction between the linkage and association signals and therefore is related to the genotype-IBD sharing test (GIST) proposed by Li et al. (2004) for affected sibship data [6]. Fulker (1999) proposed a similar idea, in the context of a variance component model, simultaneously modeling the association and linkage in the mean and variance-covariance structure of a family [7]. They focused on testing a similar, but different, hypothesis to determine whether the allele is the true candidate or is merely in disequilibrium with the trait locus, by comparing a model with all the parameters freely estimated to a model in which the linked genetic variance of the quantitative trait locus (QTL) is set to zero, on the assumption that there is a single variant responsible for the linkage signal [8].
In this paper, we propose a linkage score derived from quantitative trait linkage analysis that has important applications when an association study follows a linkage analysis. Although the linkage score derived here can be easily extended to general families, to implement our approach we focus here on nuclear families. We first derive the linkage score in the method section. Then we perform computer simulations to examine the usefulness of this score to select a sample for an association study when heterogeneity exists, and to clarify whether the association can, at least in part, explain the linkage signal.
Methods
Our goal is to derive a score that captures the linkage information for quantitative traits in a way that will be useful for a follow-up association study. For simplicity of presentation, we assume the quantitative trait value may be affected by the presence of an allele without any other covariates present, which is not a necessary limitation for our derivation. We suppose linkage markers have been genotyped for family members and therefore the proportion of alleles shared IBD at a particular location can be estimated for all pairs of relatives in a pedigree [9, 10].
Quantitative linkage score (QLS)
We first derive the QLS. Suppose we have recruited N sibships. The trait value y_{ ik }of sib i(1, ..., n_{ k }) in sibship k(1, ..., N) is modeled by
y_{ ik }= μ_{ k }+ x_{ ik }b +e_{ ik }, (1)
where μ_{ k }is the sibship specific mean, which absorbs family-level effects such as polygenic and common environmental effects [11]; b is the effect of the quantitative trait locus (QTL), which may include both additive and dominant effects; x_{ ik }is the corresponding vector of design variables indicating the genotype of the QTL; and e_{ ik }is an individual-level random effect. For simplicity of exposition only, we assume the QTL effect is additive and therefore x_{ ik }can be coded as one variable to indicate the number of copies of the allele of interest. Otherwise, it can be coded as a vector with two elements, for additive and dominant effects, respectively. Because in a linkage analysis the genotype of a QTL (x_{ ik }) is not observed (or the marker cannot be assumed to be in linkage disequilibrium with the QTL), we are not able to estimate b directly. However, we can model the QTL effect in the variance-covariance matrix at the family level. Under the trait model (1), the variance-covariance matrix of sibship k is given by
$E\left(\begin{array}{ccc}({y}_{1k}-{\mu}_{k})({y}_{1k}-{\mu}_{k})& \dots & ({y}_{1k}-{\mu}_{k})({y}_{{n}_{k}k}-{\mu}_{k})\\ \dots & \dots & \dots \\ ({y}_{1k}-{\mu}_{k})({y}_{{n}_{k}k}-{\mu}_{k})& \dots & ({y}_{{n}_{k}k}-{\mu}_{k})({y}_{{n}_{k}k}-{\mu}_{k})\end{array}\right)=\left(\begin{array}{ccc}{\sigma}_{b}^{2}+{\sigma}_{e}^{2}& \dots & IB{D}_{1{n}_{k}k}{\sigma}_{b}^{2}\\ \dots & \dots & \dots \\ IB{D}_{1{n}_{k}k}{\sigma}_{b}^{2}& \dots & {\sigma}_{b}^{2}+{\sigma}_{e}^{2}\end{array}\right),$
where ${\sigma}_{b}^{2}$ is the variance of the QTL, ${\sigma}_{e}^{2}$ is the individual random effect variance and IBD_{ ijk }is the proportion of marker alleles shared IBD by sibs i and j in family k. Because both matrices are symmetric and the diagonal elements do not include linkage information, we only consider the lower triangular elements. We rearrange these elements of the above matrices as vectors of length n_{ k }(n_{ k }- 1)/2 by stacking one column on top of the other and then have
$E\left(\begin{array}{c}({y}_{1k}-{\mu}_{k})({y}_{2k}-{\mu}_{k})\\ \dots \\ ({y}_{ik}-{\mu}_{k})({y}_{jk}-{\mu}_{k})\\ \dots \\ ({y}_{({n}_{k}-1)k}-{\mu}_{k})({y}_{{n}_{k}k}-{\mu}_{k})\end{array}\right)=\left(\begin{array}{c}{\sigma}_{b}^{2}IB{D}_{12k}\\ \dots \\ {\sigma}_{b}^{2}IB{D}_{ijk}\\ \dots \\ {\sigma}_{b}^{2}IB{D}_{({n}_{k}-1){n}_{k}k}\end{array}\right).\phantom{\rule{0.1em}{0ex}}\left(2\right)$
We can treat the above equation as a version of Haseman-Elston (HE) regression. The sibship specific mean μ_{ k }is usually unknown and needs to be estimated; various estimates have been discussed and a shrinkage estimate ${\widehat{\mu}}_{k}$ has been recommended [11, 12]. For the simulations performed in this paper, ${\widehat{\mu}}_{k}$ was estimated with the function lme in R (http://cran.us.r-project.org). In an HE regression, linkage is detected by testing whether the QTL variance ${\sigma}_{b}^{2}$ > 0, which is equivalent to testing the correlation between IBD_{ ijk }and the trait similarity between the two sibs, as measured by (y_{ ik }- ${\widehat{\mu}}_{k}$)(y_{ jk }- ${\widehat{\mu}}_{k}$) in our case. From this perspective, the linkage information provided by a sibpair can be captured by the score
${U}_{ijk}=({y}_{ik}-{\widehat{\mu}}_{k})({y}_{jk}-{\widehat{\mu}}_{k})(IB{D}_{ijk}-0.5)\phantom{\rule{0.1em}{0ex}}\left(3\right)$
From equation (3), we can see that for an additive trait model a positive score supports linkage and a negative score is evidence against linkage. When the inheritance model is unclear, we may take the "minmax" method to estimate the proportion of marker alleles shared IBD for a full sibpair, i.e. IBD_{ ijk }= 0.275f_{ijk 1}+ f_{ijk 2}instead of IBD_{ ijk }= 0.5f_{ijk 1}+ f_{ijk 2}, where f_{ijk 1}and f_{ijk 2}are probabilities of 1 and 2 alleles shared IBD, respectively [13]. We can simply sum the scores for all the pairs in a sibship to obtain a measure of linkage evidence for this sibship, because the sibship mean absorbs any residual correlation among the sibs. We may define the QLS more generally as U_{ ijk }= S_{ ijk }(IBD_{ ijk }- 0.5), where S_{ ijk }can be any measure of trait similarity, for example the squared sibpair difference, or a weighted average of the squared (mean-corrected) sum and the squared difference in trait values of two sibs, all of which are provided by different versions of HE regression implemented in the software SIBPAL of S.A.G.E. (2004). Different measures of trait similarity have been discussed in detail in the literature [e.g. [11, 14–16]]. In those cases we may need to consider, in order to sum the QLSs within a sibship, a weight function appropriate for the correlation between scores among sibpairs. Note there is no difficulty in extending the QLS to qualitative traits. For example, for affected sibpairs S_{ ijk }can be defined as 1 for all pairs and the linkage score is simply given by U_{ ijk }= (IBD_{ ijk }- 0.5), which is related to the NPL score [17] and the statistic of the mean test [18].
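The score in equation (3) can be illustrated with a minimal Python sketch. The function name `sibship_qls`, the dictionary layout for the IBD estimates and the example numbers are all hypothetical; the sibship mean is passed in by the caller because the shrinkage estimate discussed above is not reproduced here.

```python
from itertools import combinations


def sibship_qls(trait_values, ibd, sibship_mean):
    """Return {(i, j): U_ij} for all sib pairs in one sibship.

    trait_values : list of trait values y_i for the sibs
    ibd          : dict mapping (i, j), i < j, to the estimated IBD proportion
    sibship_mean : an estimate of mu_k supplied by the caller
    """
    resid = [y - sibship_mean for y in trait_values]
    return {
        (i, j): resid[i] * resid[j] * (ibd[(i, j)] - 0.5)
        for i, j in combinations(range(len(trait_values)), 2)
    }


# Hypothetical sibship of three sibs with made-up IBD estimates.
y = [1.2, 0.8, -0.5]
ibd = {(0, 1): 1.0, (0, 2): 0.0, (1, 2): 0.5}
scores = sibship_qls(y, ibd, sibship_mean=0.0)

# Summing over the n_k(n_k - 1)/2 pairs gives a sibship-level measure.
sibship_total = sum(scores.values())
```

In this toy sibship, the concordant pair sharing both alleles and the discordant pair sharing none both contribute positive scores, while the pair with an IBD proportion of 0.5 contributes nothing.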
Application of the QLS in selecting a sample for an association study
We consider selecting a set of unrelated individuals from sibships previously used for a QTL linkage analysis. In the case of a complex quantitative trait where heterogeneity exists, the goal of an association study is to detect a variant with maximum power. We emphasize that such a study would not be a classic epidemiological study done to determine the attributable risk, for which subjects should be drawn randomly from a population. Rather, the study we discuss here is done for gene finding and therefore the selection of the sample should be done to provide maximum power rather than to represent the whole population.
Suppose that a population consists of two subpopulations (P1 and P2) with proportions q_{1} and q_{2} respectively (q_{1} + q_{2} = 1), where the gene variant has an effect in only one subpopulation (P1). To examine the usefulness of the QLS in selecting a sample for an association study, we theoretically compare the proportions of individuals affected by a disease allele selected from a homogeneous subpopulation (P1) in two selected samples: one sample is obtained by randomly selecting sibships (proportion q_{ r }) and the other is obtained by selecting sibships with QLS>0 (proportion q_{ qls }). To simplify the theoretical derivation, we assume known IBD sharing and sibships of size 2 (independent sibpairs).
Let T_{ k }= [(y_{1k}- μ_{ k }), (y_{2k}- μ_{ k })]^{ T }, where the superscript T denotes transpose and the subscripts 1 and 2 indicate two sibs in a sibship. With the assumption of normal individual effects e_{ ik }, T_{ k } ~ N(0, Σ_{ k }), where
${\Sigma}_{k}=\left[\begin{array}{cc}({\sigma}_{b}^{2}+{\sigma}_{e}^{2})& IB{D}_{12k}{\sigma}_{b}^{2}\\ IB{D}_{12k}{\sigma}_{b}^{2}& ({\sigma}_{b}^{2}+{\sigma}_{e}^{2})\end{array}\right].$
To further simplify the presentation, we standardize T_{ k }as Z_{ k }, so that the correlation matrix of Z_{ k }is
$\left(\begin{array}{cc}1& {\rho}_{k}\\ {\rho}_{k}& 1\end{array}\right)$
where ρ_{ k }= 0, 0.5${\sigma}_{b}^{2}$/(${\sigma}_{b}^{2}$ + ${\sigma}_{e}^{2}$) and ${\sigma}_{b}^{2}$/(${\sigma}_{b}^{2}$ + ${\sigma}_{e}^{2}$), respectively, for proportions 0, 0.5, and 1 allele sharing IBD. With the assumption that a random sample of sibpairs is used for the linkage analysis, we have q_{ r }= q_{1} and
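${q}_{qls}=\frac{{q}_{1}}{{q}_{1}+\frac{{q}_{2}}{\left[{\scriptscriptstyle \frac{1}{\pi}}\text{arctan}\left({\scriptscriptstyle \frac{{\rho}_{IBD=1}^{2}}{1-{\rho}_{IBD=1}^{2}}}\right)+1\right]}}.\phantom{\rule{0.1em}{0ex}}\left(4\right)$
Here ρ_{IBD = 1} denotes ρ_{ k } for a sibpair with IBD proportion 1; the derivation of q_{ qls } is given in Appendix 1.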
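The comparison of q_{ r } and q_{ qls } can also be explored numerically. The following Python sketch is a Monte Carlo illustration under the same simplifying assumptions (independent sibpairs, known IBD sharing, standardized bivariate normal trait residuals); it does not evaluate the closed-form expression. The function name `simulate_q_qls`, the parameter `h` for ${\sigma}_{b}^{2}$/(${\sigma}_{b}^{2}$ + ${\sigma}_{e}^{2}$) and the chosen values are hypothetical.

```python
import math
import random


def simulate_q_qls(q1, h, n_pairs=100_000, seed=1):
    """Estimate the share of P1 pairs among sib pairs with QLS > 0.

    q1 : proportion of pairs drawn from subpopulation P1
    h  : sigma_b^2 / (sigma_b^2 + sigma_e^2) in P1 (the QTL has no effect in P2)
    """
    rng = random.Random(seed)
    selected = 0
    selected_p1 = 0
    for _ in range(n_pairs):
        in_p1 = rng.random() < q1
        # IBD proportion 0, 0.5, 1 with probabilities 1/4, 1/2, 1/4.
        ibd = rng.choice([0.0, 0.5, 0.5, 1.0])
        # Sib-sib correlation: IBD * h in P1, zero in P2.
        rho = ibd * h if in_p1 else 0.0
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if z1 * z2 * (ibd - 0.5) > 0:
            selected += 1
            if in_p1:
                selected_p1 += 1
    return selected_p1 / selected


q1 = 0.3
q_qls = simulate_q_qls(q1, h=0.5)
```

Under these assumptions the simulated q_{ qls } should exceed q_{1}, because pairs from P1 that share both alleles IBD are more likely to have concordant trait residuals than pairs from P2.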
Application of the QLS to assess the correlation of association with previous linkage
To answer the question of whether a linkage signal in a region can be in part explained by a marker allele used in an association study, we compare the QLS on incorporating and not incorporating this marker into the trait model (equation 1), which we call the first (or individual) level regression, to distinguish it from the second (or family) level regression (equation 2). We frame this problem as a hypothesis test. When a marker is included in the model at the individual level, the variance-covariance matrix of sibship k is given by
$\begin{array}{c}E\left(\begin{array}{ccc}({y}_{1k}-{\mu}_{k}-{x}_{1k}b)({y}_{1k}-{\mu}_{k}-{x}_{1k}b)& \dots & ({y}_{1k}-{\mu}_{k}-{x}_{1k}b)({y}_{{n}_{k}k}-{\mu}_{k}-{x}_{{n}_{k}k}b)\\ \cdots & \cdots & \cdots \\ ({y}_{1k}-{\mu}_{k}-{x}_{1k}b)({y}_{{n}_{k}k}-{\mu}_{k}-{x}_{{n}_{k}k}b)& \cdots & ({y}_{{n}_{k}k}-{\mu}_{k}-{x}_{{n}_{k}k}b)({y}_{{n}_{k}k}-{\mu}_{k}-{x}_{{n}_{k}k}b)\end{array}\right)\\ =\left(\begin{array}{ccc}{\sigma}_{b}^{2}+{\sigma}_{e}^{2}& \dots & IB{D}_{1{n}_{k}k}{\sigma}_{b}^{2}\\ \dots & \dots & \dots \\ IB{D}_{1{n}_{k}k}{\sigma}_{b}^{2}& \dots & {\sigma}_{b}^{2}+{\sigma}_{e}^{2}\end{array}\right)\end{array}$
where x_{ ik }is a genotype code for the marker and b is its effect on the trait, which may arise from a "true" association (the marker is the QTL itself or is in linkage disequilibrium with the QTL), or from a "spurious" association (e.g. due to population stratification). Based on the above equation, we can obtain the corresponding QLS with the marker included in the above regression model, which is given by
${U}_{ijk}=({y}_{ik}-{\widehat{\mu}}_{k}-{x}_{ik}\widehat{b})({y}_{jk}-{\widehat{\mu}}_{k}-{x}_{jk}\widehat{b})(IB{D}_{ijk}-0.5),$
where $\widehat{b}$ and ${\widehat{\mu}}_{k}$ are the estimates of b and μ_{ k }, respectively. In the following presentation, we denote the QLS obtained with and without modeling an association marker by ${U}_{ijk}^{(a)}$ and ${U}_{ijk}^{(b)}$, respectively. Given these two sets of QLSs, ${U}_{ijk}^{(a)}$ and ${U}_{ijk}^{(b)}$, we expect the mean score ${\overline{U}}^{(b)}$ to be larger than ${\overline{U}}^{(a)}$ when the associated marker is the QTL, or is in linkage disequilibrium with it. To compare the two means, we may apply a one-sided paired t-test. Let ${\widehat{U}}_{ijk}^{(a)}={U}_{ijk}^{(a)}-{\overline{U}}^{(a)},{\widehat{U}}_{ijk}^{(b)}={U}_{ijk}^{(b)}-{\overline{U}}^{(b)}$ and let n be the total number of sibpairs. The statistic is then defined by
$T=({\overline{U}}^{(b)}-{\overline{U}}^{(a)})\sqrt{n(n-1)/\sum {({\widehat{U}}_{ijk}^{(b)}-{\widehat{U}}_{ijk}^{(a)})}^{2}}\phantom{\rule{0.1em}{0ex}}\left(5\right)$
and under the null hypothesis follows a t distribution with n - 1 degrees of freedom. The one-sided p-value is given by P(t_{n - 1}> T).
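As an illustration of equation (5), the statistic can be computed from two lists of scores with a few lines of Python. This is a sketch only: the function name `qls_paired_t` and the example scores are hypothetical, and the tail probability is left to whichever t-distribution routine is available.

```python
import math


def qls_paired_t(u_without, u_with):
    """Paired t statistic of equation (5).

    u_without : QLSs without the marker in the model, U^(b)
    u_with    : QLSs with the marker in the model, U^(a)
    Returns (T, degrees of freedom).
    """
    n = len(u_without)
    if n != len(u_with) or n < 2:
        raise ValueError("need two equally long score lists with n >= 2")
    diffs = [b - a for b, a in zip(u_without, u_with)]
    mean_diff = sum(diffs) / n
    # Sum of squared centered differences, as in the denominator of (5).
    ss = sum((d - mean_diff) ** 2 for d in diffs)
    return mean_diff * math.sqrt(n * (n - 1) / ss), n - 1


# Made-up scores for five sib pairs, for illustration only.
u_b = [0.40, 0.10, -0.05, 0.30, 0.22]
u_a = [0.25, 0.12, -0.10, 0.18, 0.20]
t_stat, df = qls_paired_t(u_b, u_a)
```

The upper-tail probability P(t_{n - 1}> T) can then be read from a t distribution with `df` degrees of freedom, for example with `scipy.stats.t.sf(t_stat, df)` if SciPy is installed.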
It is useful to examine this statistic under various situations. When the marker modeled is not associated with the phenotype, the allelic effect b is expected to be small and therefore the statistic is likely to be close to zero. However, when there is an association between the marker and the quantitative trait in a statistical sense, but it is not related to the detected linkage (for example it is due to the well-known bias from population stratification), we may not expect the allelic effect b to be small. In this scenario, we may look upon the marker as a covariate representing to some extent population stratification, and therefore modeling this marker would reduce the residual variance of the trait similarity measure coming from population stratification, and hence strengthen the linkage signal. So we can expect the statistic T to be more likely to be negative, and our test statistic would maintain the type I error rate in a conservative fashion in the case of population stratification. Our simulation results agree with this line of reasoning (see results). In this sense, a small lower sided p-value, i.e. P(t_{n - 1}<T), indicates a spurious association, which is also seen in the simulations.
For simplicity, assume the allelic effect b and the sibship mean μ_{ k } are known and so can be specified correctly; it can then be easily shown that for sibpair (i,j) in family k, $E({U}_{ijk}^{(b)}-{U}_{ijk}^{(a)})=({x}_{ik}{x}_{jk}{b}^{2})IB{D}_{ijk}$ (see Appendix 2). This equation indicates that the proposed statistic essentially tests the correlation (or interaction) between the similarity of an associated marker effect, which is measured by a cross-product, and the IBD sharing between two sibs in a pair. Compared to a usual quantitative linkage analysis that detects linkage by testing the correlation between the IBD sharing and trait similarity, which may also be described as a cross-product (e.g. as in HE regressions and the variance component model), we can expect the proposed statistic to be much more powerful for detecting linkage because the noise (residual variances) from polygenic and common environmental effects is eliminated as well as the individual random effects. So, even if a usual linkage analysis fails to show signals in a region, the proposed statistic can still be useful to detect linkage when we have a candidate locus in a region.
Results
Sample selection
Because in practice the number of alleles shared IBD is generally not known with certainty, owing to partially informative markers and missing parental genotypes, we also performed computer simulations to examine the usefulness of the QLS in sample selection for an association study by comparing, in various situations, the statistics from random samples of unrelated individuals and from samples based on the rank order of the QLS. The statistic used to make the comparison is the score statistic proposed by Schaid et al. [19], which follows a χ^{2} distribution with one degree of freedom for an additive model.
In our simulations, we generate 1000 sibships of size 2 from different subpopulations. A total of 6 markers, evenly spaced at a 2 cM density in a 10 cM range and each with 4 equally frequent alleles, are used for the linkage analysis. A QTL with 2 equally frequent alleles is located midway between marker 3 and marker 4. We assume Hardy-Weinberg equilibrium at each marker, linkage equilibrium among the markers and a Haldane no-interference map function. Trait values are constructed as the sum of a major-gene effect generated by the QTL, normal random individual effects, polygenic effects and common environmental effects. We calculate the probabilities of the number of alleles shared IBD using the program GENIBD in the S.A.G.E. package [20], removing the QTL genotype for this calculation.
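For readers who want to reproduce the marker map, the Haldane map function converts a map distance d (in Morgans) to a recombination fraction θ = (1 - e^{-2d})/2. The Python sketch below applies it to the spacing described above; placing the first marker at position 0 is an assumption made only for illustration, and the variable names are hypothetical.

```python
import math


def haldane_recombination_fraction(distance_cm):
    """Recombination fraction for a map distance in cM under Haldane's
    no-interference map function: theta = (1 - exp(-2d)) / 2, d in Morgans."""
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


# Six markers 2 cM apart; the origin at 0 cM is assumed for illustration.
marker_positions_cm = [0, 2, 4, 6, 8, 10]
# QTL midway between marker 3 (4 cM) and marker 4 (6 cM).
qtl_position_cm = 5

theta_adjacent = haldane_recombination_fraction(2)
theta_qtl_flank = haldane_recombination_fraction(
    qtl_position_cm - marker_positions_cm[2]
)
```

At these short distances the recombination fraction is close to the map distance itself, so adjacent markers recombine in roughly 2% of meioses and the QTL recombines with each flanking marker in roughly 1%.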
To examine different ways of summarizing the several QLSs for a sibship, we also simulated sibships of different sizes, ranging from 2 to 4. The traits for the population with two subpopulations were simulated as before. We sampled 100 unrelated individuals from the 1000 sibships at random, or according to the rank order of the mean QLS, the minimum QLS and the maximum QLS of each sibship, respectively. Our results showed that the average χ^{2} values obtained based on any of the QLSs are greater than those from random selection and that they have small differences between them (${\chi}_{mean}^{2}>{\chi}_{max}^{2}>{\chi}_{min}^{2}$) (data not shown).
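The three ways of summarizing a sibship's QLSs before ranking can be sketched as follows. The function name `select_sibships`, the family identifiers and the scores are hypothetical; how one individual is then drawn from each selected sibship is not shown.

```python
def select_sibships(qls_by_sibship, n_select, summary="mean"):
    """Rank sibships by a summary of their pairwise QLSs and keep the top ones.

    qls_by_sibship : dict mapping a sibship id to a list of pairwise QLSs
    n_select       : number of sibships to keep
    summary        : "mean", "min" or "max"
    """
    summarize = {
        "mean": lambda scores: sum(scores) / len(scores),
        "min": min,
        "max": max,
    }[summary]
    ranked = sorted(
        qls_by_sibship,
        key=lambda fam: summarize(qls_by_sibship[fam]),
        reverse=True,
    )
    return ranked[:n_select]


# Made-up scores for four sibships of sizes 2 to 4 (1, 3 or 6 pairs).
qls = {
    "fam1": [0.30],
    "fam2": [0.50, -0.40, 0.20],
    "fam3": [-0.10],
    "fam4": [0.25, 0.15, 0.10, 0.05, 0.20, 0.15],
}
top_by_mean = select_sibships(qls, 2, "mean")
top_by_max = select_sibships(qls, 2, "max")
top_by_min = select_sibships(qls, 2, "min")
```

In this toy example the mean and minimum summaries both favor fam1 and fam4, whereas the maximum summary favors fam2 because of a single high-scoring pair.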
Testing the correlation between association and a previous linkage
Empirical type I error rate of the proposed test at the nominal 5% level. A diallelic marker is completely linked to the QTL under HW equilibrium.
Effects ^{1} | Linkage (500 sibpairs) | Association-Linkage (500 sibpairs) | Linkage (500 sibships ^{2}) | Association-Linkage (500 sibships ^{2})
---|---|---|---|---
10%/80% | 0.195 | 0.056 | 0.391 | 0.059
20%/70% | 0.211 | 0.059 | 0.310 | 0.057
30%/60% | 0.236 | 0.055 | 0.328 | 0.055
40%/50% | 0.248 | 0.054 | 0.346 | 0.058
50%/40% | 0.270 | 0.056 | 0.368 | 0.056
Empirical type I error rate of the proposed test at the nominal 5% level when population stratification exists. A diallelic marker is completely linked to the QTL with HW equilibrium in an admixed population. A total of 500 sibpairs (250/250) are selected from two subpopulations. p_{1} and p_{2} are the frequencies of the rarer marker allele in the two subpopulations. d is the difference in trait means between the two subpopulations.
p_{1} - p_{2} | Linkage (d = 10) | Association-Linkage (d = 10) | Population stratification (d = 10) | Linkage (d = 20) | Association-Linkage (d = 20) | Population stratification (d = 20)
---|---|---|---|---|---|---
0.4 | 0.200 | 0.00055 | 0.403 | 0.165 | 0 | 0.758
0.3 | 0.191 | 0.0016 | 0.311 | 0.166 | 0 | 0.536
0.2 | 0.200 | 0.0054 | 0.177 | 0.159 | 0.001 | 0.283
0.1 | 0.192 | 0.025 | 0.084 | 0.154 | 0.012 | 0.106
0 | 0.194 | 0.050 | 0.019 | 0.150 | 0.047 | 0.013
The power of the proposed test for 500 sibpairs when linkage signal is weak. A diallelic marker is completely linked to the QTL in perfect disequilibrium. The trait value is generated by a QTL with variance of varying size, together with polygenic and common environmental effects (with variance 0.3) and a random individual effect (with variance 0.5).
QTL variance | Linkage (fully informative marker) | Association-Linkage (fully informative marker) | Linkage (partially informative marker) | Association-Linkage (partially informative marker)
---|---|---|---|---
0.03 | 0.102 | 0.307 | 0.083 | 0.285
0.05 | 0.148 | 0.402 | 0.145 | 0.475
0.10 | 0.284 | 0.666 | 0.260 | 0.575
0.20 | 0.563 | 0.874 | 0.520 | 0.838
Discussion
There is great interest in QTL mapping because many important diseases themselves, or intermediate phenotypes, are measured on a continuous scale. Although trait-marker association studies are expected to be soon conducted genome-wide, because of cost considerations currently an association study often focuses on candidate regions determined by a previous linkage study. For such an association study, we should utilize the information available in the previous linkage study to optimize its design and to facilitate its interpretation. We have proposed a quantitative linkage score, based on the widely used HE regression, to provide quantitative linkage information useful for a follow-up association study. This score is not limited to continuous traits, but can also be used for binary (affected/unaffected) traits. We illustrated the usefulness of this score to answer two different questions posed by an association study: (1) how to select samples at the design stage when heterogeneity exists; and (2) how to test at the inference stage whether an observed association can explain in part a previous linkage signal. In this paper, we are not necessarily advocating a two-stage approach to analyze family data on which we have information on both linkage markers and association markers. For such data a joint linkage and association framework could be of more interest than a two-stage analysis approach. Recent work on this kind of joint analysis has included work on both regression-based methods [22] and variance-component methods [23, 24]. However, in the presence of heterogeneity any advantage such a joint analysis may have when performed using all the data available may be lost, because those families that are not affected because of segregation at a linked locus will "dilute" the effect and result in loss of power. 
Therefore, even for analyzing data with information from both linkage markers and association markers, we may consider first selecting families based on the QLS to exclude such "dilution" as much as possible.
The idea of selecting families with linkage evidence for further genotyping in a follow-up association study is not new and has been successfully implemented in practice. In the context of quantitative traits, the proposed score can conveniently be used to summarize quantitative linkage information from a sibpair (or sibship). We have shown that in a heterogeneous population, which is expected to commonly occur for a complex trait, selecting a sample of unrelated persons based on the order of the QLS magnitude results in a more homogeneous sample for an association study than does a random sample, and therefore can improve power for a given sample size. Other approaches to identifying sibpairs with linkage are available, for example using a regression diagnostic [25]. Careful comparison of these methods would merit further study.
Another use of the QLS investigated in this paper is to test whether association can account in part for a detected linkage. To address this question, we simply compare two sets of QLSs, before and after incorporating an association marker into the individual level regression model. Essentially, the proposed test evaluates the interaction of the allele effect of an associated marker and IBD sharing. In this sense it may be likened to other methods, for example the regression model proposed by Cardon [26], though our statistic emphasizes more whether an association is correlated with a previous linkage finding. This test may also be used as a substitute for the usual quantitative trait linkage analysis test when the latter fails to detect linkage. The gain in power to detect linkage by using the proposed test arises from eliminating possible environmental or other genetic noise. However, this gain is not automatic, but depends on the relationship of the associated marker to the true variant. If there is only weak linkage disequilibrium between an associated marker and the true variant, the test will be less powerful. We also showed that this statistic may be applied to detect spurious association, although that was not our primary aim. The ways commonly used in practice to detect population stratification are to use genomic control [27] or test for Hardy-Weinberg equilibrium [28]. Using IBD sharing information to test and control for population stratification provides a new approach and further study of this approach will be conducted in our future work.
Conclusion
In conclusion, as shown by our simulations, the QLS is useful for the design of, and resulting inference from, an association study following a linkage study. We suggest that careful examination of the QLS should be helpful for understanding the results of both association and linkage studies.
Appendix
Appendix 1 - the derivation of q_{ qls }
For sibpair k comprising sib 1 and sib 2, Z_{ k }(z_{1k}, z_{2k}) follows the distribution f (z_{1k}, z_{2k}), which we assume to be a bivariate normal distribution. With the assumption that a random sample of full sibpairs is used for the linkage analysis, the proportions of pairs for which the number of alleles shared IBD is 0, 1 and 2 are π_{0} = 1/4, π_{1} = 1/2 and π_{2} = 1/4, respectively. Let P1 and P2 refer to subpopulation 1 and 2 and let their proportions be denoted q_{1} and q_{2}. We then have
$\begin{array}{lll}Pr(QLS>0|P1)\hfill & =\hfill & 2Pr({z}_{1k}>0,{z}_{2k}>0,IBD=1|P1)+2Pr({z}_{1k}>0,{z}_{2k}<0,IBD=0|P1)\hfill \\ \hfill & =\hfill & 2{\pi}_{2}{\int}_{0}^{\infty}{\int}_{0}^{\infty}f({z}_{1k},{z}_{2k}|IBD=1,P1)\,d{z}_{1k}\,d{z}_{2k}+2{\pi}_{0}{\int}_{0}^{\infty}{\int}_{-\infty}^{0}f({z}_{1k},{z}_{2k}|IBD=0,P1)\,d{z}_{1k}\,d{z}_{2k}\hfill \\ \hfill & =\hfill & \frac{1}{4}\left[\frac{1}{\pi}\text{arctan}\left(\frac{{\rho}_{IBD=1}^{2}}{1-{\rho}_{IBD=1}^{2}}\right)+1\right].\hfill \end{array}$
Thus,
$\begin{array}{lll}Pr(QLS>0,P1)\hfill & =\hfill & Pr(QLS>0|P1)Pr(P1)\hfill \\ \hfill & =\hfill & \frac{1}{4}{q}_{1}\left[\frac{1}{\pi}\text{arctan}\left(\frac{{\rho}_{IBD=1}^{2}}{1-{\rho}_{IBD=1}^{2}}\right)+1\right],\hfill \end{array}$
which is an increasing function of q_{1} and ρ. We note that ρ depends on the size of the effect and allelic frequencies of the QTL. On the other hand,
$\begin{array}{lll}Pr(QLS>0|P2)\hfill & =\hfill & 2Pr({z}_{1k}>0,{z}_{2k}>0,IBD=1|P2)+2Pr({z}_{1k}>0,{z}_{2k}<0,IBD=0|P2)\hfill \\ \hfill & =\hfill & \frac{1}{4},\hfill \end{array}$
so that Pr(QLS > 0, P2) = Pr(QLS > 0|P2)Pr(P2) = $\frac{1}{4}{q}_{2}$.
Thus
${q}_{qls}=\frac{Pr(QLS>0,P1)}{Pr(QLS>0,P1)+Pr(QLS>0,P2)}$
$=\frac{1}{1+\frac{{q}_{2}}{{\scriptscriptstyle \frac{1}{\pi}}\text{arctan}\left({\scriptscriptstyle \frac{{\rho}_{IBD=1}^{2}}{1-{\rho}_{IBD=1}^{2}}}+1\right)}}.$
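As a rough numerical companion to this derivation, the sketch below simulates sib pairs from the two subpopulations and estimates the share of QLS > 0 pairs that come from subpopulation 1. It is only an illustration under stated assumptions: traits are standardized bivariate normal, the sib correlation is nonzero only for P1 pairs with IBD = 1, and QLS > 0 is taken to mean that the trait product and the centered IBD proportion have the same sign. The function names, the seed and the parameter values (ρ = 0.6, q_{1} = 0.3) are hypothetical and do not come from the original analysis.

```python
import math
import random


def qls_positive(rho_ibd1, rng):
    """Simulate one sib pair and report whether its QLS is positive."""
    # Proportion of alleles shared IBD: 0, 1/2, 1 with probabilities 1/4, 1/2, 1/4.
    ibd = rng.choices([0.0, 0.5, 1.0], weights=[1, 2, 1])[0]
    # Assumption: sibs are correlated only when IBD = 1 (zero otherwise).
    rho = rho_ibd1 if ibd == 1.0 else 0.0
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    # Assumption: QLS > 0 when the trait product and centered IBD agree in sign.
    return z1 * z2 * (ibd - 0.5) > 0.0


def prob_qls_positive(rho_ibd1, n_pairs, rng):
    """Monte Carlo estimate of Pr(QLS > 0) within one subpopulation."""
    return sum(qls_positive(rho_ibd1, rng) for _ in range(n_pairs)) / n_pairs


def enriched_proportion(q1, p_pos_p1, p_pos_p2):
    """Bayes step: share of subpopulation 1 among pairs with QLS > 0."""
    q2 = 1.0 - q1
    return q1 * p_pos_p1 / (q1 * p_pos_p1 + q2 * p_pos_p2)


rng = random.Random(2006)                         # illustrative seed
p_pos_p1 = prob_qls_positive(0.6, 100_000, rng)   # P1: linked QTL, rho = 0.6
p_pos_p2 = prob_qls_positive(0.0, 100_000, rng)   # P2: no linkage signal
q_qls = enriched_proportion(0.3, p_pos_p1, p_pos_p2)
print(f"Pr(QLS>0|P1) ~ {p_pos_p1:.3f}, Pr(QLS>0|P2) ~ {p_pos_p2:.3f}, q_qls ~ {q_qls:.3f}")
```

With a positive sib correlation in P1, the estimate of q_{qls} should come out above q_{1}, which is the enrichment that selection on the QLS is meant to achieve.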
Appendix 2 - E(U^{(b)} - U^{(a)})
Under the trait model y_{ik} = μ_{k} + x_{ik}b + e_{ik}, we assume the e_{ik} are independently and identically distributed with mean 0. Suppose μ_{k} and b are known. Let the subscripts 1 and 2 indicate the two sibs of a sibpair in family k. Then
$\begin{array}{lll}E({U}_{k}^{(b)}-{U}_{k}^{(a)}) & = & E[({y}_{1k}-{\mu}_{k})({y}_{2k}-{\mu}_{k})-({y}_{1k}-{\mu}_{k}-{x}_{1k}b)({y}_{2k}-{\mu}_{k}-{x}_{2k}b)]IB{D}_{12k} \\ & = & E[({y}_{1k}-{\mu}_{k}){x}_{2k}b+({y}_{2k}-{\mu}_{k}){x}_{1k}b-{x}_{1k}{x}_{2k}{b}^{2}]IB{D}_{12k} \\ & = & E[{x}_{1k}{x}_{2k}{b}^{2}+{e}_{1k}{x}_{2k}b+{e}_{2k}{x}_{1k}b]IB{D}_{12k} \\ & = & ({x}_{1k}{x}_{2k}{b}^{2})IB{D}_{12k}. \end{array}$
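The expectation above can be checked numerically. The following sketch holds x_{1k}, x_{2k}, b, μ_{k} and IBD_{12k} fixed, draws the errors repeatedly, and averages U^{(b)} - U^{(a)}. It assumes normally distributed errors (the derivation only requires mean 0) and treats IBD_{12k} as a fixed number; the function name and all numerical values are hypothetical choices for illustration.

```python
import random


def mean_score_difference(x1, x2, b, mu, ibd, n_rep, rng, sigma=1.0):
    """Average U^(b) - U^(a) over simulated errors for one fixed sib pair."""
    total = 0.0
    for _ in range(n_rep):
        # Trait model: y = mu + x*b + e, with independent mean-zero errors.
        y1 = mu + x1 * b + rng.gauss(0.0, sigma)
        y2 = mu + x2 * b + rng.gauss(0.0, sigma)
        u_b = (y1 - mu) * (y2 - mu) * ibd                      # marker not modeled
        u_a = (y1 - mu - x1 * b) * (y2 - mu - x2 * b) * ibd    # marker modeled
        total += u_b - u_a
    return total / n_rep


rng = random.Random(7)  # illustrative seed
x1, x2, b, mu, ibd = 1.0, 2.0, 0.5, 10.0, 0.5
estimate = mean_score_difference(x1, x2, b, mu, ibd, 100_000, rng)
expected = x1 * x2 * b ** 2 * ibd  # closed form from the derivation
print(f"simulated mean = {estimate:.4f}, closed form = {expected:.4f}")
```

The simulated mean should agree with x_{1k}x_{2k}b^{2}IBD_{12k} up to Monte Carlo error, and the difference vanishes when b = 0, that is, when the marker has no effect on the trait.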
Declarations
Acknowledgements
This work was supported in part by a U.S. Public Health Service Resource Grant from the National Center for Research Resources (RR03655) and a Research Grant from the National Institute of General Medical Sciences (GM28356). Tao Wang is supported by a fellowship from the Merck Foundation.
References
- Haseman JK, Elston RC: The investigation of linkage between a quantitative trait and a marker locus. Behavior Genet. 1972, 2: 3-19. 10.1007/BF01066731.View ArticleGoogle Scholar
- Horikawa Y, Oda N, Cox NJ, Li X, Orho-Melander M, Hara M, Hinokio Y, Lindner TH, Mashima H, Schwarz PE, del Bosque-Plata L, Horikawa Y, Oda Y, Yoshiuchi I, Colilla S, Polonsky KS, Wei S, Concannon P, Iwasaki N, Schulze J, Baier LJ, Bogardus C, Groop L, Boerwinkle E, Hanis CL, Bell GI: Genetic variation in the gene encoding calpain-10 is associated with type 2 diabetes mellitus. Nat Genet. 2000, 26: 163-175. 10.1038/79876.PubMedView ArticleGoogle Scholar
- Van Eerdewegh P, Little RD, Dupuis J, Del Mastro RG, Falls K, Simon J, Torrey D, Pandit S, McKenny J, Braunschweiger K, Walsh A, Liu Z, Hayward B, Folz C, Manning SP, Bawa A, Saracino L, Thackston M, Benchekroun Y, Capparell N, Wang M, Adair R, Feng Y, Dubois J, FitzGerald MG, Huang H, Gibson R, Allen KM, Pedan A, Danzig MR, Umland SP, Egan RW, Cuss FM, Rorke S, Clough JB, Holloway JW, Holgate ST, Keith TP: Association of the ADAM33 gene with asthma and bronchial hyperresponsiveness. Nature. 2002, 418: 426-430. 10.1038/nature00878.PubMedView ArticleGoogle Scholar
- Kim UK, Jorgenson E, Coon H, Leppert M, Risch N, Drayna D: Positional cloning of the human quantitative trait locus underlying taste sensitivity to phenylthiocarbamide. Science. 2003, 299: 1221-1225. 10.1126/science.1080190.PubMedView ArticleGoogle Scholar
- Fingerlin TE, Boehnke M, Abecasis GR: Increasing the power and efficiency of disease-marker case-control association studies through use of allele-sharing information. Am J Hum Genet. 2004, 74: 432-443. 10.1086/381652.PubMedPubMed CentralView ArticleGoogle Scholar
- Li C, Scott LJ, Boehnke M: Assessing whether an allele can account in part for a linkage signal: the genotype-IBD sharing test (GIST). Am J Hum Genet. 2004, 74: 418-431. 10.1086/381712.PubMedPubMed CentralView ArticleGoogle Scholar
- Fulker DW, Cherny SS, Sham PC, Hewitt JK: Combined linkage and association sib-pair analysis for quantitative traits. Am J Hum Genet. 1999, 64: 259-267. 10.1086/302193.PubMedPubMed CentralView ArticleGoogle Scholar
- Cardon LR, Abecasis GR: Some properties of a variance components model for fine-mapping quantitative trait loci. Behavior Genetics. 2000, 30: 235-243. 10.1023/A:1001970425822.PubMedView ArticleGoogle Scholar
- Amos CI, Dawson DV, Elston RC: The probabilistic determination of identity-by-descent sharing. Am J Hum Genet. 1990, 47: 842-853.PubMedPubMed CentralGoogle Scholar
- Lander ES, Green P: Construction of multilocus genetic linkage maps in human. Proceedings of the National Academy of Science of the Unite States of America. 1987, 84: 2363-2367.View ArticleGoogle Scholar
- Wang T, Elston RC: A Modified Revisited Haseman-Elston Method to Further Improve Power. Hum Hered. 2004, 57: 109-116. 10.1159/000077548.PubMedView ArticleGoogle Scholar
- Tritchler D, Liu Y, Fallah S: A test of linkage for complex discrete and continuous traits in nuclear families. Biometrics. 2003, 59: 382-392. 10.1111/1541-0420.00045.PubMedView ArticleGoogle Scholar
- Whittemore AS, Tu I: Simple, robust linkage tests for affected sibs. Am J Hum Gene. 1998, 62: 1228-1242. 10.1086/301820.View ArticleGoogle Scholar
- Wright FA: The phenotypic difference discards sib-pair QTL linkage information. Am J Hum Genet. 1997, 60: 740-742.PubMedPubMed CentralGoogle Scholar
- Elston RC, Buxbaum S, Jacobs KB, Olson JM: Haseman and Elston revisited. Genet Epidemiol. 2000, 19: 1-17. 10.1002/1098-2272(200007)19:1<1::AID-GEPI1>3.0.CO;2-E.PubMedView ArticleGoogle Scholar
- Shete S, Jacobs KB, Elston RC: Adding further power to the Haseman and Elston method for detecting linkage in larger sibships: Weighting sums and differences. Hum Hered. 2003, 55: 79-85. 10.1159/000072312.PubMedView ArticleGoogle Scholar
- Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES: Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet. 1996, 58: 1347-1363.PubMedPubMed CentralGoogle Scholar
- Blackwelder WC, Elston RC: A comparison of sib-pair linkage tests for disease susceptibility loci. Genet Epidemiol. 1985, 2: 85-97. 10.1002/gepi.1370020109.PubMedView ArticleGoogle Scholar
- Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA: Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet. 2002, 70: 425-434. 10.1086/338688.PubMedPubMed CentralView ArticleGoogle Scholar
- Statistical Analysis for Genetic Epidemiology. [http://darwin.cwru.edu/sage/]
- Abecasis GR, Cardon LR, Cookson WO: A general test of association for quantitative traits in nuclear families. Am J Hum Genet. 2000, 66: 279-292. 10.1086/302698.PubMedPubMed CentralView ArticleGoogle Scholar
- Wang T, Elston RC: Two-level Haseman-Elston regression for general pedigree data analysis. Genet Epidemiol. 2005, 29: 12-22. 10.1002/gepi.20075.PubMedView ArticleGoogle Scholar
- Fan R, Spinka C, Jin L, Jung J: Pedigree linkage disequilibrium mapping of quantitative trait loci. Eur J Hum Genet. 2005, 13: 216-231. 10.1038/sj.ejhg.5201301.PubMedView ArticleGoogle Scholar
- Jung J, Fan R, Jin L: Combined linkage and association mapping of quantitative trait loci by multiple markers. Genetics. 2005, 170: 881-898. 10.1534/genetics.104.035147.PubMedPubMed CentralView ArticleGoogle Scholar
- Davis CC, Brown WM, Lange EM, Rich SS, Langefeld CD: Nonparametric linkage regression II: Identification of influential pedigrees in tests for linkage. Genet Epidemiol. 2001, 21 (Suppl 1): S123-S129.PubMedGoogle Scholar
- Cardon LR: A sib-pair regression model of linkage disequilibrium for quantitative traits. Hum Hered. 2000, 50: 350-358. 10.1159/000022940.PubMedView ArticleGoogle Scholar
- Devlin B, Roeder K: Genomic control for association studies. Biometrics. 1999, 55: 997-1004. 10.1111/j.0006-341X.1999.00997.x.PubMedView ArticleGoogle Scholar
- Tiret L, Cambien F: Letter: Departure from Hardy-Weinberg equilibrium should be systematically tested in studies of association between genetic markers and disease. Circulation. 1995, 92: 3364-3365.PubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.