Covariate linkage analysis of GAW14 simulated data incorporating subclinical phenotype, sex, population, parent-of-origin, and interaction
BMC Genetics volume 6, Article number: S45 (2005)
We evaluate a method for the incorporation of covariates into linkage analysis using the Genetic Analysis Workshop 14 simulated data. Focusing on a randomly chosen replicate (42), we investigated the effect of the 12 subclinical phenotypes, sex, population, and parent-of-origin on the linkage signal from a model-free linkage analysis of Kofendrerd Personality Disorder.
We detected a linkage peak on chromosome 1, at about 175 cM, which varied depending upon individuals' status for subclinical phenotype b. A linkage peak on chromosome 3 (310 cM) was found not to depend upon subclinical phenotype status. Further peaks were found on chromosomes 5 (12 cM), 9 (4 cM), and 10 (95 cM), depending on the status of subclinical phenotypes a, k, and c/d/g, respectively.
Retrospective comparison of our results with the simulation model showed correct identification of disease loci D1-5 on chromosomes 1, 3, 5, 9 and 10, respectively.
We chose to analyze all four populations of replicate 42 from the simulated data set. All analyses were performed without knowledge of the simulation model. The aim of the analysis was to utilize the information on the subclinical phenotypes of Kofendrerd Personality Disorder (KPD), sex, population, and parent-of-origin in a linkage analysis. Including covariates in the analysis allowed us to investigate models, such as locus heterogeneity, that give rise to different subclinical phenotypes within KPD. We present the results of our analyses and a retrospective comparison with the simulation model.
We began by screening the genome for linkage to KPD. We performed separate scans of the microsatellite and the single-nucleotide polymorphism (SNP) data using the Zlr test statistic from ALLEGRO [1], with the "pairs" option and exponential model. Pedigrees with more than 17 individuals were trimmed to permit analysis with the software. We then examined the effect of the covariates on the linkage peaks. To do this, we fitted subclinical phenotype, sex, population, and parent-of-origin status as covariates in a model-free linkage analysis of the microsatellite marker data. We also looked for interactions between linkage peaks using this approach.
Linkage analysis using covariates
The multipoint likelihood of the marker data of an affected relative pair at any point in the genome is given by
$$L_i = \sum_{j=0}^{2} z_j \frac{\hat{f}_{ij}}{f_{ij}},$$
where $z_j$ is the (unknown) probability that an affected relative pair share $j$ alleles identically by descent (IBD), and $f_{ij}$ and $\hat{f}_{ij}$ are the prior and posterior (conditional on the observed marker data) probabilities that pair $i$ shares $j$ alleles IBD [2, 3]. These were obtained for each pair at 1-cM intervals with and without parent-specific allele sharing estimates using MERLIN [4] and ALLEGRO [1], respectively. Let $p_{FS}$ be the probability that a pair of affected full siblings share a given parental allele IBD. Following the suggestion of Rice [5, 6], in the absence of a parent-of-origin effect the probabilities of sharing paternal and maternal alleles IBD were assumed to be equal and independent. Then $z_0 = (1 - p_{FS})^2$, $z_1 = 2 p_{FS} (1 - p_{FS})$, and $z_2 = p_{FS}^2$. Similar formulae apply for double-first-cousin pairs.
Other types of relative pair, $R$, can only share 0 or 1 allele IBD. For these, $z_0 = 1 - p_R$, $z_1 = p_R$, and $z_2 = 0$ (where $p_R$ is the IBD probability for affected relative pairs of type $R$).
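As a minimal sketch of the sharing probabilities described above, the following Python fragment maps a per-allele IBD probability to (z0, z1, z2) and evaluates one pair's likelihood contribution. It assumes the likelihood takes the ratio form sum over j of z_j times posterior over prior; the function names and the example posterior values are hypothetical and are not taken from the analysis.

```python
def sharing_probs(p, relationship="full_sibs"):
    """Return (z0, z1, z2) for an affected pair.

    p is the probability of sharing a given parental allele IBD (p_FS) for
    full siblings, or the IBD probability p_R for other relative types.
    """
    if relationship == "full_sibs":
        # Paternal and maternal sharing assumed equal and independent.
        return ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)
    # Other relative pairs can share at most one allele IBD.
    return (1 - p, p, 0.0)


def pair_likelihood(z, prior, posterior):
    """Sum over j of z_j * posterior_j / prior_j, skipping impossible states."""
    return sum(zj * post / pri
               for zj, pri, post in zip(z, prior, posterior) if pri > 0)


# Hypothetical sib pair: the posterior values are invented for illustration.
prior = (0.25, 0.50, 0.25)
posterior = (0.10, 0.50, 0.40)
null_lik = pair_likelihood(sharing_probs(0.5), prior, posterior)
alt_lik = pair_likelihood(sharing_probs(0.6), prior, posterior)
```

Under the null value p = 0.5 the z values equal the prior, so the contribution reduces to the sum of the posterior probabilities; a pair whose posterior favours sharing contributes more when p exceeds 0.5.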
Inclusion of categorical covariates
The effect of a binary covariate on the IBD sharing probabilities may be investigated by modelling $p_R$ in a logistic regression framework including a 3-level factor β with levels corresponding to the status of the pair with respect to the covariate (-/-, -/+ or +/+, where - denotes absence and + presence of the covariate in an individual). That is,
$$\log\left(\frac{p_R}{1 - p_R}\right) = O_R + \alpha + \beta_k,$$
where $O_R$ is a fixed offset, ensuring that $p_R$ takes the correct value for a relative pair of type $R$ in the absence of linkage (i.e., all coefficients in the regression = 0). Under the null hypothesis of no covariate effect, α is a measure of the divergence of IBD from the null in the sample as a whole. The subscript $k$ indexes the status of the particular relative pair with respect to the covariate. Multiple pairs from the same pedigree were analysed as if they were independent, with parameters α and β in common. To ensure identifiability of the parameters, β for -/- pairs was set to zero (making α a measure of IBD divergence from the null in -/- pairs). The degree of IBD sharing for the discordant (-/+) pairs was constrained to be less than or equal to the maximum IBD in the concordant pairs, to ensure that the model makes sense biologically. Each of the subclinical phenotypes (a to l) was modelled in this way, as was sex (male denoted by -, female denoted by +). Population membership was modelled as a four-level factor, with one level for each population (the first was set to zero). The total number of affected relative pairs in each category is shown in Table 1.
One might expect a gene that modified the expression of a binary covariate (e.g., subclinical phenotype outcome) in individuals affected with KPD (but not KPD risk itself) to present increased sharing in -/- or +/+ pairs (or both), with -/+ pairs showing reduced sharing. A gene that acts to cause KPD with a particular set of covariate values (- or +) would cause increased sharing in either -/- or +/+ pairs, with the effects on IBD in the pairs of other types being unclear (dependent on penetrances, gene frequencies, etc.). Caution should be applied to the interpretation of the allele sharing estimates, as the differences could arise for a number of reasons.
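The role of the offset can be illustrated with a short Python sketch. It assumes the model is a logit link on the IBD probability with the offset equal to the logit of the null sharing probability for the relative type; the coefficient values below are hypothetical, chosen only to show the three pair categories.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def ibd_probability(null_p, alpha, beta_k=0.0):
    """IBD probability for a pair whose null sharing probability is null_p.

    The offset O_R = logit(null_p) guarantees that alpha = beta_k = 0
    returns the null value for that relative type.
    """
    offset = logit(null_p)
    return expit(offset + alpha + beta_k)


# Hypothetical coefficients; beta for -/- pairs is fixed at zero.
alpha = 0.2
beta = {"-/-": 0.0, "-/+": -0.3, "+/+": 0.5}
sib_probs = {status: ibd_probability(0.5, alpha, b) for status, b in beta.items()}
```

With all coefficients at zero, a relative type with null sharing 0.25 returns 0.25 and siblings return 0.5, which is the behaviour the offset is meant to guarantee.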
Inclusion of quantitative covariates
Locus × locus interactions between the peaks were investigated by including the estimated IBD sharing value for each pair at one location on a different chromosome (having subtracted the expected value in the absence of linkage) as a quantitative covariate in the logistic regression for IBD at the peak of interest [7]. This was then repeated for a number of locations in the region surrounding the locus being conditioned on, to allow for the fact that linkage peaks are often some distance from disease loci [8]. The test statistic was taken to be the increase in maximum LOD score over the whole region investigated (covering both linkage peaks). For completeness, the hypothesis of an interaction between two peaks was investigated with two tests: i) peak 1 conditional on peak 2, and ii) peak 2 conditional on peak 1. In general, these give similar results.
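A rough Python sketch of the quantitative covariate follows. It assumes that the "estimated IBD sharing value" is the estimated proportion of alleles shared IBD, and that the covariate enters the logit linearly with a single coefficient (called gamma here); both are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def centred_sharing(posterior, prior):
    """Estimated proportion of alleles shared IBD minus its null expectation.

    posterior and prior are (P0, P1, P2) probabilities of sharing 0, 1, 2
    alleles IBD at the conditioning location.
    """
    estimated = 0.5 * posterior[1] + posterior[2]
    expected = 0.5 * prior[1] + prior[2]
    return estimated - expected


def conditional_ibd_probability(null_p, alpha, gamma, x):
    """IBD probability at the peak of interest given centred covariate x."""
    offset = math.log(null_p / (1.0 - null_p))
    return 1.0 / (1.0 + math.exp(-(offset + alpha + gamma * x)))


# Hypothetical sib pair with excess sharing at the conditioning locus.
x = centred_sharing((0.10, 0.50, 0.40), (0.25, 0.50, 0.25))
p_peak = conditional_ibd_probability(0.5, 0.0, 1.0, x)
```

Because the covariate is centred, a pair with no information at the conditioning locus has x = 0 and does not shift the IBD probability at the peak of interest.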
Inclusion of parent-of-origin covariate
Finally, parent-of-origin effect was modelled in affected sibling pairs only by splitting the prior and posterior probabilities, $f_{i1}$ and $\hat{f}_{i1}$, of sharing 1 allele IBD into components reflecting whether the paternal or maternal allele was shared. The IBD probabilities for affected pairs were expressed in terms of IBD probabilities for the paternal ($p_{pat}$) and maternal ($p_{mat}$) alleles (e.g., $z_2 = p_{pat} p_{mat}$), with the test statistic for parent-of-origin effect given by a likelihood-ratio test of $p_{pat} = p_{mat}$.
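The split of the one-allele state can be sketched in Python as below. The sketch assumes that paternal and maternal sharing remain independent when their probabilities differ, which is consistent with z2 being the product of the two; the function name and dictionary keys are hypothetical.

```python
def parent_of_origin_probs(p_pat, p_mat):
    """Sharing-state probabilities for an affected sib pair.

    Assumes paternal and maternal sharing are independent, so that the
    probability of sharing both alleles is p_pat * p_mat.
    """
    return {
        "none": (1 - p_pat) * (1 - p_mat),
        "paternal_only": p_pat * (1 - p_mat),
        "maternal_only": (1 - p_pat) * p_mat,
        "both": p_pat * p_mat,
    }


# Under the null hypothesis p_pat = p_mat the two one-allele states are equal.
null_state = parent_of_origin_probs(0.6, 0.6)
# Hypothetical parent-of-origin effect: excess sharing of the paternal allele.
effect_state = parent_of_origin_probs(0.7, 0.5)
```

When the two probabilities are equal, the paternal-only and maternal-only states sum to the usual 2p(1 - p) term for sharing one allele.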
Test statistic and significance levels
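To test for effects of categorical or quantitative covariates, the likelihood was maximized with respect to α alone at each position $x$, to give $L(x, \hat{\alpha})$, and to both α and β, giving $L(x, \hat{\alpha}, \hat{\beta})$. The ratio of the maximum likelihoods on the chromosome, with and without the covariate of interest, gives a LOD score, which was used as the test statistic:
$$\mathrm{LOD} = \log_{10}\left[\frac{\max_x L(x, \hat{\alpha}, \hat{\beta})}{\max_x L(x, \hat{\alpha})}\right].$$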
We allowed the location of the maximum likelihood to change when the covariate was added. This reflects the fact that linkage peaks from standard analyses are often some distance from the true disease locus [8]. Incorporating the covariate may thus give a more accurate estimate of the disease locus location. Other test statistics are possible, for example the maximum point-wise likelihood ratio. However, the relative performance of these test statistics is unclear at present. Chromosome-wide significance levels were obtained by keeping the genotypes fixed and randomly permuting individual covariate values among the affected individuals. Pairwise covariate values were then calculated and the analysis repeated, so that the significance levels reflect the dependency of pairs within a pedigree. To test for a parent-of-origin effect, the designations of paternal and maternal alleles were randomly swapped for all affected siblings in a sibship. If n replicates are generated in this manner, of which r give a test statistic greater than that in the actual data, the chromosome-wide p-value is estimated by (r + 0.5)/(n + 0.5).
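The permutation scheme can be sketched in Python as follows. Individual covariate values are shuffled among affected individuals, pair-level status is recomputed, and the p-value uses the (r + 0.5)/(n + 0.5) estimator given above. The function names are hypothetical, and the step that refits the linkage model on each permuted data set is left out.

```python
import random


def pair_status(a, b):
    """Pair-level covariate status from two individual binary values."""
    n_plus = int(bool(a)) + int(bool(b))
    return ("-/-", "-/+", "+/+")[n_plus]


def permute_covariate(values, rng):
    """Return a shuffled copy of individual covariate values (genotypes stay fixed)."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def permutation_p_value(observed, permuted_stats):
    """Chromosome-wide p-value estimated as (r + 0.5) / (n + 0.5)."""
    n = len(permuted_stats)
    r = sum(1 for s in permuted_stats if s > observed)
    return (r + 0.5) / (n + 0.5)


# Hypothetical covariate values for four affected individuals.
rng = random.Random(42)
shuffled = permute_covariate([0, 1, 1, 0], rng)
statuses = [pair_status(shuffled[0], shuffled[1]), pair_status(shuffled[2], shuffled[3])]
```

Shuffling at the level of individuals rather than pairs keeps the pair statuses within a pedigree mutually consistent in every replicate.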
For the test statistic chosen for this analysis, it was not possible to obtain a genome-wide significance level for covariate effects because this depends not only on the increase in LOD score given by the covariate, but also on the linkage evidence present without allowing for the covariate, i.e., based on $\max_x L(x, \hat{\alpha})$, the maximum likelihood without the covariate. For example, an increase in LOD score from 2 to 3 is more significant than one from 0 to 1 because the former is likely to occur by chance (in the absence of covariate effects) only in a linkage peak region, whereas the latter could occur anywhere on the chromosome. An estimate of genome-wide significance for a given chromosome, allowing for multiple testing, involves a joint Bonferroni-type adjustment for the relative length of the chromosome and the number of covariate tests conducted.
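One plausible reading of the joint adjustment is sketched below in Python: the chromosome-wide p-value is multiplied by the ratio of genome length to chromosome length and by the number of covariate tests, then capped at 1. The exact form used in the analysis is not stated, so this formula, the function name, and the lengths in the example are assumptions for illustration only.

```python
def genome_wide_p(chrom_p, chrom_length_cm, genome_length_cm, n_covariate_tests):
    """Bonferroni-type adjustment of a chromosome-wide p-value (assumed form).

    Scales by the relative length of the chromosome and by the number of
    covariate tests, capping the result at 1.
    """
    factor = (genome_length_cm / chrom_length_cm) * n_covariate_tests
    return min(1.0, chrom_p * factor)


# Hypothetical lengths: a 200-cM chromosome in a 3000-cM genome, 10 covariate tests.
adjusted = genome_wide_p(0.001, 200.0, 3000.0, 10)
```

Under this form, shorter chromosomes receive a larger multiplier because they make up a smaller share of the genome scan.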
The subclinical phenotypes c, d, and g were indistinguishable in the affected individuals and e, f, and h were all present in the affected individuals and hence provided no useful information for analysis. Therefore, we have carried out 10 covariate analyses on each chromosome (subclinical phenotypes a, b, c, i, j, k, and l, sex, population and parent-of-origin). The interaction analyses were carried out between identified peak regions and hence were treated separately.
We found genome-wide significant linkage peaks on chromosomes 1 (max Zlr = 4.97 at 177 cM), 3 (max Zlr = 5.58 at 310 cM), 5 (max Zlr = 5.11 at 12 cM) and 9 (max Zlr = 6.04 at 4 cM). On chromosome 1 the peak was narrower with the 3-cM SNP map than with the microsatellite map, but this effect was not seen for the other peaks.
The linkage signal on chromosome 1 was found to increase substantially when the subclinical phenotype b was fitted as a covariate in the relative pair covariate linkage analysis, a LOD of 7.07 being increased to 14.29 (chromosome-wide p < 0.0001, genome-wide p = 0.0097). The linkage evidence appeared to come entirely from the +/+ pairs (IBD = 0.66, compared to 0.49, 0.48 from the -/-, -/+ pairs). A similar effect was found on chromosome 5 with the subclinical phenotype a (LOD increased from 4.90 to 10.05, chromosome-wide p < 0.0001, genome-wide p = 0.0096), with the linkage coming from the -/- pairs (IBD-/- = 0.62, IBD-/+ = 0.44, IBD+/+ = 0.51), and chromosome 10 (at 95 cM) with the subclinical phenotype c (LOD increased from 1.04 to 5.31, IBD-/- = 0.63, IBD-/+ = 0.43, IBD+/+ = 0.53, chromosome-wide p = 0.0004, genome-wide p = 0.063). On chromosome 9, the LOD increased from 7.65 to 18.13 with subclinical phenotype k (chromosome-wide p < 0.0001, genome-wide p = 0.0096), with increased sharing in both the -/- and +/+ pairs (IBD-/- = 0.63, IBD-/+ = 0.42, IBD+/+ = 0.67). No genome-wide significant effect of subclinical phenotype was observed on chromosome 3. No significant results were obtained for the analyses considering differences in IBD owing to sex, population, parent-of-origin, or interactions between the four identified linkage peaks. For each analysis, the maximum LOD score is presented in Table 2.
Retrospective comparison of our results with the simulation model showed correct identification of disease loci D1-5 on chromosomes 1, 3, 5, 9, and 10, respectively. D1 influences phenotypes P1 and P3, which both have subclinical phenotype b, confirmed by the increased sharing we observed in the b+/+ affected pairs. D2 influences all three phenotypes, P1-3, with one or two of the subclinical phenotype b and c in a somewhat complicated manner. D2 also influences subclinical phenotype k. We observed increased IBD in the k+/+ pairs (chromosome-wide p = 0.016), but this was not significant at the genome-wide level.
D2 and D3 together help to produce P2 and P3, with D3 also influencing subclinical phenotype a. We detected the association of subclinical phenotype a with D3, finding elevated sharing in the -/- pairs and decreased sharing in the -/+ pairs. D4 is related to P2 through subclinical phenotype c and P3 through b and c. D4 also influences subclinical phenotype k, which we observed through increased IBD sharing in pairs concordant for k.
No interactions were found between loci D1-4 when examining relative pairs concordantly affected for KPD in general, or even the relevant phenotype (P1-3). This is because the penetrances of the low-risk genotype combinations were set to zero, giving a multiplicative model for interactions. Under such models, IBD sharing at one locus is independent of that at the other. The D1–D4 interaction could be detected by analyzing affected pairs in which exactly one member had P3 (a negative correlation in IBD was observed at the two loci). However, the D2–D3 interaction in P3 and the D1–D2 interaction in P1 were not detected by this method, due to the reduced penetrance of the relevant genotypes. Likewise, no linkage evidence was obtained at D6 (a modifying locus that affects the penetrance of phenotype P2), even when affected pairs discordant for P2 were analyzed. These results are consistent with the observation that affected relative-pair analysis has low power to detect locus-locus linkage interactions.
From analyzing the data blind to the simulation model, there appear to be five susceptibility genes for KPD, located on chromosomes 1, 3, 5, 9, and 10. Those on chromosomes 5 and 10 appear to influence disease only in the absence of subclinical phenotypes a and c/d/g respectively. The locus on chromosome 1 influences disease only in individuals with subclinical phenotype b, whereas that on chromosome 9 appears to have two variants, one giving rise to the presence of subclinical phenotype k in affected individuals, the other to its absence. No subclinical phenotype was found to have a significant genome-wide effect on the linkage of KPD to chromosome 3, although k reached chromosome-wide significance. Even with knowledge of the simulation model, it was difficult to detect the locus-locus interactions, suggesting that affected relative pairs give little power for such analyses.
IBD: Identical by descent
KPD: Kofendrerd Personality Disorder
1. Gudbjartsson DF, Jonasson K, Frigge ML, Kong A: Allegro, a new computer program for multipoint linkage analysis. Nat Genet. 2000, 25: 12-13. 10.1038/75514.
2. Risch N: Linkage strategies for genetically complex traits. III. The effect of marker polymorphism on analysis of affected relative pairs. Am J Hum Genet. 1990, 46: 242-253.
3. Olson JM: A general conditional-logistic model for affected-relative-pair linkage studies. Am J Hum Genet. 1999, 65: 1760-1769. 10.1086/302662.
4. Abecasis GR, Cherny SS, Cookson WO, Cardon LR: Merlin-rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002, 30: 97-101. 10.1038/ng786.
5. Rice JP: The role of meta-analysis in linkage studies of complex traits. Am J Med Genet. 1997, 74: 112-114. 10.1002/(SICI)1096-8628(19970221)74:1<112::AID-AJMG22>3.0.CO;2-T.
6. Rice JP: Diagnosis as a covariate in sib-pair linkage analysis. Am J Med Genet. 2001, 105: 55-56. 10.1002/1096-8628(20010108)105:1<55::AID-AJMG1060>3.0.CO;2-D.
7. Holmans P: Detecting gene-gene interactions using affected sib pair analysis with covariates. Hum Hered. 2002, 53: 92-102. 10.1159/000057987.
8. Cordell HJ: Sample size requirements to control for stochastic variation in magnitude and location of allele-sharing linkage statistics in affected sibling pairs. Ann Hum Genet. 2001, 65: 491-502. 10.1046/j.1469-1809.2001.6550491.x.
We gratefully acknowledge funding support from the MRC and the Higher Education Funding Council for Wales.
All authors contributed to the statistical analysis and interpretation of the data, and to the drafting of this article.