Linkage analysis of GAW14 simulated data: comparison of multimarker, multipoint, and conditional approaches

The purposes of this study were 1) to examine the performance of a new multimarker regression approach for model-free linkage analysis in comparison to a conventional multipoint approach, and 2) to determine whether a conditioning strategy would improve the performance of the conventional multipoint method when applied to data from two interacting loci. Linkage analysis of the Kofendrerd Personality Disorder phenotype to chromosomes 1 and 3 was performed in three populations for all 100 replicates of the Genetic Analysis Workshop 14 simulated data. Three approaches were used: a conventional multipoint analysis using the Zlr statistic as calculated in the program ALLEGRO; a conditioning approach in which the per-family contribution on one chromosome was weighted according to evidence for linkage on the other chromosome; and a novel multimarker regression approach. The multipoint and multimarker approaches were generally successful in localizing known susceptibility loci on chromosomes 1 and 3, and were found to give broadly similar results. No advantage was found with the per-family conditioning approach. The effect on power and type I error of different choices of weighting scheme (to account for different numbers of affected siblings) in the multimarker approach was examined.


Methods
Linkage analysis of the Kofendrerd Personality Disorder (KPD) phenotype to chromosomes 1 and 3 was performed in the Danacaa, Karangar, and Aipotu populations, with knowledge of the "answers". An important aim of our investigation was to compare the results from an affected sib-pair (ASP) multimarker approach with those from a conventional multipoint approach, and these populations were chosen because of their ascertainment via nuclear families rather than via multi-generational pedigrees. Multipoint linkage analysis was performed using the allele-sharing Zlr statistic [1] as calculated in the program ALLEGRO [2] under an exponential model. Since it was known from the "answers" that the disease loci on chromosomes 1 and 3 interact in an epistatic manner, we also performed a weighted conditional analysis in which the per-family contribution to the Zlr on one chromosome was weighted according to evidence for linkage on the other chromosome, as previously suggested [3].
The results from the multipoint approach were compared with those from a multimarker regression approach that models the observed identity-by-descent (IBD) states for ASPs at a series of genetic markers in terms of the IBD state at a presumed disease locus in the region. The expected IBD state at the disease locus, and hence the expected IBD state at the marker loci, are considered parameters to be estimated in the regression procedure. For a given marker and parent type (mother or father), the expected IBD state can be written as p_M = x_1 + p_D x_2, where p_M and p_D correspond to the probability of sharing an allele IBD at the marker and disease locus, respectively, and the x variables correspond to conditional probabilities of marker IBD state given disease locus IBD state: x_1 = P(M|d) and x_2 = P(M|D) - P(M|d). Here M and m denote the events that the observed marker IBD state is 1 and 0, and D and d the events that the disease IBD state is 1 and 0, respectively. These may be written P(M|D) = θ^2 + (1 - θ)^2 and P(M|d) = 1 - P(m|d) = 1 - P(M|D). Thus, the expected IBD states at each of the markers are modelled in terms of p_D, the expected IBD state at the disease locus (which will be estimated as a regression coefficient), and x variables that are functions of the recombination fractions θ between the markers and disease locus. The IBD states for mothers and fathers are modelled separately (assuming independence), which allows the possibility of using different values of θ for the two types of parent, i.e., incorporating sex-specific recombination fractions if desired.
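As a minimal sketch of the relationship just described, the following Python fragment computes the two x variables from a recombination fraction and evaluates the expected marker IBD state as a linear function of the disease-locus sharing probability. The function names are hypothetical, and the use of the Haldane map function to convert map distance to a recombination fraction is an assumption made for illustration; the source does not state which map function was used.

```python
import math


def haldane_theta(distance_cm):
    """Recombination fraction for a map distance in cM.

    ASSUMPTION: Haldane map function; not specified in the source.
    """
    return 0.5 * (1.0 - math.exp(-2.0 * abs(distance_cm) / 100.0))


def x_variables(theta):
    """Return (x1, x2) = (P(M|d), P(M|D) - P(M|d)) for recombination fraction theta."""
    p_m_given_D = theta ** 2 + (1.0 - theta) ** 2  # P(M|D)
    x1 = 1.0 - p_m_given_D                         # P(M|d) = 1 - P(M|D)
    x2 = p_m_given_D - x1                          # P(M|D) - P(M|d)
    return x1, x2


def expected_marker_ibd(p_d, theta):
    """Expected IBD state at the marker: p_M = x1 + p_D * x2."""
    x1, x2 = x_variables(theta)
    return x1 + p_d * x2


if __name__ == "__main__":
    # Sharing expected at markers 0, 5 and 20 cM from a locus with p_D = 0.7
    for dist in (0.0, 5.0, 20.0):
        theta = haldane_theta(dist)
        print(dist, round(theta, 4), round(expected_marker_ibd(0.7, theta), 4))
```

Two limiting cases serve as a sanity check: at θ = 0 the marker sharing equals the disease-locus sharing, and under the null value of 0.5 at the disease locus the expected marker sharing is 0.5 whatever the recombination fraction.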
The model specifying the expected IBD states is fitted to the observed marker IBD states via a generalized estimating equation (GEE) approach. Because the IBD state is considered for each parent separately, the observed IBD events are Bernoulli random variables with known functional relationship between the mean and variance, and correlation between IBD states (at different markers for a given parent type) that depends on p_D and the known recombination fractions between the markers. The data may be analyzed via standard GEE software that allows specification of the correlation structure (specified under the null hypothesis that p_D = 0.5). At a given putative disease locus location, this procedure provides an estimate p̂_D of p_D together with its estimated standard error SE(p̂_D), which may be used to produce a z-score (p̂_D - 0.5)/SE(p̂_D) that is normally distributed under the null hypothesis that p_D = 0.5. The whole procedure is repeated with the disease locus allowed to take a variety of putative positions along the marker map, and the position where the z-score is most significant is taken as the estimate of the disease locus location. An example of the fitted regression line, using the disease locus location that gives maximal evidence against the null hypothesis, is shown in Figure 1 for chromosomes 1 and 3 of the Danacaa data, replicate 100.
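The estimation step at a single putative location can be illustrated with a deliberately simplified sketch. It is not the procedure used in this study: it assumes an independence working correlation rather than the correlation structure specified under the null hypothesis, and it uses a cluster-robust variance with residuals evaluated at the null value of 0.5, grouping observations by parent. The function name and argument layout are hypothetical.

```python
import math
from collections import defaultdict


def fit_p_d(y, x1, x2, parent_id):
    """Estimate the disease-locus sharing probability at one putative location.

    y         : observed marker IBD states (0 or 1), one per parent-marker observation
    x1, x2    : the x variables for each observation at this location
    parent_id : identifier grouping observations from the same parent

    ASSUMPTIONS: independence working correlation; robust (sandwich-type)
    variance with residuals evaluated under the null p_D = 0.5.
    Returns (p_hat, se, z) with z = (p_hat - 0.5) / se.
    """
    # The Bernoulli variance is constant (0.25) under the null, so it cancels
    # from the estimating equation sum_i x2_i * (y_i - x1_i - p_D * x2_i) = 0.
    bread = sum(b * b for b in x2)
    p_hat = sum(b * (obs - a) for obs, a, b in zip(y, x1, x2)) / bread

    # Sum the score contributions within each parent so that correlation
    # between that parent's markers is reflected in the variance.
    score_by_parent = defaultdict(float)
    for obs, a, b, pid in zip(y, x1, x2, parent_id):
        score_by_parent[pid] += b * (obs - a - 0.5 * b)
    meat = sum(s * s for s in score_by_parent.values())

    se = math.sqrt(meat) / bread
    return p_hat, se, (p_hat - 0.5) / se


if __name__ == "__main__":
    # Four parents, one fully linked marker each (theta = 0, so x1 = 0, x2 = 1)
    print(fit_p_d([1, 1, 1, 0], [0.0] * 4, [1.0] * 4, ["a", "b", "c", "d"]))
```

Repeating this fit over a grid of putative locations, recomputing the x variables at each one, and keeping the location with the largest z-score would give a scan of the kind described above.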
The multimarker approach is both conceptually and analytically very similar to a previously proposed GEE approach [4]. The multimarker approach differs from the previously proposed method mainly with regard to the test statistic, which is calculated at a variety of increments (putative positions of the disease locus) across the region, in an approach akin to standard multipoint analysis. The multimarker approach also differs from the previously proposed approach by considering the contribution of each parent separately, which could potentially allow the use of different marker maps in males and females (although sex-specific maps were not provided for these data). From Figure 1, it is clear that the greatest contribution to the test statistic at a given disease locus location will come from the observed IBD states at the two flanking markers. The speed of the multimarker procedure can therefore be considerably increased by using data only from the two flanking markers, in an interval mapping type approach, when testing a putative disease location.
For each parent, we used data from the two flanking markers (when informative) or the closest informative flanking markers otherwise. In practice, this appeared to make very little difference to the multimarker results (data not shown) and so results presented here will all assume the flanking marker approximation.
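The flanking-marker selection for one parent can be sketched as follows. This is an illustration only: the function name, the boolean encoding of informativeness, and the convention that a marker lying exactly at the putative location counts as the left flank are all assumptions, and marker positions are assumed to be sorted in ascending order.

```python
def flanking_informative(marker_pos_cm, informative, locus_cm):
    """Return indices (left, right) of the nearest informative markers
    on either side of a putative disease locus for one parent.

    marker_pos_cm : marker positions in cM, sorted ascending (assumed)
    informative   : booleans, True where this parent is informative for IBD
    locus_cm      : putative disease locus position in cM

    Either index is None when no informative marker exists on that side.
    """
    left = right = None
    for i, pos in enumerate(marker_pos_cm):
        if not informative[i]:
            continue
        if pos <= locus_cm:
            left = i           # keeps the last (closest) informative marker to the left
        elif right is None:
            right = i          # first (closest) informative marker to the right
    return left, right


if __name__ == "__main__":
    positions = [0.0, 10.0, 20.0, 30.0]
    info = [True, False, True, True]
    # The marker at 10 cM is uninformative, so the left flank falls back to 0 cM
    print(flanking_informative(positions, info, 15.0))
```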
An issue not investigated in the previously proposed approach [4] was the choice of different possible weighting schemes for ASPs derived from sibships with more than two affected individuals. Several different weighting schemes have been proposed to adjust for the non-independence of such affected pairs, but the optimal scheme will depend both on the analysis method used and on whether the goal is merely to maintain type I error or also increase power [5]. With regard to power, the optimal weighting scheme may depend on the unknown underlying genetic model [5]. We investigated the performance of four different weighting schemes for the multimarker approach. The schemes investigated were 1) the Hodge scheme [6], in which the contribution of each ASP from a sibship with a affected individuals is scaled by a factor of [...]

Results
Figure 2 shows the results from the multipoint and multimarker (with Hodge weights) approaches applied to a single replicate, replicate 100. Results are very similar for both methods. The Danacaa study appears to provide good evidence for the disease locus on chromosome 1, but the results on chromosome 3 are less convincing. The Karangar and Aipotu studies show little evidence of linkage on chromosome 1 but provide good evidence of linkage for the disease locus on chromosome 3. The results for the multipoint analysis of all 100 replicates are shown in Table 1. The average maximum Zlr on each chromosome is slightly higher than the average Zlr at the true disease locus location, as expected, owing to the upward bias incurred by choosing the maximum on a chromosome. The Danacaa study generally provides good evidence for the disease loci on chromosomes 1 (mean Zlr = 4.52, p = 3 × 10^-6) and 3 (mean Zlr = 3.92, p = 4 × 10^-5). The Karangar study provides reasonable evidence for the disease locus on chromosome 3 (mean Zlr = 2.80, p = 0.002) but little evidence on chromosome 1 (mean Zlr = 1.32, p = 0.09), while the Aipotu study provides good evidence for the disease locus on chromosome 3 (mean Zlr = 3.20, p = 0.0007) and some evidence for the disease locus on chromosome 1 (mean Zlr = 2.08, p = 0.02). The Zlr scores from the conditional weighted analyses are lower than those from the unweighted analysis, indicating no advantage from using conditioning weights.
The z-score results from the multimarker approach are given in Table 2, and are found to be broadly comparable with the multipoint results, particularly when using the Hodge or Suarez and Hodge weighting schemes. Type I error is acceptable for all four weighting schemes, as shown in Table 2 by the analysis of chromosome 4, on which no disease locus exists. The mean z-score on chromosome 4 is close to 0 with variance close to 1 and approximate normality (and therefore correct type I error, data not shown) for all four weighting schemes. The positions of the maximum Zlr from the multipoint approach and the maximum z-score from the multimarker approach are shown in Figure 3. Localization of the disease loci (at true positions approximately 173 cM on chromosome 1 and 314 cM on chromosome 3) is generally good for both methods, although there is some suggestion that the localization on chromosome 1 in the Danacaa population is slightly more precise under the multipoint approach.
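The p-values quoted alongside the mean Zlr scores in the multipoint results appear consistent with upper-tail probabilities of a standard normal distribution. That reading is an inference from the reported numbers rather than something stated explicitly, so the short sketch below should be taken as a way of checking the correspondence, not as the authors' calculation. The function name is hypothetical.

```python
import math


def upper_tail_p(z):
    """One-sided p-value P(Z >= z) for a standard normal score.

    ASSUMPTION: Zlr and the multimarker z-score are referred to a standard
    normal distribution with a one-sided test for excess sharing.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Mean Zlr values reported for the multipoint analysis
    reported = [
        ("Danacaa chr 1", 4.52),
        ("Danacaa chr 3", 3.92),
        ("Karangar chr 3", 2.80),
        ("Karangar chr 1", 1.32),
        ("Aipotu chr 3", 3.20),
        ("Aipotu chr 1", 2.08),
    ]
    for label, z in reported:
        print(f"{label}: Zlr = {z:.2f}, one-sided p = {upper_tail_p(z):.2g}")
```

The same conversion applies to the multimarker z-scores, since both statistics are treated as standard normal under the null hypothesis of no linkage.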

Discussion
Overall, the multimarker and multipoint approaches appear to provide quite similar results, particularly when using the Hodge or Suarez and Hodge weighting schemes. Slightly greater power for the multimarker approach is obtained using the 'Equal pairs' weighting scheme, which is consistent with the results of Sham et al. [5]. The generally stronger results from the Danacaa study in comparison to the Karangar and Aipotu studies are perhaps not surprising, given that the ascertainment of the Danacaa families is via phenotype 1, which is influenced solely by the disease loci on chromosomes 1 and 3.
The Zlr scores from the conditional weighted analyses are lower than those from the unweighted analysis, indicating no improvement in power from using conditioning weights, and no power to detect an interaction. The exact form of the proposed interaction is not specified in the "answers" and could potentially correspond to a number of different underlying scenarios [8]. Only those scenarios that result in departure from a multiplicative penetrance model might in fact be expected to be detectable using the approach described here.

Conclusion
The multipoint and multimarker approaches were generally successful in localizing known susceptibility loci on chromosomes 1 and 3, and were found to give broadly similar results. No advantage was found with a per-family conditioning approach. For the multimarker approach, greatest power and acceptable type I error were seen with the 'Equal pairs' weighting scheme.
Figure 3. Histograms showing location of maximum over 100 replicates for multipoint and multimarker methods.