The first 3 methods we investigated are what we term 'locus-scoring' methods and do not utilize haplotype information. The last 3 are 'haplotype-sharing' methods, which analyze the data by clustering similar haplotypes to reduce the dimensionality of the data. The 6 methods that we implemented are denoted (i)–(vi) and are described below.

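(i), (ii), (iii) Logistic regression is widely employed for the modelling of association between genes and binary traits [1, 6]. Here the binary trait *Y* is regressed on the genotypes at the *M* diallelic loci under consideration, some of which may be etiological. The *m*^{th} genotype is coded *X*_{m} according to the number of rare alleles at the *m*^{th} locus. The model is

$$\operatorname{logit}\, P(Y = 1 \mid X_1, \ldots, X_M) = \beta_0 + \sum_{m=1}^{M} \beta_m X_m \qquad (1)$$

where *β*_{0} is the intercept and *β*_{m} is the log odds ratio per rare allele at locus *m*.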

We implicitly assume multiplicative penetrance. We consider models with *M* = 1 (i), *M* = 6 (ii) and *M* = 6 with 5 pair-wise (adjacent loci) interactions (iii).
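
As an illustrative sketch (not the authors' implementation), the covariates for models (i)–(iii) could be assembled as follows. The function names `design_row` and `case_probability` are hypothetical, and coding each adjacent-locus interaction as the product of the two genotype codes is an assumption made for this example. Model fitting is not shown; any coefficients passed in are placeholders.

```python
from math import exp


def design_row(genotypes, adjacent_interactions=False):
    """Build one individual's covariate row.

    genotypes[m] is the number of rare alleles (0, 1 or 2) at locus m.
    The row starts with 1.0 for the intercept. If adjacent_interactions
    is True, the product of each pair of neighbouring genotype codes is
    appended (an assumed coding for the pairwise interactions).
    """
    row = [1.0] + [float(g) for g in genotypes]
    if adjacent_interactions:
        row += [float(a * b) for a, b in zip(genotypes, genotypes[1:])]
    return row


def case_probability(row, beta):
    """Logistic model: P(Y = 1) = 1 / (1 + exp(-sum_k beta_k * row_k))."""
    eta = sum(b * x for b, x in zip(beta, row))
    return 1.0 / (1.0 + exp(-eta))


# Model (ii): 6 main effects. Model (iii): 6 main effects plus 5 interactions.
row_ii = design_row([0, 1, 2, 1, 0, 2])
row_iii = design_row([0, 1, 2, 1, 0, 2], adjacent_interactions=True)
```

With *M* = 6 and interactions switched on, each row has 1 + 6 + 5 = 12 entries. Because the genotype code enters linearly on the logit scale, each additional rare allele multiplies the odds by a constant factor, which corresponds to the multiplicative penetrance assumption.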

(iv) For sliding windows of *M* single-nucleotide polymorphisms (SNPs) Durrant et al. [3] suggest grouping similar haplotypes using hierarchical clustering. This requires calculation of a distance measure. For the case of no missing data and denoting alleles as 0 or 1, the distance between haplotypes *i* and *j* is measured as

where *p*_{m} is the (observed) frequency of allele 1 at locus *m* and
, where *I*(.) represents an indicator variable and
denotes the allele at locus *m* of the haplotype. Durrant et al. [3] recommend performing the hierarchical clustering then fitting logistic regression models using haplotype cluster membership as covariates. They search for the optimal association across different numbers of clusters and SNP window sizes and apply a Bonferroni correction.
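
The cluster-then-regress idea can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure of Durrant et al. [3]: the distance is a plain proportion of mismatching alleles (a stand-in for the allele-frequency-weighted measure), average linkage is assumed, and all function names are hypothetical.

```python
def mismatch_distance(h1, h2):
    """Stand-in distance: proportion of loci at which two haplotypes differ."""
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)


def cluster_haplotypes(haplotypes, n_clusters, distance=mismatch_distance):
    """Agglomerative clustering with average linkage (an assumption).

    Returns a list of clusters, each a list of haplotype indices.
    """
    clusters = [[i] for i in range(len(haplotypes))]

    def linkage(c1, c2):
        total = sum(distance(haplotypes[i], haplotypes[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters; ties go to the lowest indices.
        _, a, b = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def cluster_labels(clusters, n_haplotypes):
    """Cluster membership per haplotype, usable as a categorical covariate."""
    labels = [None] * n_haplotypes
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_tests


haps = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1), (1, 1, 1, 0)]
clusters = cluster_haplotypes(haps, n_clusters=2)
labels = cluster_labels(clusters, len(haps))
```

In a full analysis the labels would enter a logistic regression as indicator covariates, and the number of tests passed to `bonferroni_threshold` would count every combination of cluster number and window size examined.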

(v), (vi) We have modified approach (iv) by considering 2 alternative measures of similarity. The similarity between a pair of haplotypes is now measured without restriction to a window of markers. The distance measure used in (v) is based upon the length of the segment shared identical by state (IBS) around a putative locus in the studied region. The putative locus is assumed to lie between a pair of adjacent markers, and each marker interval in the region is tested in turn as the putative locus. Distance is measured simply as 1 - L_{1}/L_{2}, where L_{1} is the number of consecutive alleles shared on either side of the putative locus and L_{2} is the total number of markers in the region being studied. Method (vi) modifies the distance measure used in (v) by incorporating allele frequency weights in a similar manner to (2). If *k* markers *a*, ..., *a*+*k*-1 are shared IBS then the distance is measured as

The clustering proceeds as in method (iv). Note that we do not incorporate physical distances between markers into our measures of haplotype distance. One approach to measuring haplotype distances that does incorporate marker distance information is described by Molitor et al. [7].
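
The unweighted IBS-length distance of method (v) can be sketched as follows. This is one reading of the verbal definition, offered as an assumption: L_{1} is taken to be the run of matching alleles extending left from the putative locus plus the run extending right. The function names and the 0-based interval indexing are hypothetical. The frequency-weighted variant (vi) is not shown.

```python
def shared_length(h1, h2, interval):
    """Count consecutive matching alleles on both sides of a putative locus.

    The putative locus lies between markers `interval` and `interval + 1`
    (0-based), so valid intervals are 0 .. len(h1) - 2.
    """
    n = len(h1)
    left = 0
    i = interval
    while i >= 0 and h1[i] == h2[i]:  # extend leftwards until a mismatch
        left += 1
        i -= 1
    right = 0
    j = interval + 1
    while j < n and h1[j] == h2[j]:  # extend rightwards until a mismatch
        right += 1
        j += 1
    return left + right


def ibs_distance(h1, h2, interval):
    """Distance 1 - L1/L2, with L2 the total number of markers."""
    return 1.0 - shared_length(h1, h2, interval) / len(h1)


h1 = (0, 1, 1, 0, 1, 0)
h2 = (1, 1, 1, 0, 0, 0)
# Test each marker interval in turn as the putative locus.
distances = [ibs_distance(h1, h2, k) for k in range(len(h1) - 1)]
```

Two identical haplotypes have distance 0 at every interval, and a pair that mismatches at both markers flanking the putative locus has distance 1.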

Other, more flexible, approaches to haplotype clustering have been proposed but are computationally demanding and have not been included here for that reason. Thomas et al. [2], for example, propose assigning haplotypes to clusters probabilistically using the Potts model, with reversible jump Markov chain Monte Carlo (MCMC) methods used to update the number of clusters and the location of the variant. This is more flexible because it allows partitions other than those formed by cutting the dendrogram/genealogical tree at various points; it instead attaches higher prior weight to more likely partitions of haplotypes.