Haplotype-sharing analysis for alcohol dependence based on quantitative traits and the Mantel statistic
BMC Genetics volume 6, Article number: S75 (2005)
Haplotype-based methods have become increasingly popular in the last decade because shared lengths in haplotypes can be used for disease localization. In this contribution, we propose a novel linkage-based haplotype-sharing approach for quantitative traits based on the class of Mantel statistics, which is closely related to the weighted pair-wise correlation statistic. Because these statistics are known to be liberal, we propose a permutation test to evaluate significance. We applied the Mantel statistic to the autosomal data from the genome-wide scan of the Collaborative Study on the Genetics of Alcoholism with the Affymetrix Genotype 10 K array that was provided for the Genetic Analysis Workshop 14. Four regions on chromosomes 4, 8, 16, and 20 showed p-values less than 0.005, with a minimum p-value of < 0.0001 on chromosome 16 (tsc0520638 at 72.8 cM). Three of these four regions, located on chromosomes 4, 16, and 20, were reported previously at the Genetic Analysis Workshop 11.
Haplotype-based methods have become increasingly popular in the last decade because shared lengths in haplotypes can be employed to trace disease loci. Thus, they have the potential to incorporate information on chromosome structure and to handle genetic heterogeneity to an extent that exceeds the feasible limit of allelic analyses. The basic idea of the haplotype-sharing method was proposed by te Meerman et al. and further developed by te Meerman et al. and te Meerman and van der Meulen at a time when no dense marker maps were yet available. The method is now of even greater interest because of the limited informativeness of single-nucleotide polymorphisms (SNPs).
The original haplotype-sharing statistic (HSS) is defined as the standard deviation of shared lengths in unrelated case haplotypes. The HSS approach provides a point-wise significance at each marker under study, and has been applied at previous Genetic Analysis Workshops (GAWs) to investigate association.
In the present contribution, we extend this idea and propose a linkage-based haplotype-sharing Mantel (HSM) statistic for the analysis of quantitative traits in family data. Mantel statistics in the context of haplotype sharing were first used by Beckmann. He defined spatial similarity by the shared length between haplotype pairs and temporal similarity as the phenotypic similarity between pairs. The resulting statistic is therefore very similar to the weighted pair-wise correlation (WPC) statistic [7, 8], which has previously been used for linkage analysis, allele frequency estimation, and estimation of familial correlations.
In this contribution, we apply the HSM statistic to the autosomal data from the genome-wide scan of the Collaborative Study on the Genetics of Alcoholism (COGA) with the Affymetrix Genotype 10 K array that was provided for GAW14.
The provided COGA data included 1,380 study samples within 143 families. The present analysis was based on the Affymetrix Genotype 10 K array clean dataset, which contained 10,810 autosomal SNPs.
Following Zinn-Justin and Abel, we defined subjects as affected if they met the DSM-III-R criteria of alcohol dependence and the Feighner criteria for alcoholism (phenotype ALDX1). In the next step, the binary phenotype, denoted ALB1, was derived from the ALDX1 phenotypes by considering only extremely affected individuals (defined as "affected" in the data description) and unaffected individuals ("purely unaffected" in the data description). All other individuals were assigned an unknown ALB1.
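As phenotypes for our analysis, we used Pearson residuals from a logistic regression predicting the binary phenotype from sex and age. In detail, sex was coded 0 for males and 1 for females, and age at examination was given in years. All individuals from all families were used for the logistic regression. The residuals were denoted as ALB1R. The final fitted logistic regression model had the form

$$\operatorname{logit} P(\mathrm{ALB1} = 1) = \beta_0 + \beta_1 \cdot \mathrm{sex} + \beta_2 \cdot \mathrm{age},$$

with intercept $\beta_0$ and regression coefficients $\beta_1$ for sex and $\beta_2$ for age.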
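As an illustration of how such residuals are obtained, the following minimal sketch computes the Pearson residual of a binary outcome under a logistic model with sex and age as predictors. The function name and the coefficient values are hypothetical placeholders and are not the estimates from the COGA data.

```python
import math

def pearson_residual(y, sex, age, b0, b_sex, b_age):
    """Pearson residual of a binary outcome y under a logistic model.

    sex is coded 0 = male, 1 = female; age is in years.
    b0, b_sex, b_age are hypothetical coefficients, not fitted values.
    """
    eta = b0 + b_sex * sex + b_age * age
    p = 1.0 / (1.0 + math.exp(-eta))           # predicted probability of being affected
    return (y - p) / math.sqrt(p * (1.0 - p))  # (observed - expected) / binomial SD

# Placeholder coefficients chosen only for illustration.
B0, B_SEX, B_AGE = 0.5, -1.0, -0.01

affected_male = pearson_residual(1, sex=0, age=40, b0=B0, b_sex=B_SEX, b_age=B_AGE)
unaffected_female = pearson_residual(0, sex=1, age=40, b0=B0, b_sex=B_SEX, b_age=B_AGE)
```

Affected individuals receive positive residuals and unaffected individuals negative ones, with the magnitude depending on how surprising the observed status is given sex and age.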
Haplotype-sharing method using Mantel statistics
Originally, Mantel's space-time clustering statistic had the following form

$$M = \sum_{i<j} X_{ij}\, Y_{ij},$$

where $X_{ij}$ defines the spatial similarity and $Y_{ij}$ the temporal similarity for the pair $ij$. For haplotype-sharing analyses, Beckmann replaces the spatial similarity with the shared length between haplotype pairs and the temporal similarity with the phenotypic similarity between these pairs.
This is similar to the WPC statistic, in which spatial similarity is replaced by the genotypic similarity of related pairs, measured in terms of alleles shared identical by descent or identical by state. For the application of the HSM statistic to quantitative traits, we propose the mean-corrected product of phenotypes as a measure of phenotypic similarity, which has been suggested previously in the context of Mantel statistics [7, 8] and also for linkage analyses:
$$Y_{ij} = (Y_i - \mu)\,(Y_j - \mu).$$
Here, $Y_i$ and $Y_j$ were given by the phenotypic values ALB1R for individuals $i$ and $j$, respectively. In our analyses, Pearson residuals were employed as phenotypes, thus $\mu = 0$. Their use has been discussed [7, 8]. Missing phenotypes were assigned a value of 0 after mean correction, so these subjects did not contribute to the HSM statistic but did take part in the permutation procedure. This corresponds to the assumption that phenotypes are missing completely at random.
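The phenotypic similarity and the handling of missing values can be sketched as follows. This is a minimal illustration in which a missing phenotype is represented by `None`; the function name and this encoding are assumptions made for the example.

```python
def phenotypic_similarity(y_i, y_j, mu=0.0):
    """Mean-corrected product (Y_i - mu) * (Y_j - mu).

    A missing phenotype (None) is set to 0 after mean correction,
    so any pair involving that person contributes 0.
    """
    d_i = 0.0 if y_i is None else y_i - mu
    d_j = 0.0 if y_j is None else y_j - mu
    return d_i * d_j

concordant = phenotypic_similarity(1.2, 0.8)    # both above the mean: positive
discordant = phenotypic_similarity(-1.0, 0.8)   # opposite sides of the mean: negative
with_missing = phenotypic_similarity(None, 0.8) # missing phenotype: no contribution
```

Pairs whose residuals lie on the same side of the mean obtain a positive similarity, so excess haplotype sharing among such pairs increases the statistic.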
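The shared length $X_{ab}(\ell)$ at marker $\ell$ between haplotypes $a$ and $b$ was measured as the number of intervals flanked by markers with the same alleles, corrected by the mean shared length observed at marker $\ell$ in the given data:

$$X_{ab}(\ell) = L_{ab}(\ell) - \bar{L}(\ell),$$

where $L_{ab}(\ell)$ denotes the uncorrected number of shared intervals around marker $\ell$ and $\bar{L}(\ell)$ its mean over the haplotype pairs at that marker.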
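A minimal sketch of this measure is given below. It rests on several assumptions that the text does not spell out: haplotypes are equal-length sequences of alleles in map order, sharing is extended from the marker to the left and right until the first mismatch, and the mean is taken over all haplotype pairs passed to the function. The function names are hypothetical.

```python
def shared_length(hap_a, hap_b, marker):
    """Number of marker intervals around `marker` on which two haplotypes agree."""
    if hap_a[marker] != hap_b[marker]:
        return 0
    left = marker
    while left > 0 and hap_a[left - 1] == hap_b[left - 1]:
        left -= 1
    right = marker
    while right < len(hap_a) - 1 and hap_a[right + 1] == hap_b[right + 1]:
        right += 1
    return right - left  # intervals between the outermost matching markers

def mean_corrected_lengths(haplotypes, marker):
    """Map each unordered haplotype pair (a, b) to its mean-corrected shared length."""
    raw = {}
    for a in range(len(haplotypes)):
        for b in range(a + 1, len(haplotypes)):
            raw[(a, b)] = shared_length(haplotypes[a], haplotypes[b], marker)
    mean = sum(raw.values()) / len(raw)
    return {pair: length - mean for pair, length in raw.items()}

# Three toy haplotypes over five biallelic markers, evaluated at the middle marker.
haps = ["11212", "11221", "21212"]
corrected = mean_corrected_lengths(haps, marker=2)
```

Under this counting rule, a match at the marker alone, without a matching neighbor, spans no interval and therefore receives the same raw length of 0 as a mismatch.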
Because each individual has 2 haplotypes, labelled 1 and 2, and each pair of individuals has 4 different haplotype pairs, the final HSM statistic at marker $\ell$ is given by

$$M(\ell) = \sum_{k} \sum_{i<j} \sum_{a=1}^{2} \sum_{b=1}^{2} X_{i_a j_b k}(\ell)\, Y_{ijk},$$

where $k$ denotes the family, and $i_a j_b k$ the haplotype pair $ab$ within person pair $ij$ within family $k$. Thus, pairs were constructed within families only, and parent-offspring pairs were excluded from the computations.
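The summation over families, person pairs, and haplotype pairs can be sketched as follows. The data layout (a dictionary per family with phenotypes, parent sets, and a sharing function) is a hypothetical choice for this example and not the layout used in the actual analysis.

```python
from itertools import combinations

def hsm_statistic(families):
    """Sum X * Y over the four haplotype pairs of every retained person pair.

    Each family is a dict with:
      'y'       : person -> mean-corrected phenotype (0.0 if missing)
      'parents' : person -> set of that person's parents (absent for founders)
      'x'       : function (person_i, a, person_j, b) -> mean-corrected shared
                  length between haplotype a of i and haplotype b of j
    """
    total = 0.0
    for fam in families:
        for i, j in combinations(sorted(fam["y"]), 2):
            # Parent-offspring pairs are discarded.
            if i in fam["parents"].get(j, set()) or j in fam["parents"].get(i, set()):
                continue
            y_ij = fam["y"][i] * fam["y"][j]
            for a in (1, 2):
                for b in (1, 2):
                    total += fam["x"](i, a, j, b) * y_ij
    return total

# Toy nuclear family: only the father-mother and sib-sib pairs are retained.
family = {
    "y": {"F": 0.5, "M": -0.5, "C1": 1.0, "C2": 2.0},
    "parents": {"C1": {"F", "M"}, "C2": {"F", "M"}},
    "x": lambda i, a, j, b: 1.0,  # constant sharing, for illustration only
}
statistic = hsm_statistic([family])
```

With constant sharing of 1, the toy family contributes 4 · (0.5 · (-0.5) + 1.0 · 2.0) = 7.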
In the first step, allele frequencies at single-marker loci were estimated from all individuals. In the second step, we generated a 64-bit build of GENEHUNTER v2.1_r5 in order to be able to allocate more than 2 GB of memory [details available upon request]. This allows haplotype estimation in 20-bit pedigrees for data from the Affymetrix 10 K array. The number of bits is given by 2n - f, with n denoting the number of nonfounders and f the number of founders. Larger pedigrees were split into 2 or more branches, which were treated as independent families. In the third step, we estimated inheritance vectors autosome-wise, assuming the marker order as provided for GAW14. We stored the most likely pair of haplotypes of each individual within a family, estimated by maximum likelihood across the possible set of inheritance vectors. This estimate ignores linkage disequilibrium information from neighboring markers. However, this additional information may be neglected in our sample because haplotypes can be constructed in extended families from segregation patterns.
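The pedigree size criterion can be made concrete with a short calculation. The helper names and the example pedigrees below are hypothetical; the only inputs taken from the text are the formula 2n - f and the limit of 20 bits.

```python
def pedigree_bits(n_nonfounders, n_founders):
    """Size of the inheritance vector, 2n - f, as stated in the text."""
    return 2 * n_nonfounders - n_founders

def fits(n_nonfounders, n_founders, max_bits=20):
    """True if the pedigree could be analyzed whole under the bit limit."""
    return pedigree_bits(n_nonfounders, n_founders) <= max_bits

nuclear = pedigree_bits(n_nonfounders=3, n_founders=2)    # two parents, three children
extended = pedigree_bits(n_nonfounders=14, n_founders=6)  # hypothetical larger pedigree
n_vectors_at_limit = 2 ** 20  # inheritance vectors to enumerate at 20 bits
```

A nuclear family with three children needs only 4 bits, whereas the hypothetical extended pedigree needs 22 bits and would have to be split. Each additional bit doubles the number of inheritance vectors, which is why memory becomes the limiting factor.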
Because the hypothesis of no clustering is equivalent to the situation that the $X_i$ occur randomly with respect to the $Y_i$, we decided to use a Monte Carlo permutation approach to estimate the empirical distribution. For this purpose, we generated replicated datasets by randomly permuting phenotypes among family members within all families while keeping haplotype sharing unchanged. The empirical p-value was defined as the proportion of replicates that led to a statistic with a value greater than the one obtained from the real data. We are fully aware that our permutation approach destroys residual familial correlation. However, it is not prone to population stratification, because we do not permute phenotypes across families.
Because of computational limitations, the analyses were performed in 2 steps. First, we analyzed all SNPs and estimated empirical p-values by 1,000 permutations. Second, the number of permutations was increased to 100,000 for regions including more than 3 SNPs with empirical p-values < 0.05. Regions including SNPs with p-values < 0.005 after the second step of analysis are reported as the most interesting regions.
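The permutation scheme can be sketched as follows, assuming that phenotypes are stored as one list per family in a fixed person order and that the statistic keeps the haplotype sharing fixed internally. All names, the toy sharing values, and the toy phenotypes are hypothetical.

```python
import random

def permutation_p_value(families, statistic, n_perm=1000, seed=0):
    """Empirical p-value from permuting phenotypes within families.

    `families` is a list of phenotype lists, one per family; `statistic`
    maps such a list of lists to a number.
    """
    rng = random.Random(seed)
    observed = statistic(families)
    exceed = 0
    for _ in range(n_perm):
        permuted = []
        for phenos in families:
            shuffled = list(phenos)
            rng.shuffle(shuffled)  # shuffle only among members of this family
            permuted.append(shuffled)
        if statistic(permuted) > observed:  # strictly greater, as in the text
            exceed += 1
    return exceed / n_perm

# Hypothetical mean-corrected sharing for the person pairs of one family of three.
SHARING = [{(0, 1): 2.0, (0, 2): -1.0, (1, 2): -1.0}]

def toy_statistic(fams):
    return sum(x * fam[i] * fam[j]
               for fam, sharing in zip(fams, SHARING)
               for (i, j), x in sharing.items())

# The two persons who share most have concordant phenotypes: no permutation is larger.
p_best = permutation_p_value([[1.0, 1.0, -2.0]], toy_statistic, n_perm=500)
# Discordant phenotypes on the high-sharing pair: about one third of permutations are larger.
p_other = permutation_p_value([[1.0, -2.0, 1.0]], toy_statistic, n_perm=500)
```

Because phenotypes never leave their family, differences in phenotype distribution or haplotype structure between families cannot generate a signal, which is the property invoked against population stratification.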
Results from 1,000 permutations (first step of analysis)
The first step of our analysis yielded 13 regions including more than 3 SNPs with empirical p-values < 0.05. These regions are located on chromosomes 3, 4, 5, 8, 9, 12, 14, 16, and 20 and are shown in Table 1.
Results from 100,000 permutations (second step of analysis)
For the 13 regions obtained in the first step of our analysis, we increased the number of permutations to 100,000. Four of these regions, located on chromosomes 4, 8, 16, and 20, included SNPs with p-values < 0.005. The lowest p-value was found at marker tsc0520638, located on chromosome 16 at 72.8 cM. The peak region contains 3 SNPs. All test statistics computed by the permutation approach for the marker tsc0520638 were smaller than the one obtained from the real data. If we assume that the true p-value corresponds to a LOD = 3, then with 95% confidence the upper limit of the p-value is approximately 6.2·10⁻⁵ in 100,000 simulations. The results for the 4 chromosomes are shown in Figure 1.
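The figure of about 6.2·10⁻⁵ can be reproduced under one reading, which is an assumption here and not stated explicitly in the text: a LOD score of 3 is converted to a one-sided point-wise p-value through a chi-square distribution with 1 degree of freedom, and the quoted value is the half-width of a 95% normal-approximation interval for a Monte Carlo p-value of that size based on 100,000 permutations.

```python
import math

def lod_to_p(lod):
    """One-sided point-wise p-value for a LOD score (1-df chi-square, halved)."""
    chi2 = 2.0 * math.log(10.0) * lod
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))

def ci_half_width(p, n_perm, z=1.96):
    """Normal-approximation half-width of a 95% interval for a Monte Carlo p-value."""
    return z * math.sqrt(p * (1.0 - p) / n_perm)

p_lod3 = lod_to_p(3.0)                        # close to 1e-4
half_width = ci_half_width(p_lod3, 100_000)   # close to 6.2e-5
```

Under this reading, the Monte Carlo error at 100,000 permutations is of the same order as the p-value itself, so estimates near 10⁻⁴ remain imprecise.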
Our analysis uses a new Mantel-based haplotype-sharing approach for quantitative traits within family data, applied to the autosomal data from the genome-wide scan of the COGA with the Affymetrix Genotype 10 K array that was provided for GAW14. Our method is similar to Beckmann's haplotype-sharing approach; however, we employ family data. Furthermore, we extended the method to the analysis of quantitative traits. We permuted phenotypes within families. Our approach therefore may be considered a linkage method and is not prone to population stratification, because we do not permute phenotypes across families. Whether the permutation procedure destroys heritability, resulting in a compound hypothesis, remains to be clarified in further analyses.
In the analyses, we identified 3 linkage regions that had been reported previously in GAW11. More precisely, Daw et al. and Jacobs et al. had detected the region on chromosome 4, Kovac et al. and Macciardi et al. the region on chromosome 16, and Palmer et al. and Zinn-Justin et al. the region on chromosome 20. With our method, we detected an additional signal on chromosome 8, which has not been reported before. The region contains approximately 35 genes with regulatory functions. This finding needs further investigation, preferably in an independent validation study.
- COGA: Collaborative Study on the Genetics of Alcoholism
- GAW: Genetic Analysis Workshop
- HSM statistic: Haplotype-sharing Mantel statistic
- WPC: Weighted pair-wise correlation
te Meerman GJ, van der Meulen MA, Sandkuijl LA: Perspectives of identity by descent (IBD) mapping in founder populations. Clin Exp Allergy. 1995, 25: 97-102. 10.1111/j.1365-2222.1995.tb00433.x.
te Meerman GJ, van der Meulen MA: Genomic sharing surrounding alleles identical by descent: effects of genetic drift and population growth. Genet Epidemiol. 1997, 14: 1125-1130. 10.1002/(SICI)1098-2272(1997)14:6<1125::AID-GEPI94>3.0.CO;2-I.
van der Meulen MA, te Meerman GJ: Haplotype sharing analysis in affected individuals from nuclear families with at least one affected offspring. Genet Epidemiol. 1997, 14: 915-920. 10.1002/(SICI)1098-2272(1997)14:6<915::AID-GEPI59>3.0.CO;2-P.
Beckmann L, Fischer C, Deck KG, Nolte IM, te Meerman G, Chang-Claude J: Exploring haplotype sharing methods in general and isolated populations to detect gene(s) of a complex genetic trait. Genet Epidemiol. 2001, 21 (Suppl 1): S554-S559.
Mantel N: The detection of disease clustering and a generalized regression approach. Cancer Res. 1967, 27: 209-220.
Beckmann L: New haplotype sharing and haplotype assignment methods for mapping genes of complex diseases. PhD thesis. 2003, Ruprecht-Karls-Universität, Medical Faculty.
Commenges D, Abel L: Improving the robustness of the weighted pairwise correlation test for linkage analysis. Genet Epidemiol. 1996, 13: 559-573. 10.1002/(SICI)1098-2272(1996)13:6<559::AID-GEPI3>3.0.CO;2-W.
Ziegler A: The new Haseman-Elston method and the weighted pairwise correlation statistic are variations on the same theme. Biom J. 2001, 43: 697-702. 10.1002/1521-4036(200110)43:6<697::AID-BIMJ697>3.0.CO;2-C.
Zinn-Justin A, Abel L: Genome search for alcohol dependence using the weighted pairwise correlation linkage method: Interesting findings on chromosome 4. Genet Epidemiol. 1999, 17 (Suppl 1): S421-S426.
Nolte IM: Statistics and population genetics of haplotype sharing as a tool for fine-mapping of disease gene loci. PhD thesis. 2003, University of Groningen, Department of Medical Genetics.
Daw EW, Kumm J, Snow GL, Thompson EA, Wijsman EM: Monte Carlo Markov chain methods for genome screening. Genet Epidemiol. 1999, 17 (Suppl 1): S133-S138.
Jacobs KB, Wedig GC, Schnell AH, Witte JS, Elston RC: Model-based and model-free multipoint genome-wide linkage analysis of alcoholism. Genet Epidemiol. 1999, 17 (Suppl 1): S175-S180.
Kovac I, Rouillard E, Merette C, Palmour R: Exploring the impact of extended phenotype in stratified samples. Genet Epidemiol. 1999, 17 (Suppl 1): S211-S216.
Macciardi F, Morenghi E, Morabito A: Alcoholism as a complex trait: comparison of genetic models and role of epidemiological risk factors. Genet Epidemiol. 1999, 17 (Suppl 1): S247-S252.
Palmer LJ, Tiller KJ, Burton PR: Genome-wide linkage analysis using genetic variance components of alcohol dependency-associated censored and continuous traits. Genet Epidemiol. 1999, 17 (Suppl 1): S283-S288.
AZ had the original idea for the study and provided intellectual input. DF did the programming. AK and IRK performed the analyses and wrote the first draft of the manuscript. All authors read and approved the final manuscript.
Kleensang, A., Franke, D., König, I.R. et al. Haplotype-sharing analysis for alcohol dependence based on quantitative traits and the Mantel statistic. BMC Genet 6, S75 (2005) doi:10.1186/1471-2156-6-S1-S75