Identification of polymorphisms explaining a linkage signal: application to the GAW 14 simulated data

We applied three approaches for identifying polymorphisms that explain the linkage evidence to the Genetic Analysis Workshop 14 simulated data: 1) the genotype-IBD sharing test (GIST); 2) an approach suggested by Horikawa and colleagues; and 3) the homozygote sharing test (HST). These tests were compared with a family-based association test. The two linked regions with the highest nonparametric linkage scores were selected for applying these methods. In the first region, Horikawa's method identified the most SNPs within the region containing the disease susceptibility locus, while HST performed best in the second region. However, Horikawa's method also produced the most type I errors. These methods show potential as additional tools to complement family-based association tests for the identification of disease susceptibility variants.
From Genetic Analysis Workshop 14: Microsatellite and single-nucleotide polymorphism
Noordwijkerhout, The Netherlands, 7-10 September 2004
Editors: Joan E Bailey-Wilson, Laura Almasy, Mariza de Andrade, Julia Bailey, Heike Bickeböller, Heather J Cordell, E Warwick Daw, Lynn Goldin, Ellen L Goode, Courtney Gray-McGuire, Wayne Hening, Gail Jarvik, Brion S Maher, Nancy Mendell, Andrew D Paterson, John Rice, Glen Satten, Brian Suarez, Veronica Vieland, Marsha Wilcox, Heping Zhang, Andreas Ziegler and Jean W MacCluer
Published: 30 December 2005
BMC Genetics 2005, 6(Suppl 1):S88 doi:10.1186/1471-2156-6-S1-S88

Figure 1: Single SNP results for the KA population on chromosome 9 (top) and the DA population on chromosome 1 (bottom). The vertical dotted lines specify the haplotype region (HR). Significant SNPs lie above the horizontal dotted line at -log10(0.05). [y-axis: -log10(p-value)]


Background
Linkage analysis tends to identify broad regions of the genome that contain one or several disease susceptibility genes. However, going from a linkage peak to the actual functional polymorphisms is a daunting task. Methods that rely on linkage disequilibrium (LD), such as the transmission disequilibrium test (TDT), usually offer much finer resolution for complex trait mapping. There has been recent interest in developing methods to identify the polymorphisms that may be responsible for a linkage peak observed in a region. Here we apply two methods that condition on offspring genotypes [1,2] and one that conditions on parental genotypes [3] to the Genetic Analysis Workshop 14 (GAW14) simulated data to identify polymorphisms explaining the linkage evidence. The results are contrasted with those of the family-based association method implemented in TRANSMIT [4].

Methods
To identify regions of the genome harboring susceptibility genes for Kofendred Personality Disorder (KPD), we performed nonparametric linkage (NPL) analysis, as implemented in GENEHUNTER [5], on a single randomly selected replicate (replicate 71), for each population separately and for all 10 chromosomes provided. We selected the two regions with the highest NPL scores (the Karangar (KA) population on chromosome 9 and the Danacaa (DA) population on chromosome 1) and requested genotypes for additional single-nucleotide polymorphisms (SNPs) located under these two linkage peaks. We then applied three methods, described briefly below, to identify polymorphisms that explain a linkage peak. The analyses were performed without knowledge of the true results.

Horikawa method
To assess whether a SNP is associated with the linkage evidence, Horikawa et al. [1] suggested computing the linkage evidence in the subset of families whose probands carry the risk genotypes. They argued that affected siblings with more identity-by-descent (IBD) sharing than expected by chance would be more likely to carry the risk genotypes. Hence, selecting pedigrees in which the proband carries a risk genotype should increase the probability that affected siblings share two alleles IBD. The significance of the change in IBD sharing in the subset of N_S families carrying the risk genotypes is assessed by permutation: subsets of N_S families are drawn at random, irrespective of proband genotypes, and the sharing in these random subsets provides the reference distribution for the observed subset.
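The permutation procedure described for Horikawa's method can be sketched as follows. This is a minimal sketch under simplifying assumptions: the hypothetical inputs are one IBD-2 indicator per family (whether the affected sibling pair shares two alleles IBD) and one flag for whether the proband carries the risk genotype, and the statistic is the proportion of IBD-2 pairs in the subset rather than the full linkage statistic used by Horikawa et al. The function name `horikawa_permutation_test` is illustrative only.

```python
import random
from typing import Sequence, Tuple


def horikawa_permutation_test(
    ibd2: Sequence[bool],
    carries_risk: Sequence[bool],
    n_perm: int = 10_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Hypothetical sketch of the subset-selection test.

    ibd2[i]         : True if the affected sib pair in family i shares two alleles IBD
                      (a simplification of the linkage evidence actually computed).
    carries_risk[i] : True if the proband of family i carries the risk genotype.
    Returns (observed IBD-2 proportion in the risk subset, one-sided permutation p-value).
    """
    if len(ibd2) != len(carries_risk):
        raise ValueError("ibd2 and carries_risk must have the same length")
    subset = [s for s, r in zip(ibd2, carries_risk) if r]
    n_s = len(subset)
    if n_s == 0:
        raise ValueError("no proband carries the risk genotype")
    observed = sum(subset) / n_s

    rng = random.Random(seed)
    families = list(range(len(ibd2)))
    exceed = 0
    for _ in range(n_perm):
        # Draw N_S families at random, irrespective of proband genotypes.
        idx = rng.sample(families, n_s)
        if sum(ibd2[i] for i in idx) / n_s >= observed:
            exceed += 1
    # Add-one correction keeps the permutation p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Because random subsets have the same size N_S as the risk subset, any difference in sharing reflects proband genotype rather than subset size.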

Genotype-IBD sharing test (GIST)
Li et al. [2] recently developed GIST to identify SNPs that can account, in part, for the linkage evidence in a region. They proposed a weighted analysis in which each family is weighted according to the genotype distribution among members of the pedigree. Because the optimal weighting scheme depends on the genetic model, they suggest performing three analyses, using the optimal weights for dominant, recessive, and additive models, respectively. The maximum over the three models is used to assess whether a polymorphism partially explains the linkage evidence in a region.
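The max-over-models idea behind GIST can be illustrated as below. This is a hedged illustration, not Li et al.'s statistic: it assumes each model's statistic is a Pearson correlation between per-family weights and per-family linkage scores (for example, family NPL contributions), and the three weight functions are simple stand-ins, not the optimal weights derived in [2]. Because the maximum of three correlated statistics has no simple null distribution, significance here comes from permuting scores across families and recomputing the maximum. All function names are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def _corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def model_weights(genotypes: Sequence[Sequence[int]]) -> Dict[str, List[float]]:
    """Hypothetical per-family weights from affected members' risk-allele counts (0, 1, 2).

    Illustrative stand-ins only, NOT the optimal weights of Li et al.
    """
    return {
        "dominant": [sum(g >= 1 for g in fam) / len(fam) for fam in genotypes],
        "recessive": [sum(g == 2 for g in fam) / len(fam) for fam in genotypes],
        "additive": [sum(fam) / (2 * len(fam)) for fam in genotypes],
    }


def gist_like_test(scores: Sequence[float], genotypes: Sequence[Sequence[int]],
                   n_perm: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Maximum correlation over three models, with a permutation p-value for the maximum.

    scores[i]    : linkage score of family i
    genotypes[i] : risk-allele counts of the affected members of family i
    """
    weights = model_weights(genotypes)
    observed = max(_corr(w, scores) for w in weights.values())
    rng = random.Random(seed)
    permuted = list(scores)
    exceed = 0
    for _ in range(n_perm):
        # Break the genotype-IBD link by shuffling linkage scores across families.
        rng.shuffle(permuted)
        if max(_corr(w, permuted) for w in weights.values()) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Permuting the maximum, rather than each model separately, accounts for having looked at three models.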

Homozygote sharing test (HST)
In contrast, the HST method of Dupuis and Van Eerdewegh [3] conditions on parental genotypes. They argue that if a parent is homozygous at all risk SNPs in a linked region, it should not matter which haplotype is transmitted to affected offspring, because both haplotypes confer the same disease susceptibility. Hence, there should be no excess IBD sharing among affected siblings for alleles inherited from parents who are homozygous at all risk variants. At the other extreme, if a particular set of SNPs is in linkage equilibrium with the susceptibility SNPs, the sharing probabilities should not depend on the parental genotypes at those SNPs, and the probabilities of IBD sharing from homozygous and heterozygous parents should be the same. In the intermediate situation, in which the tested SNPs are in LD with the risk variants, some increased sharing may be observed from homozygous parents, and the degree of excess sharing will depend on the LD between the tested SNPs and the disease SNPs. Therefore, they propose comparing the observed IBD sharing from homozygous and heterozygous parents to determine whether a set of SNPs explains part of the linkage evidence, and testing for residual excess sharing from homozygous parents to determine whether the set explains all of it. Significance is assessed using a permutation approach.
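A minimal sketch of the homozygous-versus-heterozygous comparison follows. It assumes each record is one parent of an affected sibling pair, with a flag for homozygosity at the tested SNP set and a flag for whether that parent transmitted the same haplotype to both affected siblings (probability 1/2 under no linkage). The difference in sharing proportions and the label-shuffling permutation are illustrative choices, and the one-sided binomial check for residual excess sharing among homozygous parents is an assumption of this sketch for the "explains all" question; neither is taken from [3]. Function names are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def _sharing_difference(homozygous: Sequence[bool], shared: Sequence[bool]) -> float:
    """Sharing proportion from heterozygous parents minus that from homozygous parents."""
    hom = [s for h, s in zip(homozygous, shared) if h]
    het = [s for h, s in zip(homozygous, shared) if not h]
    if not hom or not het:
        return 0.0
    return sum(het) / len(het) - sum(hom) / len(hom)


def hst_like_test(homozygous: Sequence[bool], shared: Sequence[bool],
                  n_perm: int = 5000, seed: int = 0) -> Tuple[float, float, float]:
    """
    homozygous[j] : parent j is homozygous at every tested SNP
    shared[j]     : parent j transmitted the same haplotype to both affected sibs
    Returns (observed difference, p_partial, p_all).
    """
    if len(homozygous) != len(shared):
        raise ValueError("inputs must have the same length")
    observed = _sharing_difference(homozygous, shared)

    # Partial explanation: less sharing from homozygous than from heterozygous parents.
    rng = random.Random(seed)
    labels = list(homozygous)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # shuffle homozygosity labels across parents
        if _sharing_difference(labels, shared) >= observed:
            exceed += 1
    p_partial = (exceed + 1) / (n_perm + 1)

    # Full explanation: homozygous parents should show no excess over the null rate 1/2.
    # One-sided exact binomial tail; a small p_all means excess sharing remains.
    hom_shared = [s for h, s in zip(homozygous, shared) if h]
    n, k = len(hom_shared), sum(hom_shared)
    p_all = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return observed, p_partial, p_all
```

A small p_partial together with a large p_all is the pattern expected when the tested SNPs account for the linkage signal.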

Genome scan results and LD analysis
The maximum NPL scores were found on chromosome 9 in the KA population (NPL = 5.35 at C0765) and on chromosome 1 in the DA population (NPL = 4.70 near C0052). We computed pairwise LD measures (D' and r²) between markers in the two regions and found that while there was some LD on chromosome 9 (maximum r² = 0.89), there was little LD on chromosome 1 (maximum r² = 0.03).

Figure 1 presents the results of the single-SNP analyses for the three methods (HST, GIST, and Horikawa) and for TRANSMIT on chromosomes 9 (top) and 1 (bottom). In each region, 38 SNPs were tested; they are plotted on the x-axis according to map distance, with -log10(p-value) on the y-axis. Table 1 presents the number of significant (p < 0.05) SNPs detected in the two linked regions. After consulting the answers, we defined the haplotype region (HR) to be the set of SNPs forming the haplotypes that contain the disease locus.

Figure 2 legend: The square labeled "i" represents the ith most significant SNP pair among those significant by both HST and TRANSMIT, ranked by the sum of the TRANSMIT and HST p-values. For example, in the top panel, "3" (in the first row) means that the HST and TRANSMIT p-values for pair B8321-B8341 are both significant and that the sum of their p-values is the third smallest. Hj and Tj represent the jth significant SNP pair by HST and TRANSMIT, respectively. Note that in the top panel, "2"-H2 and "3"-H3 coincide, so only "2" and "3" are shown. Similarly, in the bottom panel, "4"-T5 and "6"-H4 coincide, and only "4" and "6" are shown.
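The pairwise LD measures reported for the two regions follow the standard definitions for two biallelic SNPs: D = p_AB - p_A p_B, D' = |D| / D_max, and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). The sketch below assumes the haplotype frequency p_AB has already been estimated (the text does not say which software was used), and `ld_measures` is a hypothetical name.

```python
from typing import Tuple


def ld_measures(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """D' and r^2 for two biallelic SNPs.

    p_ab : frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a  : frequency of allele A; p_b : frequency of allele B
    """
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2
```

D' reaches 1 whenever one of the four haplotypes is absent, whereas r² reaches 1 only when the two SNPs are also equally frequent and fully predictive of each other, which is why r² is the more informative summary for contrasting the two regions.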

Single SNP analysis
In the KA population on chromosome 9, seven SNPs were associated (p < 0.05) with the linkage evidence using Horikawa et al.'s method [1], five of them within HR. In contrast, HST identified only two SNPs that partially explained the linkage evidence, both within HR, while GIST identified a single SNP, also within HR. TRANSMIT gave the most significant results, with three SNPs at p < 0.01, all within HR.
In the DA population on chromosome 1, HST detected two SNPs at p < 0.01 (B0554, B0558) and three SNPs at 0.01 < p < 0.05 (B0561, B0564, B0566) that explain some of the linkage evidence, all within HR. In contrast, most of the statistically significant SNPs identified by Horikawa's method lie outside HR and represent type I errors, because there is essentially no LD on chromosome 1. GIST identified one SNP close to the disease locus (B0562). As on chromosome 9, TRANSMIT yielded the most significant association, with B0567 at the edge of HR, and also showed significant association with C0052 near the disease locus. SNPs significant by HST were further tested to see whether they explain all, rather than some, of the linkage evidence. None of the single SNPs explained all of the evidence for linkage on either chromosome (results not shown).

Two-SNP analysis
Because none of the single SNPs fully explained the linkage evidence, we examined two-SNP combinations (SNP pairs) using HST, which generalizes easily to SNP pairs, and compared the results with TRANSMIT. Figure 2 presents the results of the two-SNP analyses on chromosomes 9 (top) and 1 (bottom). On both chromosomes, the SNPs most significant by TRANSMIT in the single-SNP analysis also formed significant pairs with many of the other tested SNPs.
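The bookkeeping for the two-SNP analysis and the joint ranking described in the Figure 2 legend can be sketched as follows. The assumptions are: a parent is treated as homozygous for a SNP pair only when homozygous at both SNPs (a direct reading of "homozygous at all risk SNPs"), parental genotypes are given as allele tuples keyed by SNP name, and the per-pair p-values come from separate HST and TRANSMIT runs. All function and type names are hypothetical. With 38 SNPs per region, there are 38 × 37 / 2 = 703 pairs to test.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Genotype = Tuple[str, str]  # the two alleles of one parent at one SNP
Pair = Tuple[str, str]


def snp_pairs(snps: Sequence[str]) -> List[Pair]:
    """All unordered pairs of tested SNPs."""
    return list(combinations(snps, 2))


def homozygous_at_all(genotypes: Dict[str, Genotype], snps: Sequence[str]) -> bool:
    """A parent counts as homozygous for a SNP set only if homozygous at every SNP in it."""
    return all(genotypes[s][0] == genotypes[s][1] for s in snps)


def rank_joint_pairs(p_hst: Dict[Pair, float], p_transmit: Dict[Pair, float],
                     alpha: float = 0.05) -> List[Pair]:
    """Pairs significant by both HST and TRANSMIT, ordered by the sum of the two p-values."""
    both = [pair for pair, p in p_hst.items()
            if p < alpha and p_transmit.get(pair, 1.0) < alpha]
    return sorted(both, key=lambda pair: p_hst[pair] + p_transmit[pair])
```

Requiring homozygosity at both SNPs shrinks the homozygous group relative to single-SNP tests, so power for pairs depends on allele frequencies at both loci.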