An experimental assessment of in silico haplotype association mapping in laboratory mice

Background To assess the utility of haplotype association mapping (HAM) as a quantitative trait locus (QTL) discovery tool, we conducted HAM analyses for red blood cell count (RBC) and high density lipoprotein cholesterol (HDL) in mice. We then experimentally tested each HAM QTL using published crosses or new F2 intercrosses guided by the haplotype at the HAM peaks. Results The HAM for RBC, using 33 classic inbred lines, revealed 8 QTLs; 2 of these were true positives as shown by published crosses. A HAM-guided (C57BL/6J × CBA/J)F2 intercross we carried out verified 2 more as true positives and 4 as false positives. The HAM for HDL, using 81 strains including recombinant inbred lines and chromosome substitution strains, detected 46 QTLs. Of these, 36 were true positives as shown by published crosses. A HAM-guided (C57BL/6J × A/J)F2 intercross that we carried out verified 2 more as true positives and 8 as false positives. By testing each HAM QTL for RBC and HDL, we demonstrated that 78% of the 54 HAM peaks were true positives and 22% were false positives. Interestingly, all false positives were in significant allelic association with one or more real QTL. Conclusion Because type I errors (false positives) can be detected experimentally, we conclude that HAM is useful for QTL detection and narrowing. We advocate the powerful and economical combined approach demonstrated here: the use of HAM for QTL discovery, followed by mitigation of the false positive problem by testing the HAM-predicted QTLs with small HAM-guided experimental crosses.


Background
Model organisms facilitate the discovery of genes through classical experimental methodologies and, more recently, through the application of bioinformatics resources and tools. Inbred line crosses have been successfully uncover-ing quantitative trait loci (QTLs) of complex traits since the early 1990's [see review in [1]]. Meanwhile, vast improvements in high-throughput methodologies and the resulting availability of genomic data such as genomewide sequence coverage, single nucleotide polymorphism (SNP) information, and tissue-specific expression data have enhanced the capabilities of classical model organism research. The molecular dissection of complex traits is aided by harnessing these increasingly accessible data through the integration of bioinformatics into traditional methods of gene discovery [2][3][4].
In silico QTL mapping, or as we prefer to call it, haplotype association mapping (HAM), has been proposed as a means of utilizing phenotypic data from inbred strains together with dense marker maps to determine associations of haplotypes with phenotypes [5,6]. Initially this approach was highly criticized on the one hand as having too many statistical issues to be useful, such as a high rate of false positives that could lead to much dead end work [7], while on the other hand it was welcomed with excessive zeal as the end to tedious and expensive experimental crosses [5].
Subsequent reports have tempered these two 2 perspectives somewhat with investigations of the power of HAM when variables and algorithms are altered, such as the number of strains, the "population structure" of the strain panel, the density of markers, the size of the haplotypes, the haplotype inference method, and efforts to control family-wise error [6,[8][9][10][11]. In addition, some researchers have found HAM useful as part of an integrative fine-mapping method for narrowing known QTLs [12,13] or as a method for QTL discovery [14,15], while others have found that the method lacks power for their phenotype of interest [16].
Understanding the conditions under which HAM succeeds or fails to identify QTLs and the mechanisms by which type I errors (false positives) and type II errors (false negatives) are generated will help researchers wisely use this powerful tool. The number and the genetic relatedness of the inbred strains used in HAM analyses are known to affect its statistical power [8,9,17,18]. In addition, it has been noted that spurious associations among both linked (cis) and unlinked markers (trans) are likely to be a problem in association studies such as HAM [7,10,12].
In this study, we evaluated the performance of HAM as a tool for discovering and mapping QTLs. We used it to predict QTLs for red blood cell count (RBC) and for plasma concentration of high density lipoprotein cholesterol (HDL) in mice. We then tested whether each HAM-predicted QTL was a real QTL, as determined by a previously reported QTL cross that fit the haplotype at the HAM-predicted QTL peak or by a new cross carried out specifically to test these HAM-predicted QTLs. Our results, which allowed us to calculate the rate of false positives, show that HAM can be a powerful, although not comprehensive, predictor of QTLs even for a highly polygenic trait such as HDL cholesterol. We also show that extensive allelic association, the matching of SNPs at true HAM peaks with SNPs elsewhere in the genome, is responsible for type I error control problems; and we demonstrate a method for mitigating this issue that involves conducting relatively small-scale experimental crosses to test HAM QTL peaks.

RBC -red blood cell count HAM -Haplotype association mapping
With the trait of red blood cell count, our goal was to assess the usefulness of HAM in discovering new QTLs when working on a quantitative trait for which little was known about its genetic architecture. Haplotype association mapping of 33 classic inbred strains for RBC in females ( Figure 1) yielded one significant peak on chromosome (Chr) 2 and 7 suggestive peaks on Chrs 1, 4, 7, 11, 12, and 15 (see Table 1). Here, we defined a HAM "peak" as the individual SNP plus the consecutive 2 SNPs that make up its 3-SNP window (see Methods). Because each unique QTL may be surrounded by SNPs that also were above the level of significance, we have included a column in Table 1 that shows the range of SNPs surrounding the HAM peak that are below the suggestive significance of 0.63. For some QTL, such on the one on Chr 1, only the single SNP window reached suggestive significance, but for the QTL on Chr 2, this range is so large (116-131 Mb) that we suspect a second QTL gene may exist in the region.

(B6 × CBA)F2 intercross
The next question is which of these 8 HAM-detected peaks detected at a significance of 0.63 represents a true peak and which are false positives. To answer that question, we first examined the literature for red blood cell QTLs and found that 2 of these QTLs on Chrs 4 and 11 have been detected previously with classic inbred line crosses and the haplotype at the HAM peaks differed for the parental strains that detected the QTL [19]; thus these 2 are true positives. To determine whether the remaining 6 peaks are true or false positives, we tested them with a small haplotype-based intercross. At each of the 6 peaks, we determined the haplotypes, examined which pair of strains would be expected to have a QTL peak at that location, and then carried out a cross between CBA and C57BL/6J (B6) because those 2 strains would be able to detect all 6 QTLs if they were true positives. Because we were only testing these 6 locations, we used a small cross of 108 mice and only tested polymorphic markers at or flanking each of the 6 peaks to determine whether the polymorphisms were associated with a difference in phenotype.
One-way ANOVAs conducted at markers near, including, or flanking the 6 HAM peaks revealed that the significant peak on Chr 2 (124.8 Mb) and the suggestive peak on Chr 12 (73.8 Mb) were both significantly correlated with RBC ( Figure 2). The remaining 4 peaks tested (Chr 1: 32.9 Mb, Chr 7: 24.4 Mb, Chr 11: 116.5 Mb, and Chr 15: 100.6 Mb) were not significantly correlated with RBC (data not shown), so we classified these as "false positives" ( Table  2). If one uses a suggestive level of 0.63, this gives a false positive rate of 50% (4 of 8); if we had used a more stringent suggestive level of 0.20, there would have been no false positives for any of the 3 loci detected at this level. However, one true positive on Chr 11 would have been lost.

Allelic association analysis
We then tested each false positive locus to determine if it was in allelic association with the experimentally verified HAM QTLs. We used the common statistical test for linkage disequilbrium, which is the same as allelic association except that linkage disequilbrium is restricted to loci on the same chromosome. We tested the non-independence of the false positive HAM peaks by examining the pairwise allelic association among all of the peaks for RBC ( Figure  3). All "clusters" of SNPs within a chromosomal peak were found to be in significant linkage disequilbrium with each other, including the cluster of spread out peaks from 116.3 Mb to 131.3 Mb on Chr 2. In addition, the false positives on Chrs 7, 11, and 15 were in significant allelic association with at least one real QTL ( Figure 3). The Chr 7 and Chr 11 false positive peaks were in allelic association with both the novel HAM QTL on Chr 2 and with the HAM QTL within the known QTL on Chr 11 [19] detected by a (NZW × SM)F2 intercross. The Chr 15 false positive was in allelic association with the HAM QTL on Chr 12. Only the Chr 1 false positive was not in significant allelic association with any other HAM peaks after a conservative Bonferroni correction for multiple tests.

HDL -high density lipoprotein cholesterol HAM -haplotype association mapping
Using the trait of HDL cholesterol, our goal was to assess the usefulness of HAM for a quantitative trait with highly HAM genome scan for RBC in 33 inbred lines of mice   [20,21]. Because no peaks reached the genome-wide significance threshold when we performed HAM for HDL with only the 33 classic inbred strains (data not shown), we added the chromosome substitution strains derived from strains B6 and A/J and 2 sets of recombinant inbred lines (AXB, BXD) for a total of 81 strains (see Methods). While this expanded strain panel increased the power of peak detection ( Figure 4), it also brought with it additional potential for false positives due to the enrichment of the B6, A/J, and DBA/2 genomes contributed by the consomic and recombinant inbred lines. This HAM yielded 46 peaks significant at the 0.05 level ( Table 3). The range in Table 3 represents any SNPs surrounding the peak that also had a significance level <0.05.

Testing for true positive peaks
As shown in Table 3, 36 of these 46 HAM peaks were true positives because they were detected previously in inbred line crosses [20,21] and because the haplotype at the HAM peaks differed for the parental strains that detected the QTL. To test whether the remaining 10 peaks were true or false positives, we tested them with a haplotype-based intercross as we did for RBC. At each of the 10 peaks, we determined the haplotypes, examined which pair of strains would be expected to have a QTL peak at that location, and then carried out a cross between B6 and A/J because those 2 strains would be able to detect all 10 QTLs if they were true positives. We tested polymorphic markers at only those 10 peaks to determine if a true QTL existed.
One-way ANOVAs revealed that 2 peaks (  To answer that question, we mapped the 95% confidence intervals of HDL QTLs for each inbred line cross onto a physical map and indicated the peak locations of the 46 HAM peaks ( Figure 6). This figure has boxes around the 2 new QTL peaks on Chr 8 detected first by HAM and then confirmed by the intercross described above. It also has ovals around the false positives on Chr 2, 3, 10, and 13. The gray bars indicate known QTLs as determined by crosses; several of these have no HAM peaks (Chrs 1, 3, 4, distal 6, 7, 10, distal 14, 17 and 19) indicating a number of false negatives or QTLs that were not detected by the HAM analysis. Calculating a false negative rate is uncertain because some of these regions may contain more than one QTL. However, there are at least 12 QTL regions that were not detected by HAM. If we add these 12 to the 38 true positives, that provides a total of 50 QTLs; since 12 were missed by HAM, the lowest estimate of a false negataive rate is 24% (12/50).

Allelic association analysis
We tested each of the 8 false positive loci for HDL to determine if each was in allelic association with experimentally verified HAM peaks. After testing for pairwise correlations, we found that all false positives were in allelic association with real HAM QTLs on multiple chromosomes (Figure 7).

Discussion
Haplotype association mapping was developed as a tool for quantitative trait analysis. Initially conceived as an inexpensive and rapid alternative to intercrosses for finding QTLs, limitations of statistical power and type I error control have restricted its widespread adoption. Whereas other studies have suggested that HAM be included in an approach to dissecting complex traits that involves haplo-type association mapping within known QTLs [12,22], we demonstrate here the additional proficiency of HAM in detecting novel QTLs. We show that using genome-wide HAM to predict QTLs and then testing the novel QTLs by small haplotype-guided intercrosses can successfully pinpoint QTLs, even for highly complex traits like HDL cholesterol. We also prove the role of allelic association in HAM's pervasive false positive problem in laboratory mice, an issue mitigated by our method of testing HAMpredicted QTLs in small crosses. Detecting QTLs by HAM is a balance between reducing missed QTLs (false negatives, or type II errors) while increasing false positives (type I errors), and increasing missed QTLs while reducing false positives. However, because one well-chosen and relatively small cross can test many HAM QTL peaks using only targeted genotyping, we feel that it is better to err on the side of reducing the missed QTLs while increasing the false positives.
Overall, combining the results of RBC and HDL, we found 12 of 54 HAM peaks were false positives. This may be an underestimate since some of the "true positives" confirmed by previous crosses may not actually be true. We think this particularly likely if 2 HAM peaks are close together such as those on distal Chr 8 or proximal Chr 14.
Only identification of the QTL genes will show whether such closely linked QTL peaks are both real. We estimate a false positive rate of 24%, a number that is almost certainly an underestimate and highly dependent on the number of strains used.
Using HAM, even quantitative traits with completely unknown or little-known genetic architecture can be economically explored. An example is the exploration of red blood cell count reported here. Using one HAM-directed small-scale experimental cross with targeted genotyping, we discovered 2 novel RBC QTLs, including a QTL on Chr 2 that includes a strong candidate gene for red blood cell count. The parental strains of the RBC test cross were based solely on their haplotype differences at the RBC HAM peaks and, in fact, are not very different in terms of mean RBC count (units per volume × 10 6 [n/uL]) with the mean RBC for B6 at 9.186 n/uL and the mean RBC for CBA at 9.232 n/uL. Upon testing the F2 progeny from this cross, we confirmed the presence of RBC QTLs on Chrs 2 and 12, which represent QTLs not yet found by traditional crosses. Furthermore, a cursory examination of the genes underlying the Chr 2 HAM QTL reveals a likely candidate gene, erythrocyte protein band 4.2 (Epb4.2). EPB42 is a major component of the erythrocyte membrane skeletal network that regulates the stability and flexibility of erythrocytes, and is implicated in the maintenance of normal surface area and cation transport in red blood cells [23][24][25]. As further evidence that it is the likely quantitative trait gene underlying this novel QTL for red blood cell Allelic association among RBC HAM peaks count, in humans and mice, a deficiency of EPB42 results in hemolytic anemias of varying severity [26].
Our results agree with the report of lung tumor susceptibility phenotypes [22], which used HAM to find QTL peaks and then used previously published crosses to determine whether the peaks were true or false. Our report goes somewhat further in using the haplotypes at the HAM-predicted peaks to carry out a small cross designed to test each of the peaks not previously found. Since we conducted the HAM analysis and then carried out the crosses to test novel peaks, a number of advances that will improve HAM and reduce the false positives have occurred. The density of the SNP maps and the number of  strains for which SNPs are available are improving rapidly. Moreover, improved algorithms that account for admixture and population structure will help minimize false positives, such as the efficient-mixed model association (EMMA) method of Kang et al. [18]. For example, we think that the high number of false positives for HDL were due to the enrichment of B6, DBA/2, and A genomes contributed by the consomic and recombinant inbred lines, and that the EMMA method of controlling for strain relatedness would have reduced these. In addition, the fact that all but one of the false positives in our experiments were in allelic association with a true QTL peak suggests that the 3-SNP window is too restrictive and that using a HAM algorithm with a haplotype-based window might be a better approach. We subsequently tried HAM using a haplotype based on a Hidden Markov Model and imputed SNPs [27] and found that the percentage of false positives was reduced. However, since the crosses were already completed, we reported the results we obtained with our initial analysis using a 3-SNP window. We also expect that the Collaborative Cross [28] will further improve the resolution of HAM in laboratory mice by disrupting some of the pervasive linkage disequilibrium found in the mouse genome. As a starting point for HAM analyses, researchers can download and upload phenotype information on inbred lines from and to online resources such as the Mouse Phenome Database http:// www.jax.org/phenome [29,30].

Conclusion
In sum, although HAM is not likely to detect all QTLs contributing to a complex trait and is insufficient as a replacement for traditional QTL mapping, we conclude that the integrated method presented here is a powerful addition to our toolbox for quantitative trait analysis. Instead of conducting large scale expensive crosses, HAM can help find new QTLs, which can then be verified using smaller crosses designed to test specific regions of the genome.
Additionally, since such crosses are chosen based on their haplotypes, this process can lead to the discovery of QTLs that may not be found by the standard method of basing parental strain choice on highly divergent phenotypes.

Animal housing & blood collection
Mice were obtained from The Jackson Laboratory and housed under conditions meeting the guidelines issued by the Association for Assessment and Accreditation of Laboratory Animal Care. All animal protocols were approved by The Jackson Laboratory's Animal Care and Use Committee. Mice were housed in 31 × 31 × 214 cm polycarbonate cages that were divided into 2 pens with no more than 5 mice/pen (Thoren Caging Systems Inc.) in rooms maintained at a 12:12 light:dark cycle and an ambient temperature of 23°C. Each cage was pressurized and individually ventilated (PIV) with HEPA filtered air supply and exhaust. Mouse colonies were regularly monitored for 15 viruses, 17 bacterial species (including Helicobacter spp., Pasteurella pneumotropica, and 2 Mycoplasma spp.), ecto-and endo-parasites, and the microsporidium Encephalitozoon cuniculi http://jaxmice.jax.org/health [31].
All mice were fed a regular chow diet (6% fat, 5K52 from LabDiets ® ) and water ad libitum. About 200 uL of blood for hematological analysis was obtained from 8-10 week old mice in both the HAM strain panel and the (B6 × CBA)F2 intercross by retro-orbital bleeding. For lipid analysis, plasma was separated from blood by centrifugation. Allelic association among HDL HAM peaks Figure 7 Allelic association among HDL HAM peaks. Allelic association among false positive HDL HAM peaks (on left) and true HDL HAM QTLs (on right).

Phenotyping
lyzer from Beckman Coulter was used to measure the lipid profiles of mouse plasma. The concentrations of high density lipoprotein cholesterol from these profiles were quantitatively determined using the SYNCHRON ® Systems Lipid Calibrator (HDLD assay).

Haplotype association mapping
The sex specific lipid and hematopoietic traits of inbred lines of mice, including the B.A chromosome substitution strains and the BxA and BxD recombinant inbred lines (RILs), were measured. Using 4-15 mice per strain, we calculated the sex specific trait means for HDL and RBC (http://www.jax.org/phenome [30]; datasets "Stylianou and Paigen" and "Peters1/MPD6202" respectively). HDL values were log 10 transformed. These trait data were input into our analysis as vectors. A SNP panel consisting of over 21,000 autosomal SNPs was compiled from SNP resources including Wellcome Trust, Broad Institute, and Perlegen, and missing SNPs were inferred from adjacent SNPs (complete SNP panels with RBC and HDL trait strain means and HAM results are available at http:// cgd.jax.org/datasets/phenotype.shtml [32]. For RBC, we included 33 classic inbred strains in our strain panel; for HDL, we included 32 classic inbred strains and added the 20 B.A chromosome substitution strains, 15 BxA RILs, and 14 RILs, for a total of 81 strains. These strain panels were input into our analyses as genotype matrices.
We computed regression-based test statistics to measure the strength of association between 3-SNP haplotypes and averaged mean phenotypes to detect haplotype groupings with significantly different mean phenotypes. Since the segregation of strains into genotypic groups varies widely over haplotype blocks, P values rather than the test statistics were reported and used for comparisons among haplotype blocks. All P values were transformed usinglog 10 (P value) to produce HAM scores for the scan plots. We controlled type I error rate for multiple testing due to genome-wide searching using family-wise error rate control [33]. Permutations were performed by shuffling the phenotype data while keeping the genotype data intact. The minimum P value was recorded for each permutation, and percentiles of their distribution were used to provide approximate multiple test-adjusted thresholds. The genome-wide type I error thresholds were estimated based on 1000 permutation tests. 3-SNP windows, or "peaks" corresponding to P value thresholds adjusted for global significance were defined as follows: P < 0.05 as significant, P < 0.2 as highly suggestive, and P < 0.63 as suggestive [34]. We estimated "peak stability" by calculating bootstrap values per peak based on 1000 bootstraps. All analyses were done in the MATLAB computing environment (The Mathworks, http://www.mathworks.com [35].

F2 intercrosses
We examined the strain haplotypes at the significant and suggestive peaks and chose strains for intercrosses based on which strains had different haplotypes. Phenotype differences between strains were not considered. For RBC, the C57BL/6J (B6) and CBA/J (CBA) haplotypes differed at all of the 6 relevant peaks in females, so we set up a small F2 intercross of B6 females crossed with CBA males. The resulting F2 females (N = 108) were phenotyped for RBC at 8 weeks old. For HDL, the B6 and A/J (A) haplotypes differed at all 10 significant peaks that were not confirmed by published HDL QTLs. Because our lab had already conducted a large (B × A)F2 intercross for kidney disease [36], we used the plasma stored at -80°C to measure HDL and used the saved DNA from 292 males for genotyping. (When these crosses were decided upon, phenotyped, and tested at the HAM peaks of interest for this study, the Chr 8 HDL QTLs were discovered here and were novel. However, while this paper was in preparation, the same (B × A)F2 cross DNA was subsequently genotyped at additional locations across the genome and was published by Stylianou et al. [37] as a traditional genomewide QTL analysis in conjunction with an evaluation of HDL QTLs in B.A chromosome substitution strains.)

Genotyping
Genomic DNA was extracted from the tail-tips of the F2 progeny using a standard phenol-choloform procedure. SNPs flanking, within, or nearby HAM peaks of interest were genotyped by the Allele Typing Service at The Jackson Laboratory in conjunction with KBiosciences Ltd. In 6 cases, microsatellite markers were used instead: D1Mit373, D2Mit1, D8Mit191, D8Mit25, D11Mit48, and D15Mit149. For these, genomic DNA was amplified in PCR reactions with 0.200 mM of each dNTP, 0.132 uM of each MIT marker primer, 1× AmpliTaq buffer, and 0.375 Units of AmpliTaq ® DNA Polymerase (Applied Biosystems) under the following amplification conditions: initial denaturation at 95°C for 5 minutes, 35 cycles of 95°C (30 seconds) + 55°C (30 seconds) + 72°C (1 minute), and a final elongation at 72°C for 7 minutes. PCR products were visualized on 2% agarose gels and scored by eye.

Statistical analyses
Intercross progeny were genotyped at markers flanking, including, or nearby the HAM peak of interest. Using JMP ® by SAS, we tested each marker for correlations between phenotype and genotype with one-way ANOVAs. We tested the non-independence of the HAM peaks using an analysis of "linkage disequilibrium" in GENEPOP, http:// genepop.curtin.edu.au [38,39]. Pairwise comparisons of the 3-SNP haplotypes at the HAM peaks for each trait were conducted by contingency table analysis using a Markov chain and G-based probability tests as implemented in GENEPOP, with 1000 permutations. A Bonferroni correction was applied to the results in each case to account for multiple testing. For RBC, all HAM peaks above the suggestive P < 0.63 threshold were tested for the non-random association of their haplotypes with each other, while for HDL, all significant HAM peaks above the significant P < 0.05 threshold were tested for the non-random association of their haplotypes with each other.