Linkage disequilibrium blocks, haplotype structure, and htSNPs of human CYP7A1 gene

Background Cholesterol 7-alpha-hydroxylase (CYP7A1) is the rate-limiting enzyme for converting cholesterol into bile acids. Genetic variations in the CYP7A1 gene have been associated with metabolic disorders of cholesterol and bile acids, including hypercholesterolemia, hypertriglyceridemia, arteriosclerosis, and gallstone disease. Current genetic studies focus mainly on a single nucleotide polymorphism (SNP) at A-278C in the promoter region of the CYP7A1 gene. Here we report a genetic approach for an extensive analysis of linkage disequilibrium (LD) blocks and haplotype structures of the entire CYP7A1 gene and its surrounding sequences in Africans, Caucasians, Asians, Mexican-Americans, and African-Americans. Results The LD patterns and haplotype blocks of the CYP7A1 gene were defined in Africans, Caucasians, and Asians using genotyping data downloaded from the HapMap database to select a set of haplotype-tagging SNPs (htSNPs). A low-cost, microarray-based platform on thin-film biosensor chips was then developed for high-throughput genotyping to study the transferability of the HapMap htSNPs to Mexican-American and African-American populations. Comparative LD patterns and haplotype block structures were defined across all test populations. Conclusion A constant genetic structure in the CYP7A1 gene and its surrounding sequences was found that may lead to a better design for association studies of genetic variations in the CYP7A1 gene with cholesterol and bile acid metabolism.


Background
Cholesterol 7-alpha-hydroxylase (CYP7A1) catalyzes the first reaction in the cholesterol catabolic pathway in the liver. This pathway converts cholesterol to bile acids, which is the primary mechanism for the removal of cholesterol from the body. The CYP7A1 catalytic reaction is the rate-limiting step and the major site for regulating homeostasis of cholesterol and bile acids. The gene encoding CYP7A1 was cloned by using a rat homolog probe [1] and mapped to chromosome 8q11 [2]. The CYP7A1 gene spans about 10 kb and contains 6 exons, 5 introns, one 5'-UTR, and one 3'-UTR. In its 5' flanking region, consensus recognition sequences for a number of transcription factors were identified [2]. A TATA box and a modified CAAT box were also identified in the promoter region of the CYP7A1 gene [3]. Numerous laboratories have described a multiplex nuclear receptor-mediated network that controls CYP7A1 gene expression and maintains cholesterol and bile acid balance [4]. Within this network, the nuclear receptors farnesoid X receptor (FXR), liver X receptor (LXR), retinoid X receptor (RXR), small heterodimer partner (SHP), and liver receptor homologue 1 (LRH1) are involved in positive-versus-negative regulation. Using an FXR-deficient (-/-) mouse model, we have demonstrated feedback suppression of CYP7A1 gene transcription by FXR [5,6].
Genetic variations in the CYP7A1 gene associated with disorders of cholesterol and bile acid metabolism have been studied extensively in different laboratories. Most studies have focused on a single nucleotide polymorphism (SNP) in the promoter region of the CYP7A1 gene. This is an A/C transversion polymorphism at -278 from the translation initiation codon, or -204 from the transcriptional start site. This polymorphism was first reported by Wang et al. [7] to be linked to high plasma low-density lipoprotein cholesterol concentrations. Association of this polymorphism with plasma lipid levels, hypertriglyceridemia, hypercholesterolemia, and risk of arteriosclerosis, gallstone disease, and colorectal cancer has been studied in adults and children in Caucasian and Asian populations with conflicting results [8][9][10][11][12][13][14][15][16][17][18][19]. A CYP7A1 enzyme deficiency caused by a homozygous 1302-1303 delTT deletion mutation in CYP7A1 exon 6, leading to a frameshift (L413fsX414), has been linked to a hypercholesterolaemic phenotype [20]. These findings indicate that genetic variations in the CYP7A1 gene have a high impact on human cholesterol metabolic regulation and human health; however, these studies have mainly focused on a single polymorphism or mutation. Linkage of genes to a complex disease relies on a priori knowledge of linkage disequilibrium (LD) blocks and haplotype structure to identify polymorphisms that are associated with the disease. Therefore, it is important to determine whether LD blocks exist in the CYP7A1 gene in different populations. This information can be used to identify a set of haplotype-tagging SNP (htSNP) markers for use in an association study.
The LD blocks and haplotype structure of the CYP7A1 gene can first be defined in three general human populations (Africans, Asians, and Caucasians) using a publicly available database generated by the International HapMap Project [21]. The HapMap LD patterns and haplotype structure can serve as a reference to select htSNPs for an association study. LD patterns and htSNPs defined by the HapMap Project are transferable to other populations at some loci, but may vary significantly at other loci [22]. To test whether the htSNPs identified in the HapMap populations are useful for association studies in other populations, we analyzed LD patterns and haplotype structures of the CYP7A1 gene in both Mexican-American and African-American populations using the selected HapMap htSNPs. Mexican-Americans are the fastest-growing population in the USA, but genetic studies of this population are extremely limited. The Mexican-American genetic background is a mixture of European American (50-60%) (mainly Spanish), American Indian (30-40%), and African (<5%) [23]. African-Americans are the major minority population in the USA and have an admixed genetic background from Africans and European Americans [24]. Genotyping of the selected htSNPs in these two populations can verify the transferability of the HapMap htSNPs among populations.

Linkage disequilibrium blocks and haplotype structures of CYP7A1 gene in Caucasians, Africans, and Asians
An LD block is found in the HapMap Caucasians (CEU) spanning a 14-kb region from the proximal promoter (rs3824260) to the 3'-downstream region (rs10504255) of the CYP7A1 gene (Figure 1, CEU-B1). A similar LD block from rs3824260 to the 3'-downstream region was also reported in a Swedish population [18]. About 4.4 kb upstream from rs3824260, there is another LD block (CEU-B2) spanning a 3-kb region in the distal promoter. Recombination between the two blocks (multiallelic D') is 0.84. Only five haplotypes with a frequency > 2% exist in CEU-B1 (Figure 2, CEU-B1H1 to CEU-B1H5). CEU-B1H1 and CEU-B1H2 are the two common haplotypes, together representing 68% of the haplotype frequency in CEU-B1. CEU-B1H1 carries common alleles at all markers except rs8192879 in the 3'-UTR, whereas CEU-B1H2 is composed of less common alleles at 5 out of 8 loci. In CEU-B2, there are only two haplotypes (CEU-B2H1 and CEU-B2H2). CEU-B2H1 carries common alleles at all SNP loci, whereas CEU-B2H2 carries the less common alleles. A similar LD pattern is found in the HapMap African YRI (Figure 1), but the larger LD block (YRI-B1) is slightly shorter (9 kb, from rs8192879 to rs3824260) than CEU-B1. The haplotype structure is also similar between YRI and CEU; however, the frequency of each haplotype is different. The most common haplotype (55%) in YRI-B1, YRI-B1H1, has the same haplotype structure as CEU-B1H2, whereas the second most common haplotype in YRI-B1, YRI-B1H2 (22.5%), is the same as CEU-B1H1. YRI-B1H1 and YRI-B1H2 together add up to 77.5% of the total haplotypes in YRI-B1. In YRI-B2, the dominant haplotype YRI-B2H1 has the same haplotype structure as CEU-B2H2, whereas the less common haplotype YRI-B2H2 is identical to the common haplotype CEU-B2H1. A similar level of recombination (0.81) is also found between YRI-B1 and YRI-B2. In CHB and JPT, only one LD block is found, from the distal promoter to a part of the CYP7A1 gene. Although JPT-B1 (16 kb, from rs8192879 to rs1023649) is larger than CHB-B1 (10 kb, from rs1457043 to rs1023650), LD is weak between rs8192879 in intron 4 and rs1457043 in intron 2 in JPT. CHB and JPT share almost the same haplotype structure within the block. JPT-B1H1, JPT-B1H2 and JPT-B1H3 are the same as CHB-B1H1, CHB-B1H2 and CHB-B1H3, respectively. Only CHB-B1H4 (6%) is unique to CHB.
Comparing LD and haplotype structure among the HapMap populations, strong LD is found from the distal promoter region to intron 2 of the CYP7A1 gene in all four populations. Two common haplotypes with completely opposite alleles at all loci (common versus less common alleles) within this region account for more than 85% of total haplotype frequencies in all four HapMap populations. Between intron 2 and the 3'-downstream region, the degree of LD diverges, ranging from high to low across CEU, YRI, JPT, and CHB.
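The pairwise LD measures that underlie block definitions of this kind can be illustrated with a minimal sketch. It assumes phased two-SNP haplotypes are already available as strings; the function name `pairwise_ld` and the toy data are hypothetical, and this is not HaploView's implementation (in particular, it computes pairwise D' and r², not the multiallelic D' reported between blocks).

```python
from collections import Counter

def pairwise_ld(haplotypes):
    """Return (D, D_prime, r2) for two biallelic SNPs.

    `haplotypes` is a list of 2-character strings, one per phased chromosome;
    "AC" means allele A at SNP 1 and allele C at SNP 2.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    alleles1 = sorted({h[0] for h in haplotypes})
    alleles2 = sorted({h[1] for h in haplotypes})
    if len(alleles1) != 2 or len(alleles2) != 2:
        raise ValueError("both SNPs must be biallelic and polymorphic")

    a, b = alleles1[0], alleles2[0]            # reference allele at each SNP
    p_a = sum(c for h, c in counts.items() if h[0] == a) / n
    p_b = sum(c for h, c in counts.items() if h[1] == b) / n
    p_ab = counts[a + b] / n                   # frequency of the a-b haplotype

    d = p_ab - p_a * p_b                       # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Toy data (hypothetical): three haplotypes observed, so D' = 1 but r2 < 1.
_, d_prime, r2 = pairwise_ld(["AC"] * 5 + ["AT"] * 1 + ["GT"] * 4)
print(round(d_prime, 3), round(r2, 3))
```

The toy example shows why the two measures are usually reported together: D' reaches 1 whenever at most three of the four possible haplotypes occur, while r² reaches 1 only when the two SNPs are perfect proxies for each other.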

Genotyping of htSNPs in Mexican-Americans and African-Americans
Because of the strong LD in the CYP7A1 gene, some markers correlate 100% with each other in a population. Only a subset of representative SNPs is therefore necessary for defining a haplotype. These SNPs can tag either neighboring markers or a set of common haplotypes within an LD block. The htSNPs in CYP7A1 were selected using Tagger, implemented in HaploView 3.12, which combines the simplicity of pairwise methods with the potential efficiency of multimarker approaches. Nine SNP markers and one short deletion marker were selected (see details in Table 3), of which eight are htSNP markers defined in the HapMap populations, including rs3808607, a functional polymorphism at A-278C in the promoter region. Two functional mutations were also included. One is a two-base deletion in exon 6 (1302 delTT) causing a frameshift and CYP7A1 enzyme deficiency [20]. The other is a C/T SNP in exon 3, causing an amino acid change at Asn233Ser. This is the only non-synonymous SNP in the CYP7A1 gene reported in the NCBI SNP database.
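The idea of tagging can be sketched with a simplified greedy pairwise procedure. This is an illustration only, not Tagger's actual algorithm (which also supports multimarker tests); the function name, the SNP labels, and the r² threshold of 0.8 are assumptions made for the example.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Pick tag SNPs so that every SNP has r2 >= threshold with some tag.

    `snps` is a list of SNP identifiers; `r2` maps frozenset({snp_i, snp_j})
    to the pairwise r^2. Missing pairs are treated as r^2 = 0.
    """
    def captured_by(tag):
        # A SNP always captures itself, plus every SNP in strong LD with it.
        return {s for s in snps
                if s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold}

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP capturing the most still-untagged SNPs;
        # ties are broken alphabetically for reproducibility.
        best = min(snps, key=lambda s: (-len(captured_by(s) & untagged), s))
        tags.append(best)
        untagged -= captured_by(best)
    return tags

# Hypothetical SNP labels and r^2 values, for illustration only.
snps = ["s1", "s2", "s3", "s4"]
r2 = {
    frozenset(("s1", "s2")): 1.0,
    frozenset(("s2", "s3")): 0.9,
    frozenset(("s3", "s4")): 0.2,
}
print(greedy_tag_snps(snps, r2))
```

In this toy case, s2 tags s1 and s3, while s4 must be genotyped itself because no other marker is a good proxy for it.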
To genotype the 10 markers in the Mexican-American and African-American populations, a high-throughput and inexpensive SNP genotyping platform was developed using thin-film biosensor chips. We have previously reported a microarray platform for genotyping both SNPs and microsatellite repeats on thin-film biosensor chips [27,28]. The thin-film biosensor chip has excellent sensitivity of detection and extremely low non-specific binding, making it an excellent platform for discrimination of polymorphisms [29]. A positive reaction (blue color signal) can be visualized over the unreacted background (gold color) by the unaided human eye, without any instrumentation. Once the chips are printed, they are robust. Several thousand genotypes can be determined in a 96-well plate within a few hours in a laboratory with a standard molecular genetics setting. The cost of reagents and materials to genotype the 10 CYP7A1 htSNPs, including genomic DNA isolation, PCR reaction, and SNP genotyping on the thin-film biosensor chips, is ~US$0.20 per SNP per sample. This is relatively inexpensive compared with other high-throughput genotyping platforms, such as TaqMan or real-time PCR.

Figure 2. Haplotype frequencies of the HapMap selected SNPs in the CYP7A1 gene in CEU, YRI, JPT, and CHB. In each haplotype, blue bars represent allele 1, whereas red bars represent allele 2 for correlated SNPs. Black bars indicate that the SNPs are not present in this population. Numbers next to each haplotype bar are haplotype frequencies. Upside-down red triangles indicate htSNPs in the populations. In the crossing areas, a value of multiallelic D' is shown to represent the level of recombination between the two blocks.
To verify genotyping specificity on the thin-film biosensor chips, a pool of the synthetic targets for allele 1 or allele 2 was applied to a chip for hybridization and ligation. After signals were developed, the resulting images were captured by a black-and-white camera on a Nucleosite™ Image Analyzer (Biostar, Inc., Louisville, Colorado). High specificity was achieved on these synthetic targets, with unambiguous genotypes (see images in Figure 3B and 3C). A negative control showed that the signals are target dependent (Figure 3D). As a positive control for genotyping, 12 HapMap DNA samples were purchased from Coriell Cell Repositories (Camden, NJ): one family trio from YRI (NA18500, NA18501 and NA18502); one family trio from CEU (NA06985, NA06991, and NA06993); three independent individuals from CHB (NA18524, NA18526, and NA18529); and three independent individuals from JPT (NA18940, NA18942, and NA18943). Genotypes of the 8 HapMap htSNPs in the 12 HapMap samples were determined on thin-film biosensor chips. A 100% concordance was obtained between the 96 genotypes generated by thin-film biosensor chips and the 96 genotypes downloaded from the HapMap database, which were generated by the Illumina Bead Assay.
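A concordance check of this kind (8 htSNPs × 12 samples = 96 genotype calls per platform) can be sketched as follows. The function name, the dictionary layout, and the placeholder sample and SNP labels are assumptions for illustration; no real genotype data are reproduced here.

```python
def concordance(calls_a, calls_b):
    """Compare two genotype call sets keyed by (sample, snp).

    Genotypes are 2-character strings such as "AC"; allele order is ignored,
    so "AC" and "CA" count as the same call.
    Returns (matching calls, compared calls, concordance rate).
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no overlapping (sample, snp) calls to compare")
    matches = sum(1 for key in shared
                  if sorted(calls_a[key]) == sorted(calls_b[key]))
    return matches, len(shared), matches / len(shared)

# Placeholder labels and calls (hypothetical), sized like the control set.
samples = [f"sample{i}" for i in range(1, 13)]   # 12 samples
snps = [f"snp{j}" for j in range(1, 9)]          # 8 htSNPs
chip_calls = {(s, m): "AC" for s in samples for m in snps}
reference_calls = {(s, m): "CA" for s in samples for m in snps}
print(concordance(chip_calls, reference_calls))
```

Ignoring allele order matters in practice because different platforms may report a heterozygote as either "AC" or "CA".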
To define the LD pattern and haplotype structures of [...] Figure 3E, 3F, and 3G. Genotypes of each individual at the 10 markers were saved in linkage format and uploaded to HaploView. Observed genotype frequencies, allele frequencies, expected heterozygosity, and Hardy-Weinberg (HW) p-values of the 10 markers are summarized in Table 5. No significant HW p-values (<0.0010) were found. Neither the TT deletion mutation at 1302 nor the C mutation at rs8192874 was detected in these two population samples. This indicates that these mutations have very low frequencies in the general populations. The genotyping data of the 8 htSNPs were uploaded to HaploView 3.12 to define the LD patterns and haplotype structures of CYP7A1. In Mexican-Americans, three LD blocks were identified. In comparison to the CEU LD blocks, MA-B3 in the distal promoter region has the same pattern as CEU-B2, but the haplotype frequencies are different. MA-B3H1 has a higher frequency (78%) than CEU-B2H1 (55%). Unlike the single large block in CEU, the CYP7A1 gene is divided into two LD blocks in the Mexican-American population. MA-B2 extends from the proximal promoter to intron 2, whereas MA-B1 extends from the 3'-UTR to the 3'-downstream region. The recombination frequencies between the blocks are 80-90%. In African-Americans, two LD blocks were recognized. AA-B2 has the same structure as YRI-B2, and the frequencies of the two haplotypes (AA-B2H1 and AA-B2H2) in the African-American population are almost the same as those of YRI-B2H1 and YRI-B2H2. AA-B1 is shorter than YRI-B1. The HapMap htSNPs are necessary SNP markers to capture all haplotypes in the MA and AA populations.
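The Hardy-Weinberg check applied to each marker can be illustrated with a simple chi-square goodness-of-fit test (1 degree of freedom). This is a sketch under stated assumptions: HaploView's own HW computation may differ (for example, by using an exact test), the function name is hypothetical, and the genotype counts below are invented for illustration.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic marker.

    Takes observed genotype counts (AA, AB, BB) and returns (chi2, p_value).
    Monomorphic markers cannot be tested and return (0.0, 1.0).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)        # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 0.0, 1.0
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Invented counts: the marker would be flagged only if p_value < 0.0010.
chi2, p_value = hwe_chi_square(30, 50, 20)
print(round(chi2, 3), p_value >= 0.0010)
```

A marker on which the variant allele is never observed, such as a deletion absent from every sample, is monomorphic, so the test is uninformative for it.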
In summary, the human CYP7A1 gene and its surrounding sequences have constant genetic structures across all populations. This genetic structure can be divided into three components: (1) In the distal promoter region, about 7 kb upstream from the transcriptional start site, there is a 3-kb LD block that is highly conserved across all populations. Only two haplotypes exist in this region in most populations, except YRI. The most common haplotype in CEU and Mexican-Americans becomes the second most common haplotype in the YRI, African-American, CHB, and JPT populations. (2) A relatively conserved LD block is present in the proximal part of the CYP7A1 gene, from the proximal promoter region (about 500 bp from the transcriptional start site) to intron 2 of CYP7A1. The two most conserved haplotypes account for 80 to 90% of the haplotype frequencies in all populations in this region. (3) A much more divergent LD pattern is observed in the lower part of the CYP7A1 gene (from intron 4 to the 3'-downstream region). In CEU and YRI, a complete or partial LD block is merged with the block in the proximal part of the CYP7A1 gene. In Mexican-Americans, an LD block in this region is separated from the block in the proximal part of the CYP7A1 gene. In JPT, a weak linkage extends the proximal block into the 3'-UTR. In CHB and African-Americans, no LD exists in this area.

Conclusion
Here we demonstrate a genetic approach to analyze LD patterns and haplotype blocks in the CYP7A1 gene. Varying degrees of LD are found across different regions in different populations. A set of htSNPs is identified that can be used in an association study to capture common haplotypes in different populations. An inexpensive genotyping platform on thin-film biosensor chips is established to genotype the htSNPs. This chip technology can be applied in any laboratory with a basic molecular genetics setting. The defined haplotype block structure in the CYP7A1 gene may lead to a better design for genetic association studies that correlate genetic variations in the CYP7A1 gene with cholesterol and bile acid metabolism and human diseases, such as gallstone disease. Because of the high polymorphism and strong LD in the promoter region of CYP7A1, future studies should evaluate which CYP7A1 promoter haplotypes are more efficient for transcriptional regulation by its regulatory factors, such as FXR, LXR, RXR, PXR, SHP, and LRH1.

Human subject
The SNP density is about one SNP per 1.8 kb. The CYP7A1 A-278C (or A-204C) promoter polymorphism is included with an ID number of rs3808607. A nearby promoter polymorphism, C-554T, which was identified together with A-278C by Wang et al. [7], is also included as rs3824260.
Chromosomal positions and locations in the CYP7A1 gene regions of the 14 SNPs are listed in Table 1, with allele 1 denoting the common allele and allele 2 the less common allele [...] (Table 2). No marker has a HW p-value smaller than the cutoff value of 0.0010 in the four populations. The LD between any two markers was defined by HaploView 3.12. A standard color scheme is used to display LD in Figure 1. In Figure 2, alleles with blue boxes and red boxes represent common alleles and less common alleles in CEU, defined as Allele 1 and Allele 2, respectively. In the crossing areas, a value of multiallelic D' is shown to represent the level of recombination between the two blocks.

Genotyping on thin-film biosensor chip
For each selected SNP, target DNA molecules from each sample were amplified by PCR. PCR primers were designed according to the following criteria to make the PCR reactions uniform: (1) the product size should be 120-200 bp, with about 50-100 bases of flanking sequence around the SNP site in both directions, and (2) the annealing temperature should be about 60°C for a standard PCR reaction condition. The best primer sets were selected with DS Gene software version 1.5 (Accelrys). The primer sequences for each SNP site are listed in Table 4. The selected primers were synthesized by Invitrogen (Carlsbad, California). Multiple sets of PCR products were amplified in a single PCR reaction.
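The size and flank criteria above can be expressed as a small check. This is a minimal sketch, not the DS Gene selection procedure: the function name, the 0-based half-open coordinate convention, and the example coordinates are assumptions, and the annealing-temperature criterion is deliberately not modelled.

```python
def check_amplicon(amp_start, amp_end, snp_pos,
                   size_range=(120, 200), flank_range=(50, 100)):
    """Check one candidate amplicon against the size and flank criteria.

    Coordinates are 0-based; the amplicon covers [amp_start, amp_end)
    and `snp_pos` is the position of the SNP base itself.
    """
    if not amp_start <= snp_pos < amp_end:
        raise ValueError("SNP must lie inside the amplicon")
    size = amp_end - amp_start
    left_flank = snp_pos - amp_start          # bases before the SNP
    right_flank = amp_end - snp_pos - 1       # bases after the SNP
    size_ok = size_range[0] <= size <= size_range[1]
    flanks_ok = all(flank_range[0] <= f <= flank_range[1]
                    for f in (left_flank, right_flank))
    return {"size": size, "left_flank": left_flank,
            "right_flank": right_flank, "ok": size_ok and flanks_ok}

# Hypothetical coordinates: a 150-bp product with the SNP near its center.
print(check_amplicon(amp_start=1000, amp_end=1150, snp_pos=1075))
```

Because the text gives the flank length only as "about 50-100" bases, the hard bounds used here are a simplification; a looser tolerance could be passed through `flank_range`.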
For each SNP, three oligonucleotide probes were synthesized. A pair of allele-specific P1 oligos, differing only in their 3'-terminal nucleotide, generally has 40 nucleotides complementary to the corresponding target sequence and an additional 10 dA residues at the 5' end that constitute a "spacer". The 5'-terminal nucleotide is modified with an aldehyde group, allowing covalent attachment to the chip surface [27]. A second oligonucleotide probe (biotin-P2), with 20 nucleotides immediately adjacent to the SNP nucleotide, carries a biotin at the 3' end for detection and a phosphate at the 5' end for ligation. To test genotyping specificity, a pair of oligonucleotide targets was also synthesized. The P1, P2 and target sequences for each SNP are listed in Table 4. The synthesized P1 oligos were dissolved to 100 µM in 0.1 M phosphate buffer, pH 7.8. A P1 working solution of 1 µM in 0.1 M phosphate buffer, pH 7.8, and 10% glycerol was prepared for each P1 probe before spotting. Twenty nanoliters of the P1 working solution were spotted on a 7 × 7 mm² chip in an 8 per row × 6 per column format by a BioDot PC-controlled dispense arrayer AD3200. A duplicate set of P1 probes was spotted on each chip with the spotting pattern shown in Figure 3A. After the spotted chips were incubated in a humidity-controlled chamber for at least 2 hrs, the chips were washed with 0.1% SDS and water, and air-dried. A standard operating procedure for genotyping SNPs on the printed biosensor chips was described previously [27]. An arrayed chip was assembled into a square well of a 96-well microtiter plate for hybridization. A ligation reaction was carried out in a microtiter plate well containing an arrayed chip. A reaction solution (100 µl) contained 100 femtomoles of each relevant PCR amplicon of the 10 CYP7A1 SNPs, 10 nM P2 probe (one for each SNP) and 5 units of mutant Ampligase in a buffer of 20 mM Tris-HCl, pH 8.3, 25 mM KCl, 10 mM MgCl2, 0.5 mM NAD, 0.01% Triton X-100, and 5 mg/ml alkaline-treated casein. The ligation reaction was incubated for 20 min at 60°C. Ninety-six chips in a 96-well plate were processed simultaneously. After a stringent wash (3 times in 0.01 M NaOH at room temperature and 3 times in 0.1 × SSC), the chips were incubated with an anti-biotin horseradish peroxidase (HRP) conjugate (1 µg/ml in hybridization buffer) for 10 min and then rinsed with 0.1 × SSC. A precipitate-generating HRP substrate, TMB (BioFx; 100 µl), was added to each chip and incubated for 5 min; the chips were then rinsed in ddH2O and air-dried.
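The probe layout described above (two allele-specific P1 probes ending on the SNP base, plus one shared P2 probe starting immediately after it) can be sketched in a few lines. This is an illustration under stated assumptions: all sequences are written 5' to 3' on the probe strand, the 40 complementary nucleotides of P1 are assumed to include the SNP base, and the function name and input sequences are hypothetical rather than taken from Table 4.

```python
def design_ola_probes(upstream, allele1, allele2, downstream,
                      p1_len=40, p2_len=20, spacer="A" * 10):
    """Build P1/P2 probe sequences around one SNP (all 5'->3', probe strand).

    `upstream` and `downstream` are the sequences flanking the SNP base.
    P1 = spacer + (p1_len - 1) upstream bases + allele base at the 3' end.
    P2 = the first p2_len bases immediately downstream of the SNP.
    """
    if p1_len < 2:
        raise ValueError("p1_len must be at least 2")
    if len(upstream) < p1_len - 1 or len(downstream) < p2_len:
        raise ValueError("flanking sequence too short for requested probes")
    arm = upstream[-(p1_len - 1):]             # bases just 5' of the SNP
    p2 = downstream[:p2_len]
    return {
        "P1_allele1": spacer + arm + allele1,  # 5' end carries the aldehyde
        "P1_allele2": spacer + arm + allele2,
        "P2": p2,
        "P2_labelled": "Phosphate-" + p2 + "-biotin",
    }

# Hypothetical flanking sequences, for illustration only.
probes = design_ola_probes("ACGT" * 10, "A", "C", "TTGCA" * 5)
for name, seq in probes.items():
    print(name, seq)
```

With this layout, ligation of P2 to the chip-bound P1 can occur only when the 3'-terminal base of P1 matches the target, which is what allows the two alleles to be read out from separate spots.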