Prion gene haplotypes of U.S. cattle

Background Bovine spongiform encephalopathy (BSE) is a fatal neurological disorder characterized by abnormal deposits of a protease-resistant isoform of the prion protein. Characterizing linkage disequilibrium (LD) and haplotype networks within the bovine prion gene (PRNP) is important for 1) testing rare or common PRNP variation for an association with BSE and 2) interpreting any association of PRNP alleles with BSE susceptibility. The objective of this study was to identify polymorphisms and haplotypes within PRNP from the promoter region through the 3'UTR in a diverse sample of U.S. cattle genomes. Results A 25.2-kb genomic region containing PRNP was sequenced from 192 diverse U.S. beef and dairy cattle. Sequence analyses identified 388 total polymorphisms, of which 287 have not previously been reported. The polymorphism alleles divide PRNP into regions of high and low LD. High LD is present between alleles in the promoter region through exon 2 (6.7 kb). PRNP alleles within the majority of intron 2, the entire coding sequence, and the untranslated region of exon 3 are in low LD (18.0 kb). Two haplotype networks, one representing the region of high LD and the other the region of low LD, yielded nineteen different combinations that represent haplotypes spanning PRNP. The haplotype combinations are tagged by 19 polymorphisms (htSNPs) that characterize variation within and across PRNP. Conclusion The number of polymorphisms in the prion gene region of U.S. cattle is nearly four times greater than previously described. These polymorphisms define PRNP haplotypes that may influence BSE susceptibility in cattle.


Background
Transmissible spongiform encephalopathies (TSEs) have been identified in humans, sheep, goats, deer, elk, moose, cattle, cats, and mink [1]. A cattle TSE, bovine spongiform encephalopathy (BSE), was first diagnosed among Holstein/Friesian cattle in the United Kingdom [2] and has since been detected in at least twenty-five countries, including the United States. The BSE agent is the probable cause of the human TSE, variant Creutzfeldt-Jakob Disease (vCJD) [3,4], transmitted from cattle to people via the food chain.
Variation in the prion gene (PRNP) correlates with TSE progression in humans [5,6], sheep [7], and mice [8]. In cattle, a 23-bp insertion/deletion (indel) polymorphism in the putative promoter region and a 12-bp indel within intron I have been associated with German BSE-affected animals [9]. These polymorphisms are present in U.S. cattle [10]. However, most of PRNP has not been characterized in a population as diverse as U.S. cattle outside of the coding region and 3'UTR of exon III, and portions of the promoter and intron I [10][11][12]. Consequently, the extent of PRNP polymorphisms, linkage between PRNP alleles, recombination events, and haplotype diversity within PRNP is not known.
Public health concerns associated with vCJD and economic impacts of BSE on the cattle industry worldwide compel a thorough characterization of the genetic variation of bovine PRNP. Single nucleotide polymorphism (SNP) discovery in small populations introduces ascertainment biases of SNP properties [13,14], and partial sequencing of genes in deep populations may characterize haplotype networks that extend past the sequenced region [15], yet still miss significant variation within the gene. The aim of this study was to characterize the extent of linkage disequilibrium (LD) and haplotype networks within PRNP ranging from the promoter past the 3'UTR (25.2 kb) in 192 U.S. cattle (16 beef and five dairy breeds). Reported here are 287 newly identified PRNP polymorphisms, the frequencies of 388 PRNP polymorphisms in U.S. beef and dairy cattle, a reference map of LD and haplotypes throughout PRNP, and the identification of 19 haplotype tagging SNPs (htSNPs) that are effective in U.S. populations of cattle. These results provide a reference framework for accurate and comprehensive evaluation of PRNP variation and its relationship to BSE.

Amplification and sequence coverage of the bovine PRNP gene from the promoter region through the 3'UTR (25.2 kb) in U.S. cattle
The PRNP gene was sequenced from 24 overlapping amplicons in 192 beef and dairy cattle representing 16 beef and five dairy breeds (Figure 1; see additional file 1). The amplification primers do not hybridize with PRNP regions containing any of the polymorphisms observed in this study, nor do any of the 150 sequencing primers used for redundant coverage of PRNP nucleotides (see additional file 1). Two or more high quality or unambiguous heterozygous reads were obtained for each PRNP nucleotide throughout 24.8 kb of the 25.2-kb region for approximately 95% of the cattle. Regions of PRNP that correspond with ambiguous sequence from more than five percent of the 192 animals were identified in the promoter region (95 bp), intron 1 (96 bp), and intron 2 (225 bp). These regions are attributable either to closely positioned indels with heterozygous genotypes of high frequency in the cattle populations or to stretches of mononucleotide repeats, both of which interfere with collection of high quality sequence. The positioning of these problematic loci was such that it was not possible to design amplification/sequencing reactions to cover the areas with high quality sequence.

PRNP polymorphisms in U.S. cattle
A total of 388 polymorphisms (351 SNPs and 37 indels) were observed in PRNP gene sequences from the 384 chromosomes present in all 192 cattle (Figure 2; see additional file 2). Two hundred and eighty-seven of the polymorphisms were not described in GenBank and literature searches as of July 10, 2006, and all were identified in non-coding regions of PRNP. The majority of polymorphisms (382/388) were observed in the multi-breed beef diversity panel (17 breeds, 192 chromosomes). In contrast, 158 polymorphisms were observed in the multi-breed dairy diversity panel (five breeds, 192 chromosomes), of which six were unique to the panel. Polymorphisms were observed in subgroups of cattle as follows (Figure 3): Bos taurus (240/388), British (161/388), Continental (216/388), Composites of U.S. Brahman (331/388), and Holstein (137/388). A total of 261 polymorphisms were used for haplotype inference across the five subgroups of cattle. One hundred and twenty polymorphisms were excluded from haplotype inference in all five subgroups of cattle due to low minor allele frequencies and an additional seven were excluded by Hardy-Weinberg testing. Sixty-three of the 388 polymorphisms were only observed in one animal of the multi-breed beef diversity panel, a composite of U.S. Brahman.

Linkage disequilibrium of PRNP alleles
A 6.7-kb region of high LD was identified in U.S. beef and dairy cattle from the 5' promoter region through exon 1, intron 1, exon 2, and part of intron 2 (Figure 4). Adjacent to the high LD region, an 18.0-kb portion of PRNP, containing the majority of intron 2 and all of exon 3, displayed markedly less LD (region of low LD). The region of high LD was not restricted to a cattle subgroup and was observed in the beef cattle diversity panel, the dairy cattle diversity panel, and the B. taurus, British, Continental, and Holstein cattle subgroups (minor allele frequency ≥ 0.05, Hardy-Weinberg p > 0.01). Including the polymorphisms with minor alleles of low frequency, 115 polymorphisms were identified in the 6.7-kb region of high LD, of which 45 have alleles in LD.

Haplotype inference and networks
Nineteen PRNP haplotypes were inferred by the Expectation-Maximization (EM) algorithm on at least four chromosomes in one or more of the following subgroups: B. taurus, British, Continental, Holstein, and U.S. Brahman composite. Six of the haplotypes were observed in a [...] (Figure 5, Table 1). Because the haplotypes span two distinct PRNP regions defined by high and low LD, a Median-Joining network for each region was constructed (Figure 6). The network within the 6.7-kb region of high LD contains "sub-haplotypes" phased from nine of the 19 htSNPs and shows a linear stepwise relationship of alleles (Figure 6; network 1, Table 1). The network within the PRNP region of low LD contains sub-haplotypes phased from the remaining ten htSNPs and has a distinctly looped structure, indicating multiple unresolved allele relationships (Figure 6; network 2, Table 1). Sub-haplotype combinations from the two networks effectively account for the regions of high and low LD and yield haplotypes that span PRNP (Figure 6).

Discussion
Sequencing the PRNP gene in 192 cattle representing 21 breeds resulted in the identification of 388 polymorphisms and detection of a region of high LD in the 5' noncoding region of PRNP. We identified 19 common PRNP haplotypes in U.S. cattle and characterized 19 htSNPs with the power to monitor these haplotypes. These results provide the means and a context for testing PRNP variation for an association with BSE.
Cattle present a challenge for LD analysis due to their complex history. [...] [17]. Consequently, alleles with high pairwise r² values predict linear haplotype networks, and alleles with low pairwise r² values predict complex or looped haplotype networks. The region of high LD that encompasses a 6.7-kb portion of PRNP extends to the 5' boundary of the PRNP locus sequenced in this study. Additional high LD may extend upstream of the sequenced locus. PRNP SNP alleles within this region are highly correlated with each other, and haplotypes phased within the region yield a linear network. In contrast, 18.0 kb of PRNP, which includes the entire protein coding and untranslated region of exon 3, has alleles in low LD and the associated haplotype network is complex. Alleles in this region may not be correlated with alleles elsewhere on the chromosome.
A previous study associated a 23-bp indel in the promoter region and a 12-bp indel within intron 1 of PRNP with susceptibility to BSE [9]. Both of these polymorphisms lie within the region of high LD described here, and their alleles are strongly correlated with the alleles of 43 other polymorphisms detected in this study. Until both chromosomal boundaries flanking the region of LD are determined, the number of polymorphisms with alleles in LD with those associated with BSE is unknown.
Although the diversity panels used to sequence PRNP represent a broad sample of U.S. cattle, it is likely that additional diversity within PRNP is present at low frequency in U.S. herds. This hypothesis is supported by the PRNP sequence from a single animal that accounted for 16.2% of all observed polymorphisms. Some countries, including the U.S., have detected atypical BSE cases at exceedingly low frequencies with increased surveillance [18][19][20]. Atypical BSE can differ from typical BSE by brain distribution and plaque morphology of the protease-resistant prion isoform (PrPres), or by western immunoblot profile of PrPres following proteinase K digestion [18][19][20]. Diverse chromosomes in cattle populations could confound interpretations of PRNP variation identified from individual cases of either typical or atypical BSE.

Conclusion
The number of polymorphisms in the prion gene region of U.S. cattle is nearly four times greater than previously described. PRNP is divided into regions of high and low LD. The 19 htSNPs identified in this study define haplotype combinations from the two PRNP regions that may influence BSE susceptibility in cattle.

Primer design, PCR, and cycle sequencing
Reference sequence for the bovine PRNP gene was used as a template for primer design [GenBank:AJ298878], [23,24]. Primers were designed to amplify 24 overlapping amplicons that collectively span a 25.2-kb region of PRNP (Oligo 6.61). Nested sequencing primers were designed for each amplicon to provide nucleotide coverage in both directions. Following preliminary experiments of amplification primer performance, the 192 cattle genomes comprising MBCDP2.1 and MDCP1.5 were subjected to 40 rounds of PCR with conditions as described [25] (see additional file 1). Following an Exonuclease I digestion [26], the PRNP amplicons were sequenced with BigDye terminator chemistry on an ABI 3730 capillary sequencer (PE Applied Biosystems, Foster City, CA).

Polymorphism detection, sequence quality, and coverage
Sequences from the 192 animals of the multi-breed beef and dairy panels were processed for polymorphism detection with Phred and Phrap [27,28], Polyphred 3.5 [29], and Consed software [30]. A physical map linked to the PRNP consensus sequence and the location of polymorphisms was constructed in Vector NTI (v7.1). The map was annotated with all amplification and sequencing primers connected with the PRNP sequence. Replacement primers for those that hybridized to genomic loci containing polymorphisms were designed and used for additional amplification and sequencing of PRNP regions. PRNP nucleotide sequence with a phred score greater than 20 from at least two sequencing reads from the same animal was mapped to the corresponding nucleotide on reference sequence [GenBank:AJ298878]. Sequence compromised by SNP loci under associated amplification or sequencing primers was not analyzed for sequence coverage or the determination of genotypes. Regions reflecting poor sequence quality (<95% animal coverage) were identified, additional amplicons and sequencing primers were designed, and additional sequencing was performed. PRNP allele genotypes were mapped to reference sequence [GenBank:AJ298878] and stored in a relational database. A file of PRNP sequence annotated with all polymorphisms observed in this study and their frequencies in the beef diversity panel (MBCDP2.1) and dairy diversity panel (MDCP1.5) has been deposited in GenBank [GenBank:DQ457195].
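
The coverage rule described above (a phred score greater than 20 from at least two reads of the same animal) can be sketched as follows. This is an illustrative reconstruction rather than the pipeline used in the study: the function names and the dictionary-based read representation are assumptions. The only external fact relied on is the standard phred definition, Q = -10 log10(P), so that Q20 corresponds to a 1% base-call error probability.

```python
def phred_to_error_probability(q):
    """Convert a phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def covered_positions(reads, min_q=20, min_reads=2):
    """Return reference positions supported by enough high-quality reads.

    `reads` is a list of dicts for one animal, each mapping a reference
    position to the phred score of the base called there (hypothetical
    data layout chosen for this sketch).
    """
    support = {}
    for read in reads:
        for pos, q in read.items():
            if q > min_q:  # "greater than 20", as stated in the text
                support[pos] = support.get(pos, 0) + 1
    return {pos for pos, n in support.items() if n >= min_reads}


# Toy example: three reads from one animal.
reads = [
    {100: 35, 101: 18, 102: 40},
    {100: 30, 101: 42, 102: 25},
    {103: 50},
]
covered = covered_positions(reads)
```

Here positions 100 and 102 pass, position 101 fails because one of its two reads is below the threshold, and position 103 fails because only one read covers it.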

Definitions of animal subgroups
The 192 beef and dairy animals whose genomic DNA comprises diversity panels MBCDP2.1 and MDCP1.5 were sorted into five subgroups based on breed composition. The

LD estimation, haplotype inference, and median-joining network analyses
Unphased PRNP genotypes were assembled for each animal in datasets of the B. taurus, British, Continental, Holstein, and Composite of U.S. Brahman subgroups. Polymorphisms with more than two alleles, a minor allele frequency <0.05, or those not in Hardy-Weinberg equilibrium (Chi-square p < 0.01) were excluded from further analyses. Our cattle populations were not the result of random mating, violating an assumption of Hardy-Weinberg equilibrium. However, the Hardy-Weinberg test facilitated the identification of common haplotypes within the subgroups by excluding polymorphisms where the minor allele was amplified in a particular breed, yet had a low overall frequency.
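
The exclusion criteria above can be sketched for a single biallelic polymorphism. This is a minimal illustration under stated assumptions, not the software used in the study: it applies a one-degree-of-freedom Pearson chi-square test to genotype counts, and the function names and the genotype-count inputs are hypothetical. The p-value uses the identity that, for one degree of freedom, the chi-square upper tail equals erfc(sqrt(x/2)).

```python
import math


def maf_and_hwe(n_aa, n_ab, n_bb):
    """Return (minor allele frequency, chi-square, p-value) for genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic case).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 d.f.
    return min(p, q), chi2, p_value


def keep_polymorphism(n_aa, n_ab, n_bb, min_maf=0.05, min_p=0.01):
    """Apply the thresholds stated in the text: drop MAF < 0.05 or HWE p < 0.01."""
    maf, _, p_value = maf_and_hwe(n_aa, n_ab, n_bb)
    return maf >= min_maf and p_value >= min_p


in_equilibrium = keep_polymorphism(25, 50, 25)   # matches HWE expectations
no_heterozygotes = keep_polymorphism(50, 0, 50)  # strong HWE departure
rare_allele = keep_polymorphism(98, 2, 0)        # minor allele frequency 0.01
```

Only the first toy polymorphism is retained; the second fails the Hardy-Weinberg test and the third fails the minor allele frequency threshold.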
The extent of LD between the PRNP alleles of each dataset was calculated with pairwise r² values (Haploview v3.2 [31]). Regions of LD were determined through visual inspection of LD graphs. Haplotypes were inferred in Haploview using the EM algorithm, and a minimal set of polymorphisms was identified that collectively tagged all observed haplotypes predicted on four or more chromosomes in one or more of the five subgroups. The htSNPs identified across the five subgroup datasets were combined into a single set of 19 htSNPs. The 19 htSNPs were used to infer haplotypes within the five subgroup datasets. Median-Joining networks of PRNP haplotypes were constructed in Network (v4.111) [32].
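
The idea of a minimal tagging set can be illustrated with a small exhaustive search. This is a sketch of the general concept only and should not be read as the procedure Haploview applies; the function name `minimal_tag_set` and the toy haplotypes are hypothetical, and the search is exponential in the number of sites, so it is practical only for small examples.

```python
from itertools import combinations


def minimal_tag_set(haplotypes):
    """Return the smallest tuple of site indices that distinguishes all haplotypes.

    `haplotypes` is a list of equal-length allele strings, one per distinct
    haplotype. Subsets are tried in order of increasing size, so the first
    subset that separates every haplotype is a minimum-size tagging set.
    """
    n_sites = len(haplotypes[0])
    n_distinct = len(set(haplotypes))
    for size in range(1, n_sites + 1):
        for sites in combinations(range(n_sites), size):
            projected = {tuple(h[i] for i in sites) for h in haplotypes}
            if len(projected) == n_distinct:
                return sites
    return tuple(range(n_sites))


# Toy example: four haplotypes over four sites.
toy_haplotypes = ["AAGC", "AATC", "GAGT", "GATT"]
tags = minimal_tag_set(toy_haplotypes)
```

In this toy data the second site is monomorphic and the fourth duplicates the information in the first, so two sites (indices 0 and 2) are sufficient to tag all four haplotypes.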