Targeted oligonucleotide-mediated microsatellite identification (TOMMI) from large-insert library clones

Background: In the last few years, microsatellites have become the most popular molecular marker system and have been applied intensively in genome mapping, biodiversity and phylogeny studies of livestock. Compared to single nucleotide polymorphisms (SNPs), another popular marker system, microsatellites offer clear advantages: they are multi-allelic, possibly more polymorphic and cheaper to genotype. Calculations have shown that a multi-allelic marker system always has more power to detect linkage disequilibrium (LD) than a di-allelic marker system [1]. Traditional isolation methods using partial genomic libraries are time-consuming and cost-intensive. To generate microsatellites directly from large-insert libraries, a sequencing approach with repeat-containing oligonucleotides is introduced.

Results: Seventeen porcine microsatellite markers were isolated from eleven PAC clones by targeted oligonucleotide-mediated microsatellite identification (TOMMI), an improved, efficient and rapid flanking sequence-based approach for the isolation of STS-markers. With the application of TOMMI, an average of 1.55 (CA/GT) microsatellites per PAC clone was identified. The number of alleles, allele size distribution, polymorphism information content (PIC), average heterozygosity (HT), and effective allele number (NE) for the STS-markers were calculated using a sampling of 336 unrelated animals representing fifteen pig breeds (nine European and six Chinese breeds). Sixteen of the microsatellite markers proved to be polymorphic (2 to 22 alleles) in this heterogeneous sampling. Most of the publicly available (porcine) microsatellite amplicons range from approximately 80 bp to 200 bp. Here, we attempted to utilize as much sequence information as possible to develop STS-markers with larger amplicons. Indeed, fourteen of the seventeen STS-marker amplicons have minimal allele sizes of at least 200 bp. Thus, most of the generated STS-markers can easily be integrated into multilocus assays covering a broader separation spectrum. Linkage mapping results of the markers indicate their potential immediate use in QTL studies to further dissect trait-associated chromosomal regions.

Conclusion: The sequencing strategy described in this study provides a targeted, inexpensive and fast method to develop microsatellites from large-insert libraries. It is well suited to generating polymorphic markers for selected chromosomal regions and contigs of overlapping clones, and it yields sequence data of sufficiently high quality to develop amplicons greater than 250 bases.


Background
Almost all protocols applied to isolate microsatellites de novo involve the construction of partial genomic libraries (selected for small insert size) followed by cumbersome screening steps with hybridization probes [2]. Here, we introduce an improved approach called TOMMI (Targeted Oligonucleotide-Mediated Microsatellite Identification) to develop microsatellites by straightforward sequencing, with repeat-containing oligonucleotides, of clones isolated from large-insert libraries such as PAC (P1-derived Artificial Chromosome) and BAC (Bacterial Artificial Chromosome) libraries. The need to specifically identify and isolate STS-markers from these types of libraries is unquestionable. First, large-insert libraries are predominantly used in animal genetics, e.g. [3,4], as tools to identify candidate genes or to generate overlapping contigs of chromosomal regions that are associated with quantitative or economic trait loci (QTL or ETL). Secondly, the overall number of microsatellites present in a genome depends mainly on its complexity and size. Assuming a total size of 3 × 10^9 bp and an estimated frequency of one dinucleotide repeat every 30-50 kb in mammals (as reviewed by [5]), a genome-wide figure of 100,000 microsatellite markers of that kind can be assumed [6]. However, only approximately 1,200 porcine microsatellites have been reported so far [7]. Furthermore, both the total number and the distribution of the loci are still insufficient for well-distributed microsatellite coverage of the genome or of several individual chromosomes, e.g. SSC18 [8]. The objective of the present study was the selective generation of microsatellites from PAC clones that had been isolated, prior to STS development, from the porcine PAC library TAIGP714 [3] by a three-dimensional PCR screening strategy [9]. Eight of the eleven clones harbored functional or positional candidate genes involved in health, reproduction, production, and regulation, whereas the other three clones have been used in the attempt to construct a PAC contig covering SSC16q11-13 (Table 1).
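The genome-wide estimate quoted above follows from simple division, and a minimal sketch makes the range explicit. The genome size, the 30-50 kb spacing and the figure of roughly 1,200 reported porcine microsatellites are taken from the text; the function and variable names are illustrative only.

```python
# Back-of-the-envelope estimate of dinucleotide repeat loci in a mammalian genome.
# Inputs come from the text; names are hypothetical and chosen for illustration.

GENOME_SIZE_BP = 3e9          # assumed total genome size
REPORTED_PORCINE_LOCI = 1200  # approximate number reported so far


def expected_repeat_count(genome_size_bp: float, spacing_bp: float) -> float:
    """Expected number of loci if one repeat occurs every `spacing_bp` bases."""
    return genome_size_bp / spacing_bp


for spacing_kb in (30, 50):
    n_loci = expected_repeat_count(GENOME_SIZE_BP, spacing_kb * 1_000)
    share = REPORTED_PORCINE_LOCI / n_loci
    print(f"one repeat per {spacing_kb} kb: ~{n_loci:,.0f} loci, "
          f"{share:.1%} reported so far")
```

A spacing of 30 kb gives the upper figure of 100,000 loci cited in the text, while 50 kb gives 60,000; in either case the reported porcine markers cover only a small share of the expected total.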

Results and discussion
Fifteen of the seventeen microsatellites (Table 2) were developed with sequencing primers containing one selective nucleotide at the 3'-end: (CA)8T (S0701, S0703, and S0767), (CA)8A (S0702, S0704, and S0710), (CA)8G (S0705, S0706, S0712, and S0766), (AC)8C (S0709), (AC)8G (S0707 and S0715), and (AC)8T (S0708 and S0711). Microsatellites S0713 and S0714 could only be characterized after improving the discrimination of the PAC clone sequences with sequencing primers extended at the 3'-end by a second nucleotide [(CA)8AT for S0713 and (CA)8GC for S0714]. The second nucleotide became necessary because the respective clones TAIGP714L02061Q (for S0713) and TAIGP714I23038Q (for S0714) contained additional (CA)8A or (CA)8G primer binding regions or motifs. In contrast, a further extension to three nucleotides at the 3'-ends of the primers either did not yield additional microsatellites in any of the PAC clones or was not required. Therefore, we conclude that repeat primers with two 3'-nucleotides next to the repeat motif are sufficient to detect and sequence all repeats potentially present on a large-insert library clone. The results of our isolation strategy also indicate that two sequencing reactions (the reverse sequencing primer was designed based on the obtained sequences) seem to be sufficient in most cases to obtain sequence information of high enough quality to amplify microsatellites (Table 2). Sequencing primers that were degenerate at the 3'-end proved, however, to be inadequate, as no sequence information at all was obtained. Also, to avoid overlapping primary sequences, oligonucleotides that merely extend the dinucleotide repeat at the 3'-end, such as (CA)8C and (AC)8A, are not recommended. TOMMI proved to be an efficient and reliable isolation strategy. Besides new STS-markers, six previously described microsatellites were also detected. Three of these loci, microsatellites S0111 [10], SW742 [11], and SW813 [12], had initially been used as probes for the isolation of clones TAIGP714L02061Q, TAIGP714I23038Q, and TAIGP714F10061Q. The other three previously described microsatellite sequences reside on TAIGP714C09004Q [GenBank: AJ440949 (repeat location: 3172-3231) and GenBank: AJ440950 (repeat locations: 15831-15860 and 16007-16038)]. They were not considered further in this study as they were not regarded as novel.

Independently of our effort, two other groups [13,14] introduced similar sequencing approaches to generate microsatellites from large-insert libraries. There are, however, several differences between our approach and those of the other groups in terms of sequence generation and selective amplification of microsatellites. In contrast to Waldbieser and colleagues [14], who used tri-nucleotide repeat-containing primers for sequencing, neither of our gene-specific primers is 5'-tailed with extra nucleotide stretches, either to enable product labeling or to promote alleged non-template adenylation. Compared with our approach, Fujishima-Kanaya's group [13] first used longer repeat stretches in the primer [(CA/GT)10 instead of (CA/GT)8]. Secondly, their sequencing primers generally carried three selective nucleotides at the 3'-end adjacent to the repeat motif (e.g. CNA/GVG). The first of the three terminal nucleotides was always identical to the starting nucleotide of the dinucleotide repeat primer used. In addition, the primers contained a degenerate base, according to the International Union of Biochemistry (IUB) codes, at the second position from or directly at the 3'-end. Thirdly, the double-stranded primary DNA sequence stretch was determined by four sequencing reactions: one with a CA-repeat-containing primer, one with a GT-repeat-containing primer heading in the opposite direction, and two with reverse primers designed from the obtained sequence. Finally, they always designed an additional primer pair for the specific amplification of the microsatellite. In contrast, we used the single reverse sequencing primer in combination with a newly developed sequence-specific primer (S0766 and S0767) or designed a new primer pair to amplify the microsatellite (S0701 to S0715).
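The primer design rule described above (a repeat core of eight units followed by one or two selective 3' nucleotides, skipping any anchor that would merely extend the repeat) can be sketched as a small enumeration. This is an illustrative sketch only, not the authors' design software; the function name and parameters are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def anchored_repeat_primers(unit: str, n_repeats: int = 8, anchor_len: int = 1) -> list[str]:
    """Enumerate repeat primers with `anchor_len` selective bases at the 3'-end.

    Anchors whose first base equals the first base of the repeat unit are
    skipped, because such a primer, e.g. (CA)8C or (AC)8A, only extends the
    repeat instead of selecting a flanking sequence.
    """
    core = unit * n_repeats
    primers = []
    for anchor in product(BASES, repeat=anchor_len):
        if anchor[0] == unit[0]:
            continue  # would continue the dinucleotide repeat
        primers.append(core + "".join(anchor))
    return primers


# One selective base: three primers per repeat orientation.
print(anchored_repeat_primers("CA"))
print(anchored_repeat_primers("AC"))

# Two selective bases: twelve primers per orientation, including the
# (CA)8AT and (CA)8GC primers used for S0713 and S0714.
two_base = anchored_repeat_primers("CA", anchor_len=2)
print(len(two_base), "CA" * 8 + "AT" in two_base, "CA" * 8 + "GC" in two_base)
```

With one selective base the sketch reproduces the six primer types listed in the text, (CA)8A/G/T and (AC)8C/G/T; the two-base set is only needed when a clone carries more than one binding site for the same one-base primer.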
The observed number of alleles per locus (the monomorphic locus S0709 is not included in this calculation) in the heterogeneous sampling ranged from 2 (S0702) to 22 (S0713), with an average of 9.94 alleles. NE ranged from 1.05 to 11.54, and both HT and PIC ranged from 0.05 to 0.91 (Table 3).
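The text does not spell out how HT, NE and PIC were computed. The sketch below assumes the standard textbook definitions (expected heterozygosity 1 - Σp², effective allele number 1/Σp², and PIC as defined by Botstein and colleagues) and should be read as an illustration under that assumption, not as the authors' exact procedure. The allele counts in the example are made up.

```python
def diversity_stats(allele_counts):
    """Return (heterozygosity, effective allele number, PIC) for one locus.

    `allele_counts` holds the observed count of each allele. Standard
    definitions are assumed; no small-sample correction is applied.
    """
    total = sum(allele_counts)
    p = [c / total for c in allele_counts]          # allele frequencies
    homozygosity = sum(x * x for x in p)            # sum of p_i^2
    heterozygosity = 1.0 - homozygosity
    effective_alleles = 1.0 / homozygosity
    # PIC subtracts the share of uninformative matings between like heterozygotes.
    cross_term = sum(
        2.0 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    pic = heterozygosity - cross_term
    return heterozygosity, effective_alleles, pic


# Hypothetical locus with four alleles observed 40, 30, 20 and 10 times.
h, ne, pic = diversity_stats([40, 30, 20, 10])
print(f"H = {h:.3f}, NE = {ne:.2f}, PIC = {pic:.3f}")
```

Under these definitions heterozygosity and effective allele number are linked by H = 1 - 1/NE, which agrees with the reported ranges: NE of 1.05 corresponds to H of about 0.05, and NE of 11.54 to H of about 0.91.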
Because they were isolated from partial genomic libraries selected for small insert sizes, most of the publicly available porcine microsatellites lie within DNA fragments of about 80 to 200 bp. Their combination in multiplex assays is therefore hampered, particularly when different annealing temperatures and technical limitations of the automated sequencers (a limited number of available fluorescent dyes) are also taken into account. Hence, a higher number of genotypes per run can only be achieved by integrating STS-markers that cover a larger allelic size spectrum. We therefore focused on developing large amplicons for microsatellites by using as much sequence information as possible for primer design. Indeed, fourteen STS-markers had allele sizes of at least 200 bp, and for five of the isolated microsatellites the sequence information proved good enough to amplify alleles of at least 300 bp (Table 3).
By the guided isolation of STS-markers S0709 to S0715 from three SSC16q-derived PAC clones (relative position 0 cM to 9.3 cM [7]; 2.33 STS-markers per clone), the marker density in this chromosomal region was improved remarkably. An average of 1.25 new microsatellites per clone was isolated from the eight PAC clones harboring candidate genes (S0701-S0708; S0766 and S0767). Considering all PAC clones used and all STS-markers developed, 1.55 microsatellites per clone were isolated. As the PAC clones had an average length of 80 kb (as shown by pulsed-field gel electrophoresis), the reported frequency of one dinucleotide repeat every (30 to) 50 kb [5] was more or less confirmed. TOMMI therefore has the potential to identify STS-markers linked or adjacent to, for example, candidate genes on large-insert library clones. Thus, in combination with a genome scan, putative candidate genes could either be confirmed as or excluded as positional candidate genes prior to their complete structural characterization, including SNP detection. Linkage mapping results for S0701, S0705, S0707, S0711, S0712, S0713, S0715, and S0766 are presented in Table 4. A comparison of their mapping positions with QTL positions in the Pig Quantitative Trait Loci (QTL) database [15] reveals that S0705 (64.22 cM), S0707 (43.19 cM), and S0766 (102.50 cM) reside on the respective chromosomes exactly at QTL locations (S0705: backfat between the last 3rd and 4th rib; S0707: early growth rate and water holding capacity; S0766: backfat thickness at first rib and intra-muscular fat). The other STS-markers are located within QTL spans of ± 5 cM. This indicates their immediate potential to further dissect the respective QTL regions.
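The per-clone yields and the implied repeat spacing can be re-derived from the marker and clone counts given in the text. In the sketch below the group sizes are read off the marker names (S0701-S0708 plus S0766 and S0767 on the eight candidate-gene clones, S0709-S0715 on the three SSC16q clones); the variable names are illustrative.

```python
# (markers, clones) per group, as counted from the marker names in the text.
CLONE_GROUPS = {
    "candidate-gene clones": (10, 8),
    "SSC16q contig clones": (7, 3),
}
MEAN_INSERT_KB = 80  # average PAC insert size from pulsed-field gel electrophoresis


def markers_per_clone(n_markers: int, n_clones: int) -> float:
    """Average number of microsatellites isolated per clone."""
    return n_markers / n_clones


for name, (n_markers, n_clones) in CLONE_GROUPS.items():
    print(f"{name}: {markers_per_clone(n_markers, n_clones):.2f} per clone")

total_markers = sum(m for m, _ in CLONE_GROUPS.values())
total_clones = sum(c for _, c in CLONE_GROUPS.values())
overall = markers_per_clone(total_markers, total_clones)
kb_per_repeat = MEAN_INSERT_KB / overall

print(f"all clones: {overall:.2f} per clone, "
      f"about one repeat every {kb_per_repeat:.0f} kb")
```

Seventeen markers on eleven clones of about 80 kb each correspond to roughly one isolated repeat per 52 kb, which sits at the upper end of the 30-50 kb spacing cited from the literature.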

Conclusion
The sequencing strategy described in this study provides a targeted, inexpensive and fast method to develop microsatellites from large-insert libraries. It is also well suited to generating polymorphic markers for selected chromosomal regions and contigs of overlapping clones, and it yielded sequence data of sufficiently high quality to develop marker amplicons greater than 250 bases.

PAC clone isolation and physical mapping
Prior to STS development, a total of 11 clones were isolated from the porcine PAC library TAIGP714 [3] by a three-dimensional PCR screening strategy. PAC-DNA preparations were done according to the manufacturer's protocol (Qiagen, Hilden, Germany). The physical assignment of the PAC clones was performed by fluorescence in situ hybridization (FISH) as described in [16] or alternatively by analysis of the INRA-UMN porcine radiation hybrid (IMpRH) panel [17]. Microsatellite primers (Table 3) were used to RH map S0703, S0704 and S0708 to S0715. Marker assignment of S0701, S0702, S0705 to S0707, S0766 and S0767 was performed with primers from further sequence segments of the PAC clones.

Microsatellite generation and characterization
All sequencing reactions and the separation of microsatellites were performed on an ABI PRISM® 3100 DNA analyzer (ABI, Weiterstadt, Germany). Sequencing reactions were done using the BigDye™ Terminator (v 3.0) Cycle Sequencing Kit (ABI, Weiterstadt, Germany). DNA sequencing was performed using 10 pmol of the respective oligonucleotide, 1 µl BigDye Premix and 50-100 ng of purified plasmid DNA as template in a total volume of 10 µl. Sequencing conditions were 96°C for 30 s followed by 30 cycles of 96°C for 10 s, the respective annealing temperature for 5 s and 60°C for 4 min. The optimal annealing temperature for the repeat-containing primers was between 50°C and 52°C, except for the generation of sequences for S0714, for which it was 56°C. To generate STS-markers, oligonucleotides containing the repeat motif (CA)8 or (AC)8 at the 5'-end and a few (one or two) non-repetitive bases at the 3'-end were initially used as sequencing primers. Based on the obtained sequence, specific primers were developed and used as reverse oligonucleotides to determine the composition of the repeat region and its 5'-flanking region (Table 2; Figure 1). After sequence determination, BLAST comparisons were run to verify the novelty and uniqueness of the obtained sequences. Depending on the quality of the sequenced stretch, primers were developed to amplify seventeen STS-markers (S0701 to S0715; S0766 and S0767; Table 3). To confirm the sequence identity of the respective microsatellites [GenBank: AY253989 to AY254003, AY731063, and AY731064] on genomic DNA, the resulting PCR products were subcloned into the polylinker of the pGEM®-T vector (Promega, Mannheim, Germany), and three independent clones each were sequenced bi-directionally using the standard sequencing primers SP6 (5'-ATT TAG GTG ACA CTA TAG AA-3') and T7 (5'-TAA TAC GAC TCA CTA TAG GG-3').
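Once a read has been obtained with the reverse primer, the repeat region has to be located within it. The original work does not describe software for this step, so the following is only a minimal sketch of one way to flag CA/GT dinucleotide runs in a sequence string; the function name, the default threshold of eight units and the example sequence are assumptions made for illustration.

```python
import re


def find_dinucleotide_repeats(seq: str, min_repeats: int = 8):
    """Locate uninterrupted CA/AC/GT/TG runs of at least `min_repeats` units.

    Returns a list of (start, end, unit, n_units) tuples with 1-based,
    inclusive coordinates.
    """
    units = ("CA", "AC", "GT", "TG")
    # Builds e.g. "(?:CA){8,}|(?:AC){8,}|(?:GT){8,}|(?:TG){8,}"
    pattern = re.compile("|".join(f"(?:{u}){{{min_repeats},}}" for u in units))
    hits = []
    for match in pattern.finditer(seq.upper()):
        run = match.group(0)
        hits.append((match.start() + 1, match.end(), run[:2], len(run) // 2))
    return hits


# Made-up read: a (CA)12 stretch between short flanking sequences.
example_read = "GGTT" + "CA" * 12 + "TTGA"
print(find_dinucleotide_repeats(example_read))
```

The sketch only reports perfect, uninterrupted runs; compound or interrupted repeats would need a more permissive pattern, and the flanking sequence on both sides still has to be checked by eye or by BLAST before primers are designed.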

Linkage mapping of STS-markers on the USDA-MARC linkage map
Seven families of the MARC Swine Reference Population were genotyped as described [22]. Amplified DNA was radioactively labeled, separated by denaturing polyacrylamide gel electrophoresis and visualized with autoradiography. To ensure accurate sizing and discrimination of alleles, amplification primers were redesigned to yield smaller products for all markers except S0706, S0707 and S0709. S0767 was not tested in this population. Four markers were not informative in the MARC Swine Reference Population (S0702, S0706, S0709 and S0714) and four primer sets failed to produce reliable products (S0703, S0704, S0708 and S0710). Genotypes were determined and entered into the MARC Genome Database. Each marker was initially assigned to a chromosome based on TWOPOINT results of CRIMAP [23], then multipoint linkage analyses determined the final location of each marker. Genotypic data were evaluated with CHROMPIC and corrections made if necessary. The final position reported is based on the current MARC swine linkage map. Amplification primers for the eight successfully mapped markers are presented in Table 4.

Authors' contributions
KFC conducted the lab work to isolate and characterize S0701 to S0715 and CK to isolate and characterize S0766 and S0767. CK shared manuscript preparation and editing with KFC, supervised KFC's Ph.D. thesis, evaluated microsatellite data, and organized and provided DNA of the European pig breeds. KBK optimized and conducted fragment analysis and was responsible for evaluation of microsatellite data. JR assisted KFC in the beginning of the project. LSH organized DNA of the Chinese pig breeds. GAR conducted linkage mapping of the markers and edited the manuscript. BB proposed the idea, supervised and commented on the project, was responsible for funding and manuscript editing, and acts as head of the research group in Göttingen.