Development of an integrative database with 499 novel microsatellite markers for Macaca fascicularis

Background: Cynomolgus macaques (Macaca fascicularis) are a valuable resource for linkage studies of genetic disorders, but too few microsatellite markers are available for them. In genetic studies, a prerequisite for mapping genes is the development of a genome-wide set of microsatellite markers in the target organism. A whole genome sequence and its annotation also facilitate identification of markers for causative mutations. The aim of this study is to establish hundreds of microsatellite markers and to develop an integrative cynomolgus macaque genome database with a variety of datasets, including marker and gene information, that will be useful for further genetic analyses in this species.
Results: We investigated the level of polymorphism in cynomolgus monkeys for 671 microsatellite markers that are covered by our established Bacterial Artificial Chromosome (BAC) clones. Four hundred and ninety-nine (74.4%) of the markers were found to be polymorphic using standard PCR analysis. The average number of alleles and the average expected heterozygosity at these polymorphic loci in ten cynomolgus macaques were 8.20 and 0.75, respectively.
Conclusion: BAC clones and novel microsatellite markers were assigned to the rhesus genome sequence and linked with our cynomolgus macaque cDNA database (QFbase). Our novel microsatellite marker set and genomic database will be valuable integrative resources for analyzing genetic disorders in cynomolgus macaques.


Background
Cynomolgus macaques (Macaca fascicularis) are one of the most commonly used nonhuman primates in biomedical research. Currently, about two thousand cynomolgus macaques are maintained at the Tsukuba Primate Research Center (TPRC), Japan [1]. Several lineages of the captive cynomolgus macaques have genetic disorders such as macular degeneration [2] and endometriosis [3]. In genetic studies, a prerequisite for mapping genes is the development of a genome-wide set of microsatellite markers in the target organism. A whole genome sequence and its annotation also facilitate identification of markers for causative mutations. A comprehensive cynomolgus macaque genome database, including a map of Bacterial Artificial Chromosome (BAC) clones, 5'-end expressed sequence tags (ESTs), microsatellite markers, primer sequences for the microsatellite markers, and genes around the microsatellite markers, would be valuable for linkage analyses; unfortunately, the complete genome of the cynomolgus macaque has not yet been sequenced.
A microsatellite marker set is a versatile tool that assists in colony management, conservation work, and paternity testing of nonhuman primates [4][5][6][7][8][9][10][11][12]. Microsatellite markers for humans [13] and some nonhuman primate species [14][15][16][17] are now widely available, facilitating linkage analyses in these species. Rogers et al. developed the first generation of genetic linkage maps of baboons [18,19] and rhesus macaques [20]. However, few studies have been conducted on microsatellite markers in cynomolgus macaques [12,21]. In this study, we established 499 microsatellite markers that are covered by pre-identified BAC clones for cynomolgus macaques. We also developed an integrative cynomolgus macaque genome database with a variety of datasets, including marker and gene information, that will be useful for further genetic analyses in this species. The advantages of this study are as follows: (1) because most of the newly developed microsatellite marker loci are covered by the BAC clones, their chromosomal locations can be searched by in silico mapping; (2) the microsatellite markers were mapped to the rhesus macaque genome sequence (http://genebank.nibio.go.jp/cgi-bin/gbrowse/rheMac2/); and (3) the 499 novel markers established in this study outnumber the previously reported microsatellite markers in other macaques and are probably useful for linkage studies in other nonhuman primate species as well. At a genome-wide level, the cynomolgus and rhesus macaque genomes are very similar; their genetic divergence is about 0.4% at the nucleotide level [22]. In addition, their karyotypes are also very similar [23]. Designing cynomolgus macaque microsatellite markers based on the rhesus macaque genome sequence is therefore a reasonable and efficient way of establishing species-specific genomic information for this species.
The development of a linkage map in this species is a first step toward exploring the genes responsible for genetic disorders in captive macaques.

Identification of polymorphic microsatellites and construction of a microsatellite marker database for cynomolgus macaques
BAC-end sequences of 768 clones of a cynomolgus macaque were determined. Of these, 487 BAC clones were successfully mapped onto the draft rhesus genome sequence (see Methods). Within the regions covered by the BAC clones, we selected 671 candidate loci from 394 BAC clones that harbor dinucleotide repeats of 20 bp or longer in the rhesus genome sequence. Of these, 34 markers were selected from rhesus macaque or human microsatellite markers identified in previous studies [13,24,25,26,27,28]. Our marker set does not contain the markers previously developed by Kikuchi et al. [21]. The primer sequences and their genomic locations are presented in Additional file 1.
Next, we investigated whether these candidate repeats are polymorphic using 10 unrelated cynomolgus macaques from Indonesia, Malaysia, and the Philippines. Of the 671 microsatellite markers tested, 499 (74.4%) gave rise to polymorphic PCR products of approximately the size expected from the rhesus or human genome sequence. Detailed information is presented in Additional file 1 and on our website (http://genebank.nibio.go.jp/cgi-bin/qfbase/macMMarker.cgi/). Because some of the microsatellite markers were located at very close loci covered by a single BAC clone, we estimated the genome coverage of the microsatellite markers using only polymorphic markers separated from their neighboring markers by at least 0.1 Mbp. The average distance between the newly developed markers was about 10 cM, assuming that the macaque genome comprises 3000 Mbp. PCR product sizes ranged from 63 to 647 bp, with an average of 247 bp. The average number of alleles per polymorphic marker was 8.20 (range 2-17) and the average expected heterozygosity was 0.75 (range 0.10-0.94) for these 499 markers in the cynomolgus macaques. The distribution of expected heterozygosity values showed that a substantial number of the markers have expected heterozygosity greater than 0.80. Microsatellite markers with expected heterozygosity > 0.75 are regarded as highly polymorphic [20]. According to this criterion, 324 of the 499 markers (64.9%) were highly polymorphic (Figure 1). To check the mode of inheritance of these markers, we additionally investigated four families consisting of 27 animals for 453 autosomal markers and found that 412 markers showed no contradictions with Mendelian inheritance (see Additional file 2).
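The thinning step used for the coverage estimate can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the greedy left-to-right selection, the function names `thin_markers` and `mean_spacing_mbp`, and the example coordinates are all assumptions. Converting the resulting spacing from Mbp to cM would additionally require a recombination rate, which the text does not state.

```python
def thin_markers(positions_by_chrom, min_gap_bp=100_000):
    """Keep markers lying at least min_gap_bp from the previously kept marker."""
    kept = {}
    for chrom, positions in positions_by_chrom.items():
        selected = []
        for pos in sorted(positions):
            # Always keep the first marker; keep later ones only if far enough away.
            if not selected or pos - selected[-1] >= min_gap_bp:
                selected.append(pos)
        kept[chrom] = selected
    return kept


def mean_spacing_mbp(kept, genome_size_mbp=3000.0):
    """Average spacing if the kept markers were spread evenly over the genome."""
    n_markers = sum(len(v) for v in kept.values())
    if n_markers == 0:
        raise ValueError("no markers left after thinning")
    return genome_size_mbp / n_markers


# Hypothetical marker coordinates (bp) on two chromosomes.
example = {"chr1": [1_000_000, 1_050_000, 1_200_000], "chr2": [500_000]}
kept = thin_markers(example)
spacing = mean_spacing_mbp(kept)
```

In this toy input the marker at 1,050,000 bp is dropped because it lies only 50 kb from the marker kept before it.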
We investigated the distribution of the microsatellite markers on the human and rhesus chromosomes. As shown in Table 1, the novel microsatellite markers were distributed over all autosomes and the X chromosome of both species. Because the draft genome sequence of the rhesus Y chromosome is not available, we did not obtain microsatellite markers on the Y chromosome.

Construction of integrative cynomolgus macaque genome database
We further constructed a genome browser for cynomolgus macaques based on the rhesus genome sequence. In this database, users can search the positions of cynomolgus macaque BAC clones, 5'-end expressed sequence tags (ESTs), and microsatellite markers on the rhesus macaque genome. In addition, human and macaque cDNA sequences and rhesus macaque genes predicted by Ensembl [29] are aligned on the genome. The database is also connected to QFbase, which contains data on more than 130,000 cynomolgus macaque cDNAs [22]. This information will facilitate the search for known or predicted cynomolgus macaque genes near the microsatellite markers and help to narrow down candidate regions for functional genes near markers identified by linkage analysis. The address of the database is http://genebank.nibio.go.jp/cgi-bin/gbrowse/rheMac2/.

Discussion
At present, only a few microsatellite markers have been reported for cynomolgus macaques [12,21], even though such markers would be valuable for studying genetic diseases in this species. In this work, we designed 637 primer pairs and selected 34 primer pairs from NCBI/UniSTS. Of the 671 marker candidates, 499 were polymorphic microsatellite markers located on cynomolgus macaque BAC clones. These polymorphic markers could be amplified with high probability using the same protocol as that used for human microsatellite markers in other macaques [16,20]. Most of the developed microsatellite markers have high expected heterozygosity and are distributed throughout the human and rhesus macaque chromosomes. Our microsatellite marker set will be available for various studies in macaques, and the chromosomal information for the developed markers is also available from the following database: http://genebank.nibio.go.jp/cgi-bin/gbrowse/rheMac2/. The BAC clones will be helpful for further identification and functional analysis of genes implicated in genetic disorders.
In the genome database of cynomolgus macaques, we assumed that synteny between the cynomolgus and rhesus macaques is highly conserved. Although previous studies suggested that their chromosomes are highly similar at a microscopic level, smaller translocations, insertions, or deletions may exist between the two genomes [22,23]. The BAC resource can be used to verify such genomic differences by fluorescence in situ hybridization (FISH). We verified the suitability of such mapping by FISH with 12 BAC clones. Although mapping all BAC clones by FISH is not practical, we can check the synteny between the two genomes when a particularly interesting locus is found in further studies (see Additional file 3).
In the polymorphism analysis, all PCR products showed high fluorescence intensities, indicating the presence of ample labeled PCR product, even without optimization. Optimization of annealing temperature and magnesium concentration for the unsuccessful markers using cynomolgus macaque DNA would certainly yield additional useful markers. In addition, many of the microsatellite polymorphisms reported here will also be useful in other macaques [30,31].
Currently, about 800 human microsatellite markers, which cover the human genome at intervals of 5 cM, are commercially available. Rogers et al. developed the first generation of genetic linkage maps of baboons [18,19] and rhesus macaques [20], primarily consisting of human microsatellite loci amplified using the published human PCR primers. Human markers have been tested in baboons, and over 280 microsatellites were used in studies of osteoporosis in this species [32,33]. The 499 novel markers established in this study outnumber the previously reported microsatellite markers in baboons and rhesus monkeys and are probably useful for linkage studies in other nonhuman primate species as well. These microsatellite markers are also valuable resources for the management of captive macaque colonies.
We are currently using linkage analysis to identify genetic loci implicated in hereditary macular degeneration in cynomolgus macaques, which is the only animal model of human age-related macular degeneration. Early-onset macular degeneration occurred spontaneously in certain cynomolgus macaque families at the Tsukuba Primate Research Center (TPRC), and family analysis revealed that this disease is controlled by autosomal dominant genes [2]. The integration of various genetic tools, including our database, would greatly facilitate genetics-based research on disease models in the future. We should note, however, that whole-genome association studies, especially for candidate genes with only minor effects, might require a far denser map than that reported here.
Figure 1. Distribution of heterozygosity for the 499 microsatellite markers.

Conclusion
Cynomolgus macaques (Macaca fascicularis) are one of the most commonly used nonhuman primates in biomedical research. In this study, we established 499 microsatellite markers for cynomolgus macaques. We also developed an integrative cynomolgus macaque genome database with a variety of datasets that will be useful for further genetic analyses in this species. The development of a linkage map in this species is a first step toward exploring the genes responsible for genetic disorders in captive macaques. These datasets should be valuable to researchers in the field of primate genetics.

DNA sampling and pedigree structure
Whole blood samples were obtained from 37 pedigreed cynomolgus macaques, aged 3-29 years, consisting of 17 males and 20 females housed at the TPRC, National Institute of Biomedical Innovation (NIBIO), Tsukuba, Japan. Blood samples of 10 unrelated individuals (four males and six females) were used for the polymorphism analysis. They consisted of three Indonesian, four Philippine, and three Malaysian cynomolgus macaques. Blood samples of 27 Malaysian cynomolgus macaques in four families (13 males and 14 females) were used for the inheritance analysis. Genomic DNA was isolated from 10 ml of heparinized peripheral blood using the Wizard Genomic DNA purification kit (Promega, WI, USA). The macaques were cared for and handled according to guidelines established by the Institutional Animal Care and Use Committee of the NIBIO and the standard operating procedures for macaques at the TPRC. Blood collection was conducted at the TPRC in accordance with all guidelines in the Laboratory Biosafety Manual of the World Health Organization.

BAC library and BAC-end sequence
We used a BAC library that was constructed using DNA from renal cells of cynomolgus macaques. The library consists of approximately 110,000 recombinant BAC clones providing 3.4-fold coverage of the cynomolgus macaque genome. The cynomolgus macaque BAC library was obtained from the Department of Biomedical Resources, National Institute of Biomedical Innovation, Osaka, Japan. DNA sequencing was performed with BigDye Terminator v3.1 Ready Reaction Mix and ABI Prism-Avant Genetic Analyzer (Applied Biosystems, CA, USA).

In silico mapping of the BAC-end sequences on the rhesus macaque genome
The BAC-end sequences were mapped onto the draft genome sequence of rhesus macaques (rheMac2 assembly) using a BLAST program (E = 10^-30). Repeat sequences were masked before the BLAST search. When the two BAC-end sequences from the same BAC clone were aligned in head-to-head orientation within a range of 10-300 kb on the rhesus genome draft sequence, we assumed that the BAC clone was correctly assigned to the genome. To choose candidate loci for microsatellite markers within these regions, we surveyed short tandem repeats (STRs) spanning at least 20 nucleotides with a motif length of 2 (i.e., dinucleotide repeats such as CACACA), which were identified using Tandem Repeats Finder software [34].
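The two selection rules in this section, paired BAC-end placement and dinucleotide repeat detection, can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline used in the study: the `Hit` record, the function names, and the example coordinates are hypothetical, and a regular expression that finds only perfect repeats stands in for Tandem Repeats Finder, which also reports imperfect repeats.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for the best BLAST hit of one BAC-end read."""
    chrom: str
    start: int   # 1-based, start <= end
    end: int
    strand: str  # "+" or "-"


def bac_span(hit_a, hit_b, min_span=10_000, max_span=300_000):
    """Return (chrom, start, end) if the two ends face each other within range."""
    if hit_a.chrom != hit_b.chrom or hit_a.strand == hit_b.strand:
        return None
    plus, minus = (hit_a, hit_b) if hit_a.strand == "+" else (hit_b, hit_a)
    if plus.start > minus.start:  # reads point away from each other
        return None
    span = minus.end - plus.start + 1
    if min_span <= span <= max_span:
        return (plus.chrom, plus.start, minus.end)
    return None


def find_dinucleotide_repeats(seq, min_length=20):
    """Find perfect dinucleotide repeats of at least min_length bases.

    Returns (start, end, motif) tuples with 0-based, end-exclusive coordinates.
    """
    min_copies = -(-min_length // 2)  # ceiling division
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_copies - 1))
    repeats = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if motif[0] == motif[1]:  # skip homopolymer runs such as AAAA...
            continue
        repeats.append((match.start(), match.end(), motif))
    return repeats


# Hypothetical example data.
placed = bac_span(Hit("chr1", 1000, 1600, "+"), Hit("chr1", 150_000, 150_600, "-"))
repeats = find_dinucleotide_repeats("GGT" + "CA" * 12 + "TTG")
```

A clone whose two ends map to different chromosomes, to the same strand, or more than 300 kb apart yields `None` and would be excluded from marker selection.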

Primer design and PCR
For 637 microsatellite loci on sequenced BAC clones, paired primers were designed using DNASIS software (Hitachi Software Engineering, Tokyo, Japan). The primer sets were confirmed not to match more than one region of the rhesus genome draft sequence [35]. Markers MFA0028-0061 were selected from NCBI/UniSTS (ids for markers and mapping data: 72106, 147912, 8379, 63879) [13,24,25,26,27,28]. The forward primers were labeled with one of three fluorescent dyes: 6-FAM, HEX, or NED. Amplification was done in a 384-well format on 9700 Thermal Cyclers (Applied Biosystems, CA, USA). Ten microliters of the reaction mixture contained 10 ng of genomic DNA, 2.5 nmol each of dATP, dCTP, dGTP, and dTTP, 0.25 units of ExTaq, 5.0 pmol of forward and reverse primers, and the manufacturer's PCR buffer (all purchased from Takara Biosystems, Otsu, Japan). The cycling conditions were as follows: initial denaturation at 94°C for 5 min; 30 cycles of amplification at 94°C for 1 min, 55°C for 1 min, and 72°C for 1 min; and a final extension at 72°C for 7 min. The same PCR conditions were applied to all amplifications.
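For readers who want the reaction composition as final concentrations, the amounts given above convert as in the short calculation below. It is a worked-arithmetic sketch only: it assumes that the 5.0 pmol refers to each primer separately, and the programmed time counts hold steps only, ignoring ramping between temperatures.

```python
def micromolar(amount_pmol, volume_ul):
    """Concentration in micromolar: pmol per microliter equals micromol per liter."""
    return amount_pmol / volume_ul


REACTION_VOLUME_UL = 10.0

# 2.5 nmol of each dNTP = 2500 pmol in a 10 microliter reaction.
dntp_each_um = micromolar(2.5 * 1000, REACTION_VOLUME_UL)

# Assumption: 5.0 pmol of each primer (forward and reverse counted separately).
primer_each_um = micromolar(5.0, REACTION_VOLUME_UL)

# 10 ng of genomic DNA in the same volume.
dna_ng_per_ul = 10.0 / REACTION_VOLUME_UL

# Hold times in minutes: initial denaturation, 30 three-step cycles, final extension.
programmed_minutes = 5 + 30 * (1 + 1 + 1) + 7
```

Under these assumptions each dNTP is at 250 µM, each primer at 0.5 µM, the template at 1 ng/µl, and the programmed hold time is 102 min.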

Analyses of microsatellite polymorphisms and genetic inheritances
Amplified PCR products were mixed with gel-loading cocktails containing deionized formamide and labeled size standards (GeneMapper-LIZ 500; Applied Biosystems, CA, USA). Samples were run on the Applied Biosystems 3730 DNA analyzer (Applied Biosystems, CA, USA). The expected heterozygosity ($H_e$) of each locus was calculated using the formula $H_e = 1 - \sum_i p_i^2$, where $p_i$ is the frequency of the $i$-th allele at the locus [36]. We examined expected heterozygosity according to Nei [37]. Using 453 autosomal markers, genetic inheritance analyses were performed with checkfam [38].
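The two analyses in this section can be illustrated with a short sketch. The first function applies the expected heterozygosity formula given above to the genotypes at one locus. The second is a simple parent-offspring compatibility test for one autosomal marker; it is a stand-in for illustration and does not reproduce the algorithm of checkfam. The function names and the allele sizes in the example are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(genotypes):
    """Return 1 minus the sum of squared allele frequencies at one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    """
    alleles = [allele for pair in genotypes for allele in pair]
    if not alleles:
        raise ValueError("no genotypes supplied")
    total = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def number_of_alleles(genotypes):
    """Count the distinct alleles observed at one locus."""
    return len({allele for pair in genotypes for allele in pair})


def trio_is_consistent(child, father, mother):
    """True if the child could have received one allele from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


# Hypothetical allele sizes (bp) for four individuals at one marker.
locus = [(152, 154), (152, 152), (154, 156), (152, 156)]
h_e = expected_heterozygosity(locus)   # frequencies 0.5, 0.25, 0.25
n_alleles = number_of_alleles(locus)

ok = trio_is_consistent(child=(152, 156), father=(152, 154), mother=(156, 156))
bad = trio_is_consistent(child=(150, 156), father=(152, 154), mother=(156, 156))
```

In the example, the allele frequencies 0.5, 0.25, and 0.25 give an expected heterozygosity of 0.625, and the second trio is flagged because allele 150 is present in neither parent.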

Database construction
The cynomolgus macaque genome database was constructed using the rhesus genome sequence (rheMac2 assembly) as a reference. Information on microsatellite markers, BAC clones, 130,000 5'-end expressed sequence tags (ESTs) and annotation of genes was visualized with Generic Genome Browser (GBrowse) software [39]. The annotation of human and macaque transcripts on the genome was retrieved from the UCSC genome browser [40].
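As an illustration of how marker positions might be prepared for a genome browser, the sketch below writes one marker as a GFF3 feature line, a tab-delimited format that GBrowse can read. The text does not state which file format or feature names the authors used, so the `source` label, the choice of the Sequence Ontology term `microsatellite` as feature type, and the example marker name and coordinates are all assumptions.

```python
def marker_to_gff3(seqid, start, end, marker_id, source="cyno_markers"):
    """Format one marker as a nine-column, tab-delimited GFF3 line."""
    if start < 1 or start > end:
        raise ValueError("GFF3 coordinates are 1-based with start <= end")
    if any(ch in marker_id for ch in ";=,&\t"):
        raise ValueError("marker_id contains a character reserved in GFF3")
    attributes = f"ID={marker_id};Name={marker_id}"
    fields = [
        seqid,             # reference sequence, e.g. a rheMac2 chromosome
        source,            # hypothetical label for the data source
        "microsatellite",  # feature type
        str(start),
        str(end),
        ".",               # score not applicable
        ".",               # strand not applicable
        ".",               # phase applies to coding features only
        attributes,
    ]
    return "\t".join(fields)


# Hypothetical marker name and coordinates.
record = marker_to_gff3("chr1", 1000, 1023, "MFA_demo")
gff3_text = "##gff-version 3\n" + record + "\n"
```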