Genome-wide DNA polymorphisms in two cultivars of mei (Prunus mume Sieb. et Zucc.)
© Sun et al.; licensee BioMed Central Ltd. 2013
Received: 1 February 2013
Accepted: 25 September 2013
Published: 6 October 2013
Mei (Prunus mume Sieb. et Zucc.) is a famous ornamental plant and fruit crop grown in East Asian countries. Limited genetic resources, especially molecular markers, have hindered the progress of mei breeding projects. Here, we performed low-depth whole-genome sequencing of Prunus mume ‘Fenban’ and Prunus mume ‘Kouzi Yudie’ to identify high-quality polymorphic markers between the two cultivars on a large scale.
In total, 1464.1 Mb of ‘Fenban’ and 1422.1 Mb of ‘Kouzi Yudie’ sequencing data were uniquely mapped to the mei reference genome, each at about 6-fold coverage. We detected a large number of putative polymorphic markers in the 196.9 Mb of sequence shared by the two cultivars, comprising 200,627 SNPs, 4,900 InDels, and 7,063 SSRs. Of these markers, 38,773 SNPs, 174 InDels, and 418 SSRs were located in the 22.4 Mb CDS region, and 63.0% of the marker-containing CDS sequences were assigned to GO terms. Subsequently, 670 selected SNPs were validated using an Agilent SureSelect solution phase hybridization assay. A subset of 599 SNPs was used to assess the genetic similarity of a panel of mei germplasm samples and a plum (P. salicina) cultivar, producing a set of informative diversity data. We also analyzed the frequency and distribution of the detected InDels and SSRs in the mei genome and validated their usefulness as DNA markers: these markers were successfully amplified in the two cultivars and in their segregating progeny.
A large set of high-quality polymorphic SNPs, InDels, and SSRs was identified in parallel between ‘Fenban’ and ‘Kouzi Yudie’ using low-depth whole-genome sequencing. The study provides extensive data on these polymorphic markers, which can be used to construct high-resolution genetic maps, perform genome-wide association studies, and design genomic selection strategies in mei.
Mei (Prunus mume Sieb. et Zucc., 2n=2x=16) is a member of Rosaceae, sub-family Prunoideae. It originated in southwestern China and has been cultivated in China for more than 3000 years. Today it is also widely cultivated in other East Asian countries such as Japan and Korea [1, 2]. Mei blossoms have many conspicuous ornamental characteristics, such as vibrantly colored corollas and diverse flower types. Mei is characterized by an inherent tolerance to low temperatures (−4 to −2°C), which allows it to flower in winter or early spring when most other ornamental plants are still dormant [1, 2]. It has therefore been widely cultivated as an early-blooming garden ornamental. Mei fruit can also be processed into many useful products, including salted mei, mei wine, and juice, which are considered to have important nutritional and medicinal value. All three products are widely consumed in East Asian countries. There is an urgent need to breed new mei varieties with enhanced ornamental and nutritional value that meet consumer needs. However, traditional mei breeding is laborious and time-consuming, mainly because mei is a woody perennial that takes a long time to reach reproductive maturity. Recently, DNA markers have been used to analyze genetic diversity, distinguish varieties, and construct genetic maps [3–6]. However, quantitative trait locus (QTL) analysis, genome-wide association studies (GWAS), and genomic selection studies are impeded by the limited number of available DNA markers.
With the advent of next-generation sequencing (NGS) technologies, entire genomes can be sequenced more efficiently and economically than ever before. Aligning short reads from different mei varieties to a reference genome makes it possible to identify large numbers of polymorphic DNA markers, including SNPs, InDels, and SSRs, in parallel; this approach has been applied to crop species such as rice, eggplant, watermelon, and Chinese cabbage. However, the heterozygosity and complexity of ornamental plant genomes and the cost of deep-coverage whole-genome sequencing have limited the genome-wide identification of DNA polymorphisms by massively parallel sequencing. Recently, the mei shotgun genome assembly, completed using the Solexa platform, has made it possible to discover large numbers of polymorphic DNA markers and to identify genome-wide variants.
SNPs, InDels, and SSRs are important DNA markers because of their abundance, stability, codominance, efficiency, and ease of automation. They have been widely used to analyze genetic diversity, construct high-density genetic maps, perform GWAS, and design genomic selection strategies in many organisms [9, 11–14]. For example, a high-resolution genetic map was constructed to anchor the watermelon genome assembly using SSRs, InDels, and structural variations (SVs), all identified by whole-genome resequencing. An initial map of human InDel variation was constructed from DNA resequencing traces to identify polymorphisms that may influence human disease. A GWAS in maize indicated that SNPs can be associated with phenotypes through linkage disequilibrium (LD). Recently, a genetic map containing 1,484 SNP markers was constructed using a RAD strategy in a segregating F1 population derived from Prunus mume ‘Fenban’ and Prunus mume ‘Kouzi Yudie’; this map anchored 83.9% of the mei genome assembly sequences. However, the remaining 16.1% of the assembly sequences have not been anchored. Moreover, these SNPs were distributed unevenly along each chromosome, leaving some regions with few markers.
In the present study, we identified a large number of putative polymorphic markers, including SNPs, InDels, and SSRs, between ‘Fenban’ and ‘Kouzi Yudie’ by low-depth genome sequencing of the two mei cultivars. We also determined the frequency and distribution of these markers in different regions of the eight mei pseudo-chromosomes. In addition to validating SNPs with the Agilent SureSelect liquid-based hybrid capture system, we partially validated InDels and SSRs by testing them as DNA markers. The information described here can be used to construct fully integrated maps of natural genetic variation that include SNPs, InDels, and SSRs, and such maps can help identify polymorphisms that directly influence mei phenotypes. These data should therefore be useful in mei genetics and breeding programs.
Table 1 Distribution of polymorphic DNA markers present in both ‘Fenban’ and ‘Kouzi Yudie’ on eight mei pseudo-chromosomes. Columns: No. of SNPs, No. of InDels, No. of SSRs, and physical size (Mb) for each pseudo-chromosome.
The number of polymorphic DNA markers varied among pseudo-chromosomes. Pseudo-chromosome 2 had the most SNPs (40,350) and SSRs (1,376), 3.6-fold and 2.7-fold more, respectively, than pseudo-chromosome 8, which had the fewest SNPs (11,360) and SSRs (502). Pseudo-chromosome 2 also had the most InDels (895), 2.6-fold more than pseudo-chromosome 7 (344), which had the fewest (Table 1). As in rice, markers were unevenly distributed among pseudo-chromosomes. This can be attributed to differences in pseudo-chromosome size: pseudo-chromosome 2 is 42.1 Mb, 2.5-fold the size of pseudo-chromosome 7 (17.1 Mb) and 2.4-fold that of pseudo-chromosome 8 (17.3 Mb) (Table 1).
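The fold differences quoted above, and the size-normalized densities behind the chromosome-size explanation, can be checked with a few lines of Python. This sketch uses only the counts and sizes stated in the text (the rest of Table 1 is not reproduced), and the dictionary layout is an illustrative assumption.

```python
# Counts and physical sizes (Mb) quoted in the text for Table 1.
# Only the values mentioned in the prose are included.
CHR2 = {"snps": 40350, "indels": 895, "ssrs": 1376, "size_mb": 42.1}
CHR7 = {"indels": 344, "size_mb": 17.1}
CHR8 = {"snps": 11360, "ssrs": 502, "size_mb": 17.3}


def fold(a: float, b: float) -> float:
    """Fold difference a/b, rounded to one decimal place as reported."""
    return round(a / b, 1)


def per_mb(count: int, size_mb: float) -> float:
    """Marker density: markers per Mb of pseudo-chromosome."""
    return count / size_mb


if __name__ == "__main__":
    print("SNPs, chr2/chr8:", fold(CHR2["snps"], CHR8["snps"]))
    print("SSRs, chr2/chr8:", fold(CHR2["ssrs"], CHR8["ssrs"]))
    print("InDels, chr2/chr7:", fold(CHR2["indels"], CHR7["indels"]))
    print("Size, chr2/chr7:", fold(CHR2["size_mb"], CHR7["size_mb"]))
    print("Size, chr2/chr8:", fold(CHR2["size_mb"], CHR8["size_mb"]))
    # Size-normalized SNP densities for the two extremes
    print(f"SNPs/Mb chr2: {per_mb(CHR2['snps'], CHR2['size_mb']):.1f}")
    print(f"SNPs/Mb chr8: {per_mb(CHR8['snps'], CHR8['size_mb']):.1f}")
```

Normalizing by size brings the two SNP densities (about 958 and 657 per Mb) much closer together than the raw 3.6-fold count difference, which is the intuition behind attributing the uneven distribution to chromosome size.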
A total of 200,627 SNPs, 4,900 InDels, and 7,063 SSRs were annotated using the Mei Annotation Project Database release (http://prunusmumegenome.bjfu.edu.cn). Relatively few of the polymorphic markers were located in CDS regions (Additional files 1, 2, 3): only 38,773 SNPs (19.3% of the total), 174 InDels (3.6%), and 418 SSRs (5.9%) were distributed in the 22.4 Mb CDS region. A larger proportion of SNPs than of InDels or SSRs fell in CDS regions. This difference can be explained by the fact that InDels and SSRs are more deleterious than SNPs in CDS regions, because they can cause frameshift mutations and amino acid changes that substantially alter gene function [19, 20]. In contrast, SNPs often produce synonymous mutations that have little or no impact on gene function. In our study, 28,020 of the 38,773 CDS SNPs were synonymous and 10,753 were nonsynonymous. The ratio of nonsynonymous to synonymous substitutions was 0.38, lower than that reported for Arabidopsis (0.83), rice (1.29), and soybean (1.61). This difference may have been caused by strong purifying selection at nonsynonymous sites in mei CDS regions; however, a more conclusive explanation will require further study as mei becomes increasingly recognized as a study material for woody plants.
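The proportions and the nonsynonymous/synonymous ratio above follow directly from the reported counts. The minimal sketch below recomputes them. Note that this is a simple count ratio, not a dN/dS (Ka/Ks) estimate, which would additionally normalize by the number of synonymous and nonsynonymous sites.

```python
# Marker counts reported in the text
TOTAL = {"SNP": 200_627, "InDel": 4_900, "SSR": 7_063}
IN_CDS = {"SNP": 38_773, "InDel": 174, "SSR": 418}
SYNONYMOUS, NONSYNONYMOUS = 28_020, 10_753


def percent_in_cds(kind: str) -> float:
    """Percentage of all detected markers of one type that lie in CDS regions."""
    return round(100 * IN_CDS[kind] / TOTAL[kind], 1)


def count_ratio(nonsyn: int, syn: int) -> float:
    """Nonsynonymous:synonymous SNP count ratio (not site-normalized)."""
    return round(nonsyn / syn, 2)


if __name__ == "__main__":
    # The two SNP classes should account for every CDS SNP
    assert SYNONYMOUS + NONSYNONYMOUS == IN_CDS["SNP"]
    for kind in TOTAL:
        print(kind, percent_in_cds(kind), "% in CDS")
    print("nonsyn/syn:", count_ratio(NONSYNONYMOUS, SYNONYMOUS))
```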
Table 2 List of the cultivars utilized in the dendrogram
Captured DNA was sequenced on an Illumina GA II instrument, generating 4.2 Gb of sequence data as 78 bp reads from the 24 libraries prepared with the SureSelect method (NCBI database accession SRA063161); 3.4 Gb passed the Illumina chastity filter and was used for automatic allele calling at each locus. Each library was sequenced to a mean mapped coverage of ~20-fold over the targeted region. Of the 670 SNPs, 599 (89.4%) produced unambiguous data: 513 SNPs distributed across the eight mei pseudo-chromosomes and 86 SNPs located in assembly sequences not anchored to pseudo-chromosomes (Figure 2 and Additional file 6). The 513 anchored SNPs (85.6% of the 599) averaged 64 per pseudo-chromosome, ranging from 117 on pseudo-chromosome 2 to 38 on pseudo-chromosome 8 (Figure 2 and Additional file 6).
Polymorphism levels of the 599 SNP loci were estimated using 23 mei cultivars and 1 plum cultivar (Additional file 6). Polymorphism information content (PIC) values ranged from 0.26 to 0.50 (mean 0.45), and 541 markers had PIC values > 0.4, a level suitable for biodiversity analyses. Diversity values [expected heterozygosity (He)] for SNPs are generally low because of their bi-allelic nature. In mei, the observed heterozygosity (Ho) and He per locus ranged from 0.09 to 0.77 (mean 0.47) and from 0.26 to 0.51 (mean 0.46), respectively (Additional file 6). The mean diversity value (0.46) was higher than the mean reported for grape (0.30). However, mei SNPs showed lower diversity values than SSR markers (0.68). This is a potential drawback of SNPs, but it can be overcome by using a large number of markers.
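The statistics above can be illustrated for a single bi-allelic locus. The sketch below assumes genotypes are given as two-letter strings such as 'AG' (a hypothetical input format). It uses the PIC formula given in the Methods, PIC = 1 − Σp_i², which is the same expression as gene diversity, so a bi-allelic marker cannot exceed 0.5 (Botstein's PIC subtracts an extra term and would top out at 0.375). For He it uses one common small-sample (unbiased) correction; whether GenePop applied exactly this form is not stated in the text.

```python
from collections import Counter
from typing import Dict, List


def allele_freqs(genotypes: List[str]) -> Dict[str, float]:
    """Allele frequencies from unphased diploid calls such as 'AA' or 'AG'."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def pic(genotypes: List[str]) -> float:
    """PIC = 1 - sum(p_i^2), as defined in the Methods."""
    return 1.0 - sum(p * p for p in allele_freqs(genotypes).values())


def observed_het(genotypes: List[str]) -> float:
    """Ho: fraction of individuals carrying two different alleles."""
    return sum(1 for g in genotypes if g[0] != g[1]) / len(genotypes)


def expected_het_unbiased(genotypes: List[str]) -> float:
    """He with the 2n/(2n-1) small-sample correction (n = number of individuals)."""
    n = len(genotypes)
    gene_diversity = 1.0 - sum(p * p for p in allele_freqs(genotypes).values())
    return (2 * n / (2 * n - 1)) * gene_diversity


if __name__ == "__main__":
    calls = ["AA", "AG", "AG", "GG"]  # hypothetical genotypes at one SNP
    print(pic(calls), observed_het(calls), expected_het_unbiased(calls))
```

With 24 genotypes the correction factor is 48/47, so under this form He for a bi-allelic SNP can slightly exceed 0.5 (up to about 0.51).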
Large numbers of InDels have now been identified using NGS platforms. Owing to their high polymorphism and genome-wide distribution, these markers have been applied to high-resolution genetic mapping, association studies, and map-based cloning [10, 12, 35]. However, the usefulness of InDels has not been explored in mei genetic and genomic research.
To verify that these InDels were suitable for use as DNA markers, PCR primers were designed for them (Additional file 2). Twenty fluorescently labeled InDel primer pairs were selected to survey polymorphisms among P. mume ‘Fenban’, P. mume ‘Kouzi Yudie’, and five randomly chosen segregating progeny from a cross between the two cultivars (Additional file 2). Three of the 20 primer pairs produced no products, and two showed no polymorphism between the mapping parents. The remaining fifteen primer pairs amplified reliably and stably and were polymorphic, making them suitable for constructing a genetic linkage map in the mapping population. However, detailed analysis of these polymorphic InDels revealed that three had longer insertions or deletions than predicted (Additional file 7). Krawitz et al. showed that a short read containing an InDel may be aligned with mismatched bases instead of gaps; with the BWA short-read aligner, this produced a high rate of variant bases at InDel positions. The discrepancies observed in our study may therefore reflect alignments with mismatches instead of gaps, which would make the predicted InDel lengths shorter than those observed from PCR amplification of the fragments containing them. The high proportion of primer pairs that amplified successfully (17 of 20, 15 of them polymorphic) suggests that the detected InDel markers may be suitable for constructing genetic linkage maps.
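Why a short InDel near a read end can be reported as mismatches is easy to illustrate with a toy cost comparison. The sketch below is a simplified model, not BWA's algorithm: it compares only two candidate alignments at a fixed start position and assumes an affine gap cost (open + extend × length), with penalties of 3 per mismatch, 11 to open a gap and 4 per gap base. These values were chosen to resemble BWA `aln` defaults but are treated here as assumptions.

```python
# Assumed penalties (illustrative; chosen to resemble BWA aln defaults)
MISMATCH, GAP_OPEN, GAP_EXTEND = 3, 11, 4


def delete(seq: str, pos: int, length: int = 1) -> str:
    """Return seq with `length` bases removed starting at 0-based `pos`."""
    return seq[:pos] + seq[pos + length:]


def ungapped_cost(read: str, ref: str) -> int:
    """Penalty if the read is aligned base-for-base from the same start (mismatches only)."""
    return MISMATCH * sum(1 for a, b in zip(read, ref) if a != b)


def gapped_cost(deletion_len: int) -> int:
    """Penalty if the deletion is modeled as a single gap (affine cost)."""
    return GAP_OPEN + GAP_EXTEND * deletion_len


def preferred(read: str, ref: str, deletion_len: int) -> str:
    """Which representation the cheaper of the two alignments would use."""
    return "mismatches" if ungapped_cost(read, ref) < gapped_cost(deletion_len) else "gap"


if __name__ == "__main__":
    ref = "AAAACCCCGGGGTTTTACGT"
    for pos in (5, 17):  # deletion mid-read vs. near the read end
        read = delete(ref, pos)
        print(pos, ungapped_cost(read, ref), gapped_cost(1), preferred(read, ref, 1))
```

Here a 1 bp deletion at position 17 costs only 6 as two mismatches versus 15 as a gap, so the cheaper alignment hides the deletion. The same deletion at position 5 produces six downstream mismatches (cost 18) and is better represented as a gap.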
Table 3 Distribution of 7,063 putative polymorphic SSRs identified between ‘Fenban’ and ‘Kouzi Yudie’. Columns include average motif length and number of repeats.
SSR loci have been divided into two classes based on the total length of the repeat tract: hypervariable class I SSRs (≥ 20 bp) and potentially variable class II SSRs (≥ 12 bp and < 20 bp). Among the polymorphic SSRs between the two cultivars, class II SSRs (5,016) were considerably more common than class I SSRs (2,047) (Table 3). Similar patterns have been observed in rice and papaya. These results can be attributed to the fact that class II SSRs consist of short repeats, which are more tolerant to mutations than class I SSRs. However, class I SSRs are more polymorphic than class II SSRs, as demonstrated by experimental data from rice, Brachypodium, and papaya. Class II SSRs tend to be less variable because shorter repeat tracts are less prone to slipped-strand mispairing than longer ones. By motif length, dinucleotide repeats (1,346) were the most common motifs among class I SSRs, consistent with reports from the five plant species analyzed by Mun et al. Mononucleotide repeats were the most abundant among class II SSRs. The predominance of dinucleotide repeats may be explained by the higher polymerase slippage rates of dinucleotides than of other repeat motifs, in accordance with data from human and fruit fly SSRs.
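The two-class scheme above depends only on the total length of the repeat tract. A minimal sketch, assuming each SSR is described by its motif and repeat count (a hypothetical representation, not MISA's output format):

```python
from collections import Counter
from typing import Optional


def ssr_class(motif: str, repeats: int) -> Optional[str]:
    """Classify a perfect SSR by total tract length (motif length x repeats)."""
    tract_bp = len(motif) * repeats
    if tract_bp >= 20:
        return "class I"   # hypervariable
    if tract_bp >= 12:
        return "class II"  # potentially variable
    return None            # below the 12 bp minimum


if __name__ == "__main__":
    loci = [("AG", 10), ("A", 12), ("AAT", 5), ("AG", 5)]  # hypothetical loci
    print(Counter(ssr_class(m, r) for m, r in loci))
```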
Polymorphic SSRs with different repeat motifs were also found between the two cultivars. The most common di- and trinucleotide motifs were AG/CT (55.8%) and AAT/ATT (35.5%), whereas CG/CG was not observed in either cultivar and CCG/CGG (0.6%) was rare (Additional file 8). AT-rich repeat motifs were more common than GC-rich repeat motifs among the polymorphic SSRs of the mapping parents, consistent with previous reports from eggplant and papaya. According to previous studies [46, 47], (CTG)n, (CCG)n, (AT)n, and (GC)n repeats, whose self-complementary motifs can form hairpin structures, accumulate readily in the mei genome. However, methylated cytosine readily mutates to thymine through deamination, which may explain the scarcity of GC-rich repeats.
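Motif labels such as AG/CT or AAT/ATT group a motif with its cyclic rotations and its reverse complement. The sketch below assumes one convention consistent with the labels in the text: the label is the smallest rotation of the motif and the smallest rotation of its reverse complement. The GC-fraction helper is included for an AT-rich/GC-rich split, where the 0.5 cut-off would be an assumption.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def rotations(motif: str) -> list:
    """All cyclic rotations, e.g. AAT -> AAT, ATA, TAA."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def motif_class(motif: str) -> str:
    """Label a motif by its smallest rotation and that of its reverse complement."""
    a = min(rotations(motif))
    b = min(rotations(reverse_complement(motif)))
    return f"{min(a, b)}/{max(a, b)}"


def gc_fraction(motif: str) -> float:
    """Fraction of G or C bases in the motif."""
    return sum(base in "GC" for base in motif) / len(motif)


if __name__ == "__main__":
    observed = ["AG", "TC", "GA", "AAT", "TTA", "CCG"]  # hypothetical motifs
    print(Counter(motif_class(m) for m in observed))
```

Under this convention CG maps to CG/CG because it is its own reverse complement, which matches the label used in the text.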
All of these polymorphic SSRs were used to design PCR primers (Additional file 3). To assess SSR polymorphisms among the parental lines and five segregating progeny, twenty SSR primer pairs were designed and labeled with fluorescent dyes. Eighteen of the 20 primer pairs amplified successfully, and fifteen of these were suitable for constructing the genetic map between the two cultivars (Additional file 9). The primer pairs that failed to amplify may reflect null alleles, which can arise from substitutions within primer-binding sites or from SSR deletions. Nevertheless, most primers amplified their SSRs successfully and revealed polymorphisms, supporting the use of SSRs to construct high-resolution genetic maps of mei cultivars in the near future.
In this study, we identified a large number of putative polymorphic SNPs, InDels, and SSRs between ‘Fenban’ and ‘Kouzi Yudie’ using low-depth whole-genome sequencing, providing both a new methodology and an extensive data set. These putative polymorphic markers could facilitate the construction of high-density genetic linkage maps and accelerate QTL analyses, GWAS, genomic selection, and marker-assisted selection (MAS) breeding programs in mei.
Twenty-three mei cultivars from the mei germplasm bank in the China Mei Flower Research Center (Wuhan city, China) and one plum cultivar from the Beijing Botanical Garden (Beijing city, China) were collected to perform sequence capture using Agilent’s SureSelect solution phase hybridization assay (Table 2). All DNA samples were extracted from young leaves using the plant genomic DNA extraction Kit (TIANGEN, Beijing, China) following the manufacturer’s protocol.
The genome sequences of P. mume ‘Fenban’ and P. mume ‘Kouzi Yudie’ were downloaded from the NCBI database (accession SRA057102). All sequences were aligned to the mei reference genome (http://prunusmumegenome.bjfu.edu.cn./) using BWA software (ver. 0.5.1), allowing a maximum of three mismatches for 90 bp reads and two mismatches for 45 bp reads. To detect high-quality DNA polymorphic markers, we excluded reads that mapped to more than one genomic position.
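The read-level filtering rule can be written as a small predicate. In this sketch, `Alignment` is a hypothetical record; in practice its fields would be parsed from the aligner's SAM output (for example, mismatches from the NM tag, which strictly counts edit distance), and that parsing is not shown. Rejecting reads of other lengths is also an assumption.

```python
from dataclasses import dataclass

# Mismatch limits by read length, as stated in the text
MAX_MISMATCHES = {90: 3, 45: 2}


@dataclass
class Alignment:
    read_id: str
    read_length: int
    mismatches: int
    n_best_hits: int  # genomic positions with an equally good alignment


def keep(aln: Alignment) -> bool:
    """Keep only uniquely mapped reads within the mismatch limit for their length."""
    limit = MAX_MISMATCHES.get(aln.read_length)
    if limit is None:
        return False  # assumption: other read lengths are discarded
    return aln.n_best_hits == 1 and aln.mismatches <= limit


if __name__ == "__main__":
    alignments = [
        Alignment("r1", 90, 3, 1),
        Alignment("r2", 90, 4, 1),
        Alignment("r3", 45, 1, 2),
        Alignment("r4", 45, 2, 1),
    ]
    print([a.read_id for a in alignments if keep(a)])
```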
Uniquely mapped paired-end reads were used for SNP calling with SOAPsnp. SNPs with an overall sequencing depth of more than 8, a quality score over 30, and at least 4 uniquely mapped reads supporting each allele were retained.
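The SNP filtering step amounts to three thresholds. A minimal sketch, assuming each SOAPsnp site has already been parsed into the hypothetical `SnpCall` record below (SOAPsnp's actual output columns are not modeled here):

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SnpCall:
    depth: int    # overall sequencing depth at the site
    quality: int  # consensus quality score
    allele_reads: Dict[str, int] = field(default_factory=dict)  # uniquely mapped reads per allele


def passes(call: SnpCall, depth_over: int = 8, quality_over: int = 30,
           min_reads_per_allele: int = 4) -> bool:
    """Apply the thresholds in the text: depth > 8, quality > 30, >= 4 reads per allele."""
    return (
        call.depth > depth_over
        and call.quality > quality_over
        and bool(call.allele_reads)
        and all(n >= min_reads_per_allele for n in call.allele_reads.values())
    )


if __name__ == "__main__":
    sites = [SnpCall(10, 35, {"A": 5, "G": 5}), SnpCall(12, 40, {"A": 9, "G": 3})]
    print([passes(s) for s in sites])
```

Calling `passes(call, depth_over=15)` mirrors the stricter depth criterion described for genotyping the capture data.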
To detect InDels in uniquely mapped sequences, a second mapping was performed with BWA software (ver. 0.5.1) allowing gaps. InDels (1–6 bp) were then called using SOAPindel as described in a previous study. Each InDel locus consisted of an InDel motif and two unique flanking sequences of less than 195 bp on each side of the motif. InDels were classified as putative polymorphisms if the lengths of the InDel motifs in the two cultivars differed by at least 1 bp.
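The locus definition and polymorphism rule above can be sketched as follows. Coordinates are 0-based and half-open, the function names are hypothetical, and the exact flank bound is an assumption (the text gives flanks of less than 195 bp). Flank uniqueness checks are not modeled.

```python
from typing import Tuple

MIN_INDEL_BP, MAX_INDEL_BP = 1, 6  # InDel size range called in the text


def indel_locus(seq: str, start: int, end: int, max_flank: int = 195) -> Tuple[str, str, str]:
    """Split seq into (left flank, InDel motif, right flank) around seq[start:end].

    Flanks are truncated at max_flank bases or at the sequence ends.
    """
    left = seq[max(0, start - max_flank):start]
    right = seq[end:end + max_flank]
    return left, seq[start:end], right


def in_called_range(indel_len: int) -> bool:
    """True if the InDel length lies within the 1-6 bp range used for calling."""
    return MIN_INDEL_BP <= indel_len <= MAX_INDEL_BP


def is_polymorphic_indel(len_a: int, len_b: int, min_diff: int = 1) -> bool:
    """Polymorphic if the InDel lengths in the two cultivars differ by >= min_diff bp."""
    return abs(len_a - len_b) >= min_diff


if __name__ == "__main__":
    print(indel_locus("GGTTTCC", 2, 5), is_polymorphic_indel(3, 0))
```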
Uniquely mapped reads were used to detect SSRs with the program MISA (MIcroSAtellites identification tool, http://pgrc.ipk-gatersleben.de/misa). Minimum repeat lengths for SSR detection were 12 bp for mono- to trinucleotides, 16 bp for tetranucleotides, 20 bp for pentanucleotides, and 24 bp for hexanucleotides. Each SSR locus consisted of a repeat motif and two unique flanking sequences of 180 bp on each side of it. SSRs were classified as polymorphic if the lengths of the repeat regions in the two cultivars differed by at least 2 bp.
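The detection thresholds above can be illustrated with a small regular-expression scanner. This is not MISA: it finds only perfect repeats, does not merge adjacent repeats into compound SSRs, and may miss a repeat that overlaps a longer run of a different motif. The function names are hypothetical.

```python
import re
from typing import List, Tuple

# Minimum tract length (bp) by motif length, as given in the text
MIN_TRACT_BP = {1: 12, 2: 12, 3: 12, 4: 16, 5: 20, 6: 24}


def is_primitive(motif: str) -> bool:
    """False for motifs that are repeats of a shorter unit, e.g. 'AA' or 'ATAT'."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat count) for perfect SSRs meeting the thresholds."""
    hits = []
    for k, min_bp in MIN_TRACT_BP.items():
        min_repeats = -(-min_bp // k)  # ceiling division
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # e.g. skip 'AA' runs already found as 'A'
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def is_polymorphic_ssr(tract_a: int, tract_b: int, min_diff: int = 2) -> bool:
    """Polymorphic if repeat-tract lengths in the two cultivars differ by >= min_diff bp."""
    return abs(tract_a - tract_b) >= min_diff


if __name__ == "__main__":
    example = "GGCT" + "AG" * 7 + "TTGC" + "A" * 12 + "C"
    print(find_ssrs(example))
```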
The positions of SNPs, InDels, and SSRs were assigned to CDS, intron, 5′UTR, 3′UTR, and intergenic regions according to the mei genome GFF files, and each CDS containing these markers was assigned one or more functional annotations using the mei annotation project files. These files were downloaded from the Mei Genome Database (http://prunusmumegenome.bjfu.edu.cn). The annotated sequences were then mapped to high-level terms in the three main GO categories (biological process, molecular function, and cellular component) using the same annotation files. SNPs in CDS regions were classified as synonymous or nonsynonymous.
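The synonymous/nonsynonymous split can be sketched with the standard genetic code. The sketch assumes the CDS is supplied as a spliced, in-frame, coding-strand sequence (minus-strand genes already reverse-complemented) and counts changes to or from stop codons as nonsynonymous. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_cds_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a SNP at 0-based position `pos` of an in-frame CDS (coding strand)."""
    start = pos - pos % 3
    codon = cds[start:start + 3]
    if len(codon) != 3:
        raise ValueError("SNP falls in an incomplete codon")
    offset = pos % 3
    mutant = codon[:offset] + alt + codon[offset + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutant]
    return "synonymous" if same else "nonsynonymous"


if __name__ == "__main__":
    cds = "ATGGCTTTA"  # hypothetical CDS: Met-Ala-Leu
    print(classify_cds_snp(cds, 5, "C"), classify_cds_snp(cds, 3, "A"))
```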
Using the SureSelect method from Agilent, a total of 670 biotinylated RNA probes, each 120 nucleotides long (Additional file 5), were designed to capture the target DNA fragments from pooled libraries of 24 genotypes. The targeted sequences comprised 17.5% intron, 25.5% CDS, 4.8% UTR, and 52.2% intergenic sequence. The capture probes were hybridized with the genomic libraries of the 24 genotypes, each labeled with a different barcode. Captured DNA was then sequenced on the Illumina GAII instrument, generating 4.2 Gb of 78 bp reads.
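As a rough illustration of how a fixed-length capture probe relates to a target SNP, the sketch below extracts a 120-nt window around it. Centring the probe on the SNP is an assumption (the text does not describe the probe layout), this is not Agilent's design procedure, and the actual probes were RNA rather than the DNA string returned here.

```python
from typing import Tuple


def probe_window(chrom_seq: str, snp_pos: int, probe_len: int = 120) -> Tuple[int, str]:
    """Return (start, sequence) of a probe_len window roughly centred on a 0-based SNP.

    Windows are shifted inward at sequence ends so they stay full length.
    """
    if len(chrom_seq) < probe_len:
        raise ValueError("sequence shorter than the probe")
    start = snp_pos - probe_len // 2
    start = max(0, min(start, len(chrom_seq) - probe_len))
    return start, chrom_seq[start:start + probe_len]


if __name__ == "__main__":
    reference = "ACGT" * 100  # placeholder sequence, not mei data
    start, probe = probe_window(reference, 200)
    print(start, len(probe))
```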
At least 3 μg of genomic DNA from each of the 24 accessions was placed in 80 μl TE buffer and fragmented using a Covaris instrument. This was followed by end repair, A-tailing, and BGI PE index adapter ligation, as described in the Illumina DNA library preparation protocol.
Adapter-ligated DNA was run on a 2% TAE agarose gel, and the gel region containing fragments of 200–250 bp was excised. The DNA was purified using a gel extraction kit (Qiagen) and eluted in 90 μl EB. The adapter-ligated, size-selected DNA was amplified in a 50 μl PCR containing 3 μl of DNA, 18 μl H2O, 2 μl primer 1.1 (Illumina), 2 μl primer 2.1 (Illumina), and 25 μl Phusion master mix (Finnzymes). PCR conditions were as follows: 2 min at 95°C; 4 cycles of 15 s at 95°C, 30 s at 60°C, and 30 s at 72°C; and a final 5 min at 72°C. The reaction product was purified using a QIAquick PCR purification kit (Qiagen) and eluted in 20 μl EB.
SureSelect solution phase hybridization was conducted according to the manufacturer’s (Agilent) standard protocol. Buffers #1, #2, #3, and #4 from the SureSelect kit were mixed to prepare the hybridization solution, which was incubated at 65°C. In parallel, 300 ng of each DNA library was pooled with blockers #1, #2, and #3 (Agilent), denatured for 5 min at 95°C, and then incubated at 65°C in a thermal cycler (MJ Research). We then mixed 12 μl of hybridization solution, 5 μl of mixed SureSelect Oligo Capture Library, 11 μl of the DNA library, 1 μl H2O, and 1 μl RNase block (Agilent), incubated the mixture for 24 hours at 65°C in a thermal cycler (MJ Research), and captured the hybridized fragments with Streptavidin M-280 Dynabeads (Invitrogen). The reaction product was then purified with the MinElute PCR purification kit (Qiagen) according to the manufacturer’s protocol. The purified DNA was enriched in 50 μl PCR reactions containing 15 μl of eluate, 8 μl H2O, 1 μl primer 1.1 (Illumina), 1 μl primer 2.1 (Illumina), and 25 μl Phusion master mix (Finnzymes). PCR conditions were as described above. The PCR products were pooled, purified with Ampure beads (Beckman), and eluted in 50 μl EB. The concentration of the captured sample was measured using a Qubit® dsDNA HS Assay Kit (Invitrogen) before sequencing on the Illumina GAII instrument as paired-end 78 bp reads.
Agilent SureSelect liquid-based hybrid capture was used for SNP genotyping. Alleles at each locus were called using SOAPsnp. Sites meeting the following criteria were retained: overall sequencing depth over 15, quality score over 30, and at least 4 uniquely mapped reads per allele. These sites are referred to as high-confidence calls in this study. For each SNP locus, the number of alleles (Na), Ho, and He were calculated using GenePop version 4.0. PIC was calculated as $\mathrm{PIC} = 1 - \sum_{i} p_i^{2}$, where $p_i$ is the frequency of the ith allele at a SNP locus. Each SNP locus was scored for the presence (1) or absence (0) of each genotype, and the resulting data were compiled into a binary matrix describing the 24 cultivar genotypes at 599 polymorphic co-dominant SNP markers. Genetic similarity coefficients among the genotypes were estimated using NTSYS-pc software (version 2.10), and a dendrogram of genetic diversity among mei and plum genotypes was generated using the neighbor-joining (NJ) method.
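The binary scoring and similarity step can be sketched as below. Two choices here are assumptions rather than details from the text: scoring each SNP as the presence or absence of each of its two alleles, and using the simple matching coefficient (other coefficients, such as Jaccard or Dice, are also commonly used, and the one applied is not stated). The neighbor-joining tree would then be built from 1 − similarity with a dedicated tool; that step is not shown.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def allele_presence(genotype: str, alleles: Sequence[str]) -> List[int]:
    """Score presence (1) / absence (0) of each allele in a diploid call such as 'AG'."""
    return [1 if a in genotype else 0 for a in alleles]


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Fraction of binary positions at which two profiles agree."""
    if len(x) != len(y):
        raise ValueError("profiles must have equal length")
    return sum(a == b for a, b in zip(x, y)) / len(x)


def similarity_matrix(profiles: Dict[str, List[int]]) -> Dict[Tuple[str, str], float]:
    """Symmetric pairwise similarity for all samples, with 1.0 on the diagonal."""
    names = list(profiles)
    sim = {(n, n): 1.0 for n in names}
    for a, b in combinations(names, 2):
        sim[(a, b)] = sim[(b, a)] = simple_matching(profiles[a], profiles[b])
    return sim


if __name__ == "__main__":
    # Hypothetical calls at two SNPs (alleles A/G and C/T) for three samples
    snps = [("A", "G"), ("C", "T")]
    calls = {"mei_1": ["AG", "CC"], "mei_2": ["AA", "CC"], "plum": ["GG", "TT"]}
    profiles = {
        name: [bit for g, alleles in zip(gts, snps) for bit in allele_presence(g, alleles)]
        for name, gts in calls.items()
    }
    sim = similarity_matrix(profiles)
    print(sim[("mei_1", "mei_2")], sim[("mei_1", "plum")])
```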
The putative polymorphic SSR and InDel loci were scanned using Primer 3 (v. 1.1.4) to design oligonucleotide primers flanking the SSRs or InDels. The optimized input parameters were as follows: product size, 100–300 bp; primer size, 18–25 bp; primer Tm, 50–60°C; primer GC content, 40–60%.
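The parameter set above maps naturally onto a Primer3 input record. The sketch below writes one in Boulder-IO format using Primer3 2.x tag names. These differ from the older tag names of v1.1.4 used in the study, so the tags (and the choice to pass the marker as `SEQUENCE_TARGET`) should be treated as assumptions to check against the Primer3 manual.

```python
def primer3_record(seq_id: str, template: str, target_start: int, target_len: int) -> str:
    """Build one Boulder-IO record with the primer constraints listed in the text.

    target_start is 0-based (Primer3's default first-base index).
    """
    tags = {
        "SEQUENCE_ID": seq_id,
        "SEQUENCE_TEMPLATE": template,
        # Primers must flank this region (the SSR or InDel)
        "SEQUENCE_TARGET": f"{target_start},{target_len}",
        "PRIMER_PRODUCT_SIZE_RANGE": "100-300",
        "PRIMER_MIN_SIZE": "18",
        "PRIMER_MAX_SIZE": "25",
        "PRIMER_MIN_TM": "50.0",
        "PRIMER_MAX_TM": "60.0",
        "PRIMER_MIN_GC": "40.0",
        "PRIMER_MAX_GC": "60.0",
    }
    # Each record is a series of TAG=VALUE lines terminated by a line containing "="
    return "".join(f"{k}={v}\n" for k, v in tags.items()) + "=\n"


if __name__ == "__main__":
    template = "ACGT" * 50  # placeholder sequence, not a real locus
    print(primer3_record("ssr_example", template, 90, 20), end="")
```

The resulting text could then be passed to `primer3_core` on standard input.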
From the putative polymorphic SSRs and InDels, we randomly chose 20 fluorescently labeled primer pairs of each type and used them to amplify DNA from the parental lines and five segregating progeny. Total genomic DNA was extracted from fresh young leaves as described above. SSR and InDel genotyping used a forward primer labeled with FAM (Beijing Microread Genetics Co., Ltd, Beijing, China) and an unlabeled reverse primer. SSR and InDel PCRs were each performed in a 10 μl mixture containing 50 ng of genomic DNA, 1 μl of 10 × buffer [20 mM Tris–HCl (pH 8.4), 20 mM KCl, 10 mM (NH4)2SO4, and 1.5 mM MgCl2], 1.2 μl of 2.5 mM dNTP, and 0.6 U of Taq DNA polymerase (Promega, Madison, WI, USA). The mixtures differed only in primer volume: 0.9 μl of 10 μM each forward and reverse primer for SSRs and 1 μl each for InDels, with ddH2O added to the final volume. SSRs and InDels were amplified with the following program: 5 min at 95°C; 25 cycles of 40 s at 95°C, 30 s at the optimized annealing temperature for each primer pair (Additional files 2 and 3), and 40 s at 72°C; and a final extension of 5 min at 72°C. The PCR products were resolved on an ABI 3730 fluorescent analyzer (Applied Biosystems, Foster City, CA, USA) with ROX 400 HD as the size standard, and the data were analyzed using GeneMapper version 3.7 software (Applied Biosystems, Foster City, CA, USA).
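The reaction recipe above implies final concentrations in the 10 μl reaction by simple C1V1 = C2V2 dilution. A small sketch; it assumes the 2.5 mM dNTP stock refers to each dNTP, which the text does not specify.

```python
def final_conc(stock: float, volume_ul: float, total_ul: float = 10.0) -> float:
    """C1*V1 = C2*V2: concentration after adding volume_ul of stock to total_ul."""
    return stock * volume_ul / total_ul


if __name__ == "__main__":
    print("SSR primers (uM each):", final_conc(10, 0.9))
    print("InDel primers (uM each):", final_conc(10, 1.0))
    print("dNTPs (mM):", final_conc(2.5, 1.2))
    print("Buffer (x):", final_conc(10, 1.0))
```

This gives 0.9 μM of each primer for SSRs, 1.0 μM for InDels, about 0.3 mM dNTPs, and 1× buffer.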
SNPs: Single nucleotide polymorphisms
SSRs: Simple sequence repeats
QTL: Quantitative trait locus
GWAS: Genome-wide association study
BWA: Burrows-Wheeler alignment tool
PIC: Polymorphism information content
Na: Number of alleles
RAPD: Random amplified polymorphic DNA
AFLP: Amplified fragment length polymorphism
ITS: Internal transcribed spacer
The authors gratefully acknowledge Rongling Wu (Center for Computational Biology, Beijing Forestry University, Beijing) for advice on data interpretation and discussion, and Guangyi Fan (BGI, Shenzhen) and Liang Zeng (Shanghai Institutes of Biological Sciences, Chinese Academy of Sciences, Shanghai, China) for technical support in bioinformatics. The research was supported by the Ministry of Science and Technology (2011AA100207, 2013AA102607) and the State Forestry Administration of China (201004012).
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.