A chloroplast genomic strategy for designing taxon specific DNA mini-barcodes: a case study on ginsengs
- Wenpan Dong†1,
- Han Liu†1,
- Chao Xu1,
- Yunjuan Zuo3,
- Zhongjian Chen2Email author and
- Shiliang Zhou1Email author
© Dong et al.; licensee BioMed Central. 2014
Received: 11 October 2014
Accepted: 26 November 2014
Published: 20 December 2014
Universal conventional DNA barcodes will become more and more popular in biological material identifications. However, in many cases such as processed medicines or canned food, the universal conventional barcodes are unnecessary and/or inapplicable due to DNA degradation. DNA mini-barcode is a solution for such specific purposes. Here we exemplify how to develop the best mini-barcodes for specific taxa using the ginseng genus (Panax) as an example.
The chloroplast genome of P. notoginseng was sequenced. The genome was compared with that of P. ginseng. Regions of the highest variability were sought out. The shortest lengths which had the same discrimination powers of conventional lengths were considered the best mini-barcodes. The results showed that the chloroplast genome of P. notoginseng is 156,387 bp. There are only 464 (0.30%) substitutions between the two genomes. The intron of rps16 and two regions of the coding gene ycf1, ycf1a and ycf1b, evolved the quickest and served as candidate regions. The mini-barcodes of Panax turned out to be 60 bp for ycf1a at a discrimination power of 91.67%, 100 bp for ycf1b at 100%, and 280 bp for rps16 at 83.33%.
The strategy by searching the whole chloroplast genomes, identifying the most variable regions, shortening the focal regions for mini-barcodes are believed to be efficient in developing taxon-specific DNA mini-barcodes. The best DNA mini-barcodes are guaranteed to be found following this strategy.
DNA barcoding is a relatively new concept, aiming to provide rapid, accurate and automatable species identification using a standard DNA region. Chloroplast (or plastid) sequences such as rbcL and matK are usually used as DNA barcodes for plant . The lengths of the commonly used barcoding markers are longer than 650 bp. In most cases it is easy to achieve PCR success when using DNA of high quality. However, if the DNA molecules have degraded into fragments shorter than the spanning length of the primers, say 650 bp, it would not be possible to amplify the DNA barcodes. In these cases, DNA mini-barcodes could be used.
A DNA mini-barcode is a short DNA, generally 100–250 bp , suitable for species identification. Thus far, a few tries have been made to design DNA mini-barcodes ,. Owing to significantly reduced length of sequences, PCR amplification success should presumably be much improved, but identification success would thus be hampered. A good DNA mini-barcode should be of high PCR and sequencing successes without much lowering species discrimination power. Therefore, DNA mini-barcodes are more often taxon specific than universal. Preferably DNA mini-barcodes should be the most informative regions of a genome. For seed plants, it is now realistic to find such DNA mini-barcodes by searching the whole chloroplast genomes owing to the ease of chloroplast genome sequencing .
Chloroplast sequences have been extensively used for species identification and phylogenetic reconstruction of plants. Chloroplast sequences evolve relatively slowly and there are not very many substitutions between species within a genus. To find the best DNA mini-barcodes, whole chloroplast genome screening is usually necessary. Typically, the chloroplast genome size of higher plants ranges from 120 to 160 kb, and a pair of inverted repeats (IRs) divides the genome into a large single copy (LSC) region and a small single copy (SSC) region. The IR regions are quite conservative , and the variable regions locate predominantly in the LSC and SSC .
DNA mini-barcodes can be used for species identification of digested material , old herbarium/museum specimens , ancient DNA, and more frequently processed medicinal herbs when high-quality DNA is not available and degraded DNA has to be used. Ginsengs (Panax spp., Araliaceae) are the best known Chinese medicine worldwide. They have been used as medicines alone or in combinations with other medicines. Recently, ginsengs were also used as an ingredient of cosmetics, tooth paste, beverage, vegetable, etc. There are eight species in Panax. All species are considered seriously endangered medicinal plants. Panax notoginseng (Burkill) F. H. Chen ex C. Y. Wu & K. M. Feng is extinct in the wild and wild P. ginseng C.A. Mey. in China is nearly extinct. However, illegal harvest and trade happen occasionally. For law-enforcement activities in conservation of wild populations of endangered species, there is a need for a method for correct identification of confiscated materials in forms of fragments, powders or decoctions of any organs. Panax ginseng and P. notoginseng have been cultivated in China for a long time for medicinal purposes. Roots of P. quinquefolius L. are imported from the USA or produced in the Northeast China. The commercial roots of P. ginseng and P. quinquefolius resemble each other and it is difficult for laymen to tell them apart. When they were sliced or powdered, it is unlikely for experts to distinguish them. Almost all species are identifiable using the DNA barcoding method according to Zuo et al. . However, if the materials were processed as in decoctions and dietary supplements, the conventional DNA barcodes would probably fail. Therefore, it is justified to design DNA mini-barcodes of ginsengs for conservation purpose and for monitoring ginseng market and protecting consumers’ rights.
In this study, we report a strategy of designing taxon-specific DNA mini-barcodes using ginsengs as an example. We first sequenced the chloroplast genome of P. notoginseng, then we sought out the hypervariable regions by comparing the new genome to the one of P. ginseng, and finally we determined the length and positions of the best DNA mini-barcodes and tested their applicability.
Characteristics of the chloroplast genome of P. notoginseng
In total, 156 indels were detected in the chloroplast genomes of two Panax species (Additional file 2: Table S2), 84 insertions and 72 deletions in P. notoginseng or 84 deletions and 72 insertions in P. ginseng. Most of the indels (63.06%) were single nucleotide differences. Indels longer than 10 bp occurred 16 times. The longest indel (34 bp) was in the spacer between rps16 and trnQ. The majority of the indels occurred in the non-coding regions with two exceptions, a 15 bp insertion and an 18 bp insertion in the ycf2 gene of P. notoginseng.
Three short inversions were observed in ndhD-psaC, petB intron and trnM-atpE between the two chloroplast genomes (Additional file 3: Table S3). All the inversions have hairpin structures, including the inversions and the inverted repeats. The inverted repeats formed the stem structures, and the inversions formed the loops. The lengths of inverted repeats were 3 bp, 44 bp, and 11 bp, and the lengths of the inversions were 19 bp, 18 bp, and 14 bp, respectively in the ndhD-psaC, petB intron and trnM-atpE regions.
Variability throughout the chloroplast genomes
Variability of the three new markers and universal chloroplast DNA barcode in Panax
Parsimony informative sites
Nucleotide diversity (π)
Number of haplotypes
The shortest length for a candidate barcode to reach the maximum discrimination success using genetic distance method
Maximum discrimination success (%)
Shortest length (bp)
Primer sequence 5' to 3'
DNA mini-barcode for Panax
Practically the length of a barcode becomes an issue of concern. Very subjectively we can classify a barcode according to the length, for example, micro-barcode within 100 bp, mini-barcode of 100–250 bp ,, conventional barcode of 250–1000 bp, super-barcode of 1000-6000 bp, and genome barcode using the whole genome. However, most applications of mini-barcodes are not necessarily to be the kind because a conventional barcode could be created by concatenating several mini-barcodes. Mini-barcodes were often used at the risk of lowering the resolution of taxa and consequently underestimated biodiversity. Mini-barcode has its potentials in situations that long fragments are impracticable or unnecessary for efficiency and economy considerations. Mini-barcodes of high resolutions are not easily found and that is why whole genomes are indispensible for development of mini-barcodes.
Chloroplast genome is endemic to plants. Chloroplast DNA barcodes bypass the DNA contamination from other organisms without chloroplasts, such as animals and fungi. Therefore, chloroplast DNA barcodes are of primary choices. Unfortunately, chloroplast genes usually evolve more slowly than nuclear genes  and the candidate barcodes such as matK and rbcL often have limited resolutions at species level ,. However, there are some regions in the chloroplast genome which evolve much quickly and meet the criteria of being a DNA barcode. The strategy of searching the whole chloroplast genomes had been successfully applied to Jacobaea , Oncidium , Parthenium , and Theobroma . Although some species are extremely closely related and no variations at the loci of matK and rbcL, for example, Acorus americanus v. s. A. calemus and Oryza nivara v. s. O. sativa, there are some differences at other loci . Therefore, it is a reliable strategy to find the best chloroplast DNA mini-barcodes by searching the chloroplast genomes of congeners. Another advantage of chloroplast mini-barcode is that there is almost free of intra-populational variations and very low inter-populational variations. Sequence divergence is predominantly between species . Species identification is for most cases more reliable.
Indels (gaps) are another kind of informative signals of potentially useful . There are 157 indels along the two genomes of Panax. Indels are more useful at the lowest taxonomic level. Microsatellite markers are analogous to indels. It is often cumbersome in practice by using indel information. When gaps are coded as the fifth state of characters, they are very likely to be overweighted. To solve this problem, gaps are better coded manually. There is unlikely to have indels in the mini-barcodes of closely related species. However, chloroplast indels are likely to be another kind of DNA barcode for closely related species.
DNA mini-barcodes have so far been used for studying flora or fauna ,,,,,. Such usages are often a compromise between resolution and experimental success. Consequently the mini-barcodes may underestimate the diversity of flora or fauna. However, DNA mini-barcodes are more suitable for ecologically and economically important taxa because it is more likely to find the best and taxa-specific mini-barcodes. Ginsengs are the most well-known herbal medicine in China. They have been extensively used for a long time. Substitution of expensive materials with similar but cheaper ones of congeners is reported occasionally. An effective and quick method for identifying the species of ginsengs is helpful for monitoring ginseng markers. We tested our mini-barcodes using materials purchased from market and they are proven applicable for such cases.
In this study we provide a strategy for developing taxon-specific DNA mini-barcode without lowering discrimination power using the ginseng genus (Panax) as an example. The strategy by searching the whole genomes, identifying the most variable regions, shortening the focal regions for mini-barcodes are believed to be efficient in developing taxon-specific DNA mini-barcodes. The mini-barcodes for Panax were tested useful for identifying processed ginsengs from medicinal market.
Chloroplast genome sequencing
Leaves of P. notoginseng were collected from Wenshan, Yunnan province (Collection number: A8). The genomic DNA were extracted using modified CTAB (mCTAB) methods  and purified using the Wizard DNA Clean-Up System (Promega, Madison, WI, USA). The chloroplast genome was sequenced by using the short-range PCR method similar to Dong et al. . Panax-specific primers (Additional file 6: Table S4) based on the chloroplast genome of P. ginseng  and some universal primers  were used to amplify and sequence the chloroplast genome of P. notoginseng. The chloroplast genome of ginseng served as a reference. The genome structure was confirmed by amplifying additional fragments spanning the LSC ↔ IRb, IRb ↔ SSC, SSC ↔ IRa, and IRa ↔ LSC .
The whole chloroplast genome was annotated using DOGMA  to identify the coding sequence, rRNA, and tRNA using the chloroplast/bacterial genetic code. The annotation of the tRNA genes was checked using tRNAscan-SE . The genome map was generated using GenomeVx .
Identification of the hypervariable regions
The chloroplast genome of P. notoginseng was aligned to the chloroplast genome of P. ginseng  using MAFFT  and then adjusted manually using Se-Al 2.0 . To identify the highly variable regions within the chloroplast genomes, we calculated the nucleotide diversity using DnaSP ver. 5.0  with a sliding window analysis. The window length was set to 600 bp with a step size of 25 bp.
Plant material, PCR amplification and hypervariable region sequencing
All 8 Panax species were included in this study and each species was represented by at least two accessions (Additional file 7: Table S5). Medicinal materials (Additional file 4: Figure S1) were purchased from market to test the mini-barcodes designed in this study. The primers for amplifying the highly variable regions were designed using FastPCR (Additional file 8: Table S6). The primers for amplifying and sequencing the control markers of rbcL, matK and trnH-psbA were the same as Zuo et al. . The PCR amplifications were performed in a final volume of 25 μL containing 1× PCR buffer (with Mg2+), 0.25 mmol/L each dNTP, 0.25 μmol/L each primer, 1.25 U Taq polymerase, and 20–30 ng DNA.
The PCR program started at 94°C for 4 min, followed by 34 cycles of 30 s at 94°C, 40 s at 52°C, and 1 min at 72°C, and ended with a final extension of 10 min at 72°C. The PCR products were checked by electrophoresis on a 1% agarose gel containing ethidium bromide and visualized using an ultraviolet transilluminator. Both of the strands were sequenced on ABI Prism 3730xl (Applied Biosystems, Foster City, U.S.A.) following the manufacturer’s protocols.
DNA barcoding analysis
Distance is likely the most commonly used method for classifying DNA sequences. In this study, the distance method was used to analyze the barcoding performances of the newly identified highly variable regions. The function nearNeighbour of SPIDER was used for barcoding analysis . Species discrimination was considered successful if the closest K2P distance for all of the individuals of a given species belonged to only one conspecific individual.
DNA mini-barcode search using SPIDER
We used the sliding window function slideAnalyses of SPIDER  version 1.2-0 to find out the shortest informative windows. This function extracts all the passable windows of a chosen size in a DNA alignment and performs pairwise distance- (K2P) and NJ tree-based analyses of each window. In order to know the performances of markers with the increases of their sequence lengths, the changes of discrimination power, the maximum percentage of samples discriminated (Pm), at 50, 100, 150, 200, 250, 300, and 350 bp were depicted. In order to know the minimum length of a mini-barcode that performed as well as the full length, sliding window analyses were conducted. The starting length was set to 50 bp. The length was increased by 10 bp each round in the subsequent searches till the length of maximum discrimination power. The shortest length of a marker was considered the shortest mini-barcode of the marker.
Availability of supporting data
The chloroplast genome of P. notoginseng has been submitted to GenBank (accession KJ566590). The data set supporting the results of this article is included in Additional file 7: Table S5 and available in the GenBank with accession number KM210094 – KM210203.
We are grateful to the two anonymous reviewers for their comments that help significantly to improve the quality of this manuscript. This study was supported by grants from the Ministry of Science and Technology of China (2011FY120200, 2012BAC01B05 and 2012AA021602) and NSFC (30370154) to Shiliang Zhou.
- Dong W, Cheng T, Li C, Xu C, Long P, Chen C, Zhou S: Discriminating plants using the DNA barcode rbcLb: an appraisal based on a large dataset. Mol Ecol Resour. 2014, 14 (2): 336-343. 10.1111/1755-0998.12185.PubMedView ArticleGoogle Scholar
- Meusnier I, Singer GA, Landry JF, Hickey DA, Hebert PD, Hajibabaei M: A universal DNA mini-barcode for biodiversity analysis. BMC Genomics. 2008, 9: 214-10.1186/1471-2164-9-214.PubMedPubMed CentralView ArticleGoogle Scholar
- Little DP: A DNA mini-barcode for land plants. Mol Ecol Resour. 2014, 14 (3): 437-446. 10.1111/1755-0998.12194.PubMedView ArticleGoogle Scholar
- Francoso E, Arias MC: Cytochrome c oxidase I primers for corbiculate bees: DNA barcode and mini-barcode. Mol Ecol Resour. 2013, 13 (5): 844-850. 10.1111/1755-0998.12135.PubMedView ArticleGoogle Scholar
- Dong W, Xu C, Cheng T, Lin K, Zhou S: Sequencing angiosperm plastid genomes made easy: a complete set of universal primers and a case study on the phylogeny of Saxifragales. Genome Biol Evol. 2013, 5 (5): 989-997. 10.1093/gbe/evt063.PubMedPubMed CentralView ArticleGoogle Scholar
- Dong W, Xu C, Cheng T, Zhou S: Complete chloroplast genome of Sedum sarmentosum and chloroplast genome evolution in Saxifragales . PLoS ONE. 2013, 8 (10): e77965-10.1371/journal.pone.0077965.PubMedPubMed CentralView ArticleGoogle Scholar
- Dong W, Liu J, Yu J, Wang L, Zhou S: Highly variable chloroplast markers for evaluating plant phylogeny at low taxonomic levels and for DNA barcoding. PLoS ONE. 2012, 7 (4): e35071-10.1371/journal.pone.0035071.PubMedPubMed CentralView ArticleGoogle Scholar
- Little DP, Jeanson ML: DNA barcode authentication of saw palmetto herbal dietary supplements. Sci Rep. 2013, 3: 3518-10.1038/srep03518.PubMedPubMed CentralView ArticleGoogle Scholar
- Shokralla S, Zhou X, Janzen DH, Hallwachs W, Landry JF, Jacobus LM, Hajibabaei M: Pyrosequencing for mini-barcoding of fresh and old museum specimens. PLoS ONE. 2011, 6 (7): e21252-10.1371/journal.pone.0021252.PubMedPubMed CentralView ArticleGoogle Scholar
- Zuo YJ, Chen ZJ, Kondo K, Funamoto T, Wen J, Zhou SL: DNA barcoding of Panax species. Planta Med. 2011, 77 (2): 182-187. 10.1055/s-0030-1250166.PubMedView ArticleGoogle Scholar
- Hajibabaei M, Smith MA, Janzen DH, Rodriguez JJ, Whitfield JB, Hebert PDN: A minimalist barcode can identify a specimen whose DNA is degraded. Mol Ecol Notes. 2006, 6 (4): 959-964. 10.1111/j.1471-8286.2006.01470.x.View ArticleGoogle Scholar
- Drouin G, Daoud H, Xia J: Relative rates of synonymous substitutions in the mitochondrial, chloroplast and nuclear genomes of seed plants. Mol Phylogenet Evol. 2008, 49 (3): 827-831. 10.1016/j.ympev.2008.09.009.PubMedView ArticleGoogle Scholar
- Zhang CY, Wang FY, Yan HF, Hao G, Hu CM, Ge XJ: Testing DNA barcoding in closely related groups of Lysimachia L. (Myrsinaceae). Mol Ecol Resour. 2012, 12 (1): 98-108. 10.1111/j.1755-0998.2011.03076.x.PubMedView ArticleGoogle Scholar
- Clement WL, Donoghue MJ: Barcoding success as a function of phylogenetic relatedness in Viburnum, a clade of woody angiosperms. BMC Evol Biol. 2012, 12 (1): 73-10.1186/1471-2148-12-73.PubMedPubMed CentralView ArticleGoogle Scholar
- Doorduin L, Gravendeel B, Lammers Y, Ariyurek Y, Chin AWT, Vrieling K: The complete chloroplast genome of 17 individuals of pest species Jacobaea vulgaris: SNPs, microsatellites and barcoding markers for population and phylogenetic studies. DNA Res. 2011, 18 (2): 93-105. 10.1093/dnares/dsr002.PubMedPubMed CentralView ArticleGoogle Scholar
- Wu FH, Chan MT, Liao DC, Hsu CT, Lee YW, Daniell H, Duvall MR, Lin CS: Complete chloroplast genome of Oncidium Gower Ramsey and evaluation of molecular markers for identification and breeding in Oncidiinae. BMC Plant Biol. 2010, 10 (1): 68-10.1186/1471-2229-10-68.PubMedPubMed CentralView ArticleGoogle Scholar
- Kumar S, Hahn F, McMahan C, Cornish K, Whalen M: Comparative analysis of the complete sequence of the plastid genome of Parthenium argentatum and identification of DNA barcodes to differentiate Parthenium species and lines. BMC Plant Biol. 2009, 9 (1): 131-10.1186/1471-2229-9-131.PubMedPubMed CentralView ArticleGoogle Scholar
- Kane N, Sveinsson S, Dempewolf H, Yang JY, Zhang D, Engels JM, Cronk Q: Ultra-barcoding in cacao ( Theobroma spp.; Malvaceae) using whole chloroplast genomes and nuclear ribosomal DNA . Am J Bot. 2012, 99 (2): 320-329. 10.3732/ajb.1100570.PubMedView ArticleGoogle Scholar
- Zhou S, Dong W, Chen X, Zhang X, Wen J, Schneider H: How many species of bracken ( Pteridium ) are there? Assessing the Chinese brackens using molecular evidence . Taxon. 2014, 63 (3): 509-521. 10.12705/633.9.View ArticleGoogle Scholar
- Ochoterena H: Homology in coding and non-coding DNA sequences: a parsimony perspective. Plant Syst Evol. 2009, 282 (3–4): 151-168. 10.1007/s00606-008-0095-y.View ArticleGoogle Scholar
- Bhattacharjee MJ, Ghosh SK: Design of mini-barcode for catfishes for assessment of archival biodiversity. Mol Ecol Resour. 2013, 14 (3): 469-477. 10.1111/1755-0998.12198.PubMedView ArticleGoogle Scholar
- Arif IA, Khan HA, Al Sadoon M, Shobrak M: Limited efficiency of universal mini-barcode primers for DNA amplification from desert reptiles, birds and mammals. Gen Mol Res. 2011, 10 (4): 3559-3564. 10.4238/2011.October.31.3.View ArticleGoogle Scholar
- Li J, Wang S, Jing Y, Wang L, Zhou S: A modified CTAB protocol for plant DNA extraction. Chinese Bulletin of Botany. 2013, 48 (1): 72-78. 10.3724/SP.J.1259.2013.00094.View ArticleGoogle Scholar
- Kim KJ, Lee HL: Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants. DNA Res. 2004, 11 (4): 247-261. 10.1093/dnares/11.4.247.PubMedView ArticleGoogle Scholar
- Wyman SK, Jansen RK, Boore JL: Automatic annotation of organellar genomes with DOGMA. Bioinformatics. 2004, 20 (17): 3252-3255. 10.1093/bioinformatics/bth352.PubMedView ArticleGoogle Scholar
- Schattner P, Brooks AN, Lowe TM: The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 2005, 33: W686-W689. 10.1093/nar/gki366.PubMedPubMed CentralView ArticleGoogle Scholar
- Conant GC, Wolfe KH: GenomeVx: simple web-based creation of editable circular chromosome maps. Bioinformatics. 2008, 24 (6): 861-862. 10.1093/bioinformatics/btm598.PubMedView ArticleGoogle Scholar
- Katoh K, Toh H: Parallelization of the MAFFT multiple sequence alignment program. Bioinformatics. 2010, 26 (15): 1899-1900. 10.1093/bioinformatics/btq224.PubMedPubMed CentralView ArticleGoogle Scholar
- Rambaut A: Se-Al: sequence alignment editor. version 2.0. Oxford: University of Oxford, Department of Zoology; 1996.Google Scholar
- Librado P, Rozas J: DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009, 25 (11): 1451-1452. 10.1093/bioinformatics/btp187.PubMedView ArticleGoogle Scholar
- Brown SD, Collins RA, Boyer S, Lefort MC, Malumbres-Olarte J, Vink CJ, Cruickshank RH: Spider: an R package for the analysis of species identity and evolution, with particular reference to DNA barcoding. Mol Ecol Resour. 2012, 12 (3): 562-565. 10.1111/j.1755-0998.2011.03108.x.PubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.