Linking the potato genome to the conserved ortholog set (COS) markers
© Lindqvist-Kreuze et al.; licensee BioMed Central Ltd. 2013
Received: 24 January 2013
Accepted: 5 June 2013
Published: 8 June 2013
Conserved ortholog set (COS) markers are an important functional genomics resource that has greatly improved orthology detection in Asterid species. A comprehensive list of these markers is available at Sol Genomics Network (http://solgenomics.net/) and many of these have been placed on the genetic maps of a number of solanaceous species.
We amplified over 300 COS markers from eight potato accessions, including two diploid landraces of the Solanum tuberosum Andigenum group (formerly classified as S. goniocalyx and S. phureja), a dihaploid clone derived from a modern tetraploid cultivar of S. tuberosum, and the wild species S. berthaultii, S. chomatophilum, and S. paucissectum. Using the BLASTn algorithm (Basic Local Alignment Search Tool of the NCBI, National Center for Biotechnology Information), we mapped the DNA sequences of these markers onto the potato genome sequence. Additionally, we mapped a subset of these markers genetically in potato and compared their physical and genetic locations in potato with each other and with their genetic locations in tomato. We found that most of the COS markers are single-copy in the reference genome of potato and that the genetic location in tomato and the physical location in the potato sequence are mostly in agreement. However, we did find some COS markers that are present in multiple copies and some that map to unexpected locations. Sequence comparisons between species show that some of these markers may be paralogs.
The sequence-based physical map helps to identify markers for traits of interest, thereby reducing the number of markers to be tested for applications such as marker-assisted selection, diversity, and phylogenetic studies.
The use of genetic diversity in plant breeding is a sustainable method to conserve valuable genetic resources and to increase agricultural productivity and food security. To facilitate the use of the wide genetic diversity existing in landraces and crop wild relatives, more information is needed on the organization and structure of their genes and genomes. Molecular markers linked to loci with important effects hold promise for facilitating the introgression of those traits into adapted germplasm. Agriculturally important traits captured during domestication are often coded by a very limited number of loci with major phenotypic effects. Within the Solanaceae it is common to find that these loci have putative orthologous counterparts in other species, and therefore molecular markers such as Conserved Orthologous Set (COS) markers are powerful in comparing genomic information across species.
The development of markers for orthologous genes, many of which have been mapped in tomato, is documented in the Sol Genomics Network. Comparative mapping studies with the help of COS markers have shown syntenic relationships within various species of the Solanaceae family [5–7] and between species within the Asterid and Rosid clades, comparing coffee (Rubiaceae, Asterid) with tomato (Solanaceae, Asterid), and coffee with grapevine (Vitaceae, Rosid). The combined power of comparative mapping and systematic analysis of germplasm with orthologous gene markers can efficiently leverage information generated by genomic research from one species to another. COS markers have also shown great power in resolving the interrelationships of tomato and potato with great precision.
The recent accumulation of nucleotide sequences of model organisms and crop plants has provided fundamental information for the design of sequence-based research applications in functional genomics. The draft genome sequence of potato has been publicly available since late 2010, and the finalized high-quality sequence has been released, as has the genome sequence of the closely related tomato. The availability of these genomes and of genomic tool kits, such as genome browsers, is of great importance to the scientific community working with solanaceous crops. With the help of physical sequences, new molecular markers can be developed efficiently, utilizing genes in the regions of the genome that contain markers linked to traits of interest. The possibility of comparing physical and genetic maps also has implications for molecular breeding programs, facilitating the search for molecular markers flanking QTL. Linking COS markers to the potato genome sequence allows for powerful comparative genomics between the potato genome and other species with COS-based maps that do not yet have a genome sequence available.
Here we present a case study in which COS are amplified from a diverse set of Solanum germplasm and aligned to the whole genome sequence of potato, allowing for comparison of physical and genetic maps of related species. We aligned the sequences of COS, generated from a panel of ten genotypes of potato and tomato, to the recently published potato genome sequence and compared the physical location with the genetic location in tomato and potato. We show that the COS markers analyzed are single- or low-copy in the DM potato genome (see Methods) and that there are several breaks in co-linearity between the species analyzed.
COS markers with multiple hits in DM superscaffolds and their corresponding DM gene hits
Genetic map chromosome
DM gene annotation
Gene of unknown function
CAK associated cycling H homolog
Peroxisomal targeting signal type 2 receptor
Conserved gene of unknown function
small molecular heat shock protein
F-box family protein
F-box/leucine rich repeat protein
Jasmonic acid amino acid conjugating enzyme
Conserved gene of unknown function
HMG CoA synthase
Glucose 6 phosphate isomerase
chaperonin containing T-complex protein 1, beta subunit
T-complex protein 1 subunit beta
ATP synthase subunit b’ chloroplastic
Pyruvate dehydrogenase E1 alpha subunit
Elongation factor TuA
Elongation factor TuB
For genetic mapping in potato we utilized mostly the backcross progeny BCT. A total of 186 COS markers were placed on the BCT consensus linkage map, which contains 321 markers in total, assembled into 12 linkage groups. The total length of the consensus BCT map was 1042 cM, the average marker interval was 3.4 cM and the maximum interval was 34.7 cM, on chromosome 12. In addition, three COS markers were integrated on the BCT paternal map because they would not integrate on the consensus map. Nineteen markers that were not polymorphic in the BCT parents were placed on the previously published framework genetic maps of PCC1 and PD. The genetic maps are shown in Additional file 2: Table S2.
A total of 208 COS were placed on the potato genetic maps (Additional file 1: Table S1). Of these, 173 were also mapped in silico, but 35 markers were mapped only genetically because their DNA sequences were not available. The Tomato EXPEN2000 genetic map, from here on referred to as TomEXPEN, was used as a reference, and the TomEXPEN map locations of the COS markers that we mapped in silico in potato were downloaded from the SGN web site. Of the 322 COS mapped in silico, 254 were found in the TomEXPEN map.
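The reported average marker interval can be reproduced from the totals in the paragraph above. The short Python sketch below assumes the interval was computed as total map length divided by the number of intervals, counting one interval fewer than the number of markers in each linkage group; that formula is an assumption, since the text does not state how the average was derived.

```python
def average_marker_interval(total_cm, n_markers, n_groups):
    """Mean spacing in cM, assuming a linkage group with k markers has k - 1 intervals."""
    n_intervals = n_markers - n_groups
    if n_intervals <= 0:
        raise ValueError("need more markers than linkage groups")
    return total_cm / n_intervals


# Totals reported for the BCT consensus map: 1042 cM, 321 markers, 12 linkage groups.
bct_interval = average_marker_interval(1042, 321, 12)
print(round(bct_interval, 1))
```

Under this assumption the result rounds to the 3.4 cM given in the text.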
DM genes corresponding to the single copy COS markers that map in unexpected chromosomes
Markers having unexpected locations were found in all chromosomes, but the highest number of these was in chromosome 10. Pairwise comparisons between the three maps show that eight markers that locate in chromosome 10 in at least one of the maps have an alternative locus in another chromosome (Additional file 1: Table S1). These markers are: C2_At2g46370 (in silico 1 and 5, tomato 10); C2_At3g60080 (in silico 2, tomato 10); T1391 (in silico 2, potato 1 and 10, tomato 10); T0966 (in silico 10, potato 10, tomato 7); C2_At5g08580 (in silico 10, potato 2, tomato 2); C2_At5g06760 (in silico 10, tomato 1); C2_At4g26180 (in silico 12, potato 10, tomato 12); C2_At2g41680 (in silico 4 and 10, tomato 9). Differences are mostly specific to one of the maps, meaning that the marker position is usually conserved in the other two. Also, multi-copy markers mapping to different chromosomes in silico in DM are mostly found in one of the same chromosomes in the genetic maps. For example, marker C2_At2g42620 maps in chromosomes 12 and 7 in DM, whereas in tomato it maps only in chromosome 12. This could be simply because the alternative marker was not detected due to lack of polymorphism, or because the other sequence detected by the BLASTn search is a paralog.
The COS that mapped in the same chromosomes by both methods (in silico in potato and genetically in potato or in tomato, as found at SGN) were not always in agreement in their exact order, reflecting either errors in statistical testing or differences between the solanaceous species at the microsynteny level. In addition to the nine large inversions between tomato and potato, several small inversions have been demonstrated. In total, 77 COS that were mapped in potato (either in silico or genetically) were not found on the TomEXPEN map, and thus we were not able to compare their locations.
Marker T0408 was sequenced from two genotypes, the parents of the PD population (CHS-625 and PS-3). This marker lies entirely within an exon and is similar to the genes PGSC0003DMG400046906 (gene of unknown function) in chromosome 1 and PGSC0003DMG400029022 (aminotransferase) in chromosome 11 (Table 1). In the TomEXPEN map this marker is found in chromosome 1. The coding sequences PGSC0003DMC400069010 and PGSC0003DMC400050560 are identical in the query sequence region consisting of 119 amino acids. However, outside this area the two DM CDS are not identical. Genotype CHS-625 differs from the DM sequences in only one amino acid. Genotype PS-3 is highly heterozygous, and because only one sample was sequenced we cannot resolve the two possible haplotypes of this genotype; it therefore appears different from the rest of the sequences (Figure 2a). The corresponding tomato reference genome coding sequence is quite different from the potato sequences. In this case the gene may be single copy but the marker may be unspecific, resulting in alternative hits.
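The pairwise comparison described above can be sketched in Python as a set comparison of chromosome assignments. The chromosome numbers are taken from the marker list in the preceding paragraph; the dictionary layout and the function names `discordant_pairs` and `odd_map_out` are illustrative choices, not part of the original analysis.

```python
from itertools import combinations

# Chromosome assignments quoted in the text for four of the eight markers.
placements = {
    "T0966": {"in_silico": {10}, "potato": {10}, "tomato": {7}},
    "C2_At5g08580": {"in_silico": {10}, "potato": {2}, "tomato": {2}},
    "C2_At4g26180": {"in_silico": {12}, "potato": {10}, "tomato": {12}},
    "C2_At5g06760": {"in_silico": {10}, "tomato": {1}},
}


def discordant_pairs(maps):
    """Return the pairs of maps that share no chromosome for one marker."""
    return [(a, b) for a, b in combinations(sorted(maps), 2)
            if not (maps[a] & maps[b])]


def odd_map_out(maps):
    """If exactly two of three maps agree, return the name of the third map."""
    if len(maps) != 3:
        return None
    agreeing = [(a, b) for a, b in combinations(sorted(maps), 2)
                if maps[a] & maps[b]]
    if len(agreeing) != 1:
        return None
    a, b = agreeing[0]
    return next(name for name in maps if name not in (a, b))


for marker, maps in placements.items():
    print(marker, discordant_pairs(maps), odd_map_out(maps))
```

For markers placed on all three maps, `odd_map_out` names the single map that disagrees, which is the pattern the text describes as the position being conserved in two of the maps.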
Marker At1g14980 was amplified from genotypes LA1974, HH1-9 and M200-30 and the sequences are similar to PGSC0003DMG400028744 (PGS0003DMC400050071) in chromosome 7 and PGSC0003DMG402023448 (PGSC0003DMC400040570) in chromosome 5 with the e values of 1.00E-110 and 1.00E-99, respectively. The marker spans both exonic and intronic regions. Translated amino acid sequences of the exonic regions show two well resolved groups where two sequences from M200-30 group together with one of the tomato genomic sequences and two sequences from HH1-9 group with the CDS of the gene that maps in chromosome 5. Relationships with the other DM coding sequence are not well resolved (Figure 2b). Genetic mapping in potato suggests that the marker resides in chromosome 5. However, based on the sequence data we cannot determine the correct location for this marker.
Marker At2g42620 sequences from the BCT population parents (HH1-9 and M200-30) have hits in genes PGSC0003DMG400007856 (F-box family protein) and PGSC0003DMG400035320 (F-box/leucine rich repeat protein) with e-values of 0 and 1.00E-112, respectively. The first gene is found in chromosome 12 and the latter in chromosome 7. According to the NJ tree, all our sequences from the genotypes HH1-9 and M200-30 are more similar to the first-mentioned gene, represented by the coding sequence PGSC0003DMC400013844 (Figure 2c). The latter DM gene has some amino acid changes compared with the others and thus may code for a different protein, as already shown by the different annotations (Table 1). Genetically this marker is found in chromosome 12 in tomato, which is most likely its correct location.
Marker T1511 was amplified from five genotypes (CHS-625, PS-3, PI310991, MP1-8, and HH1-9). According to the BLASTn analysis it is similar to the DM genes PGSC0003DMG400018190 (Elongation factor TuA, 1E-160) in chromosome 3 and PGSC0003DMG400041767 (Elongation factor TuB, 6E-63) in chromosome 6. In the NJ tree all genotypes are more closely related to the first gene, represented by the CDS PGSC0003DMC400031700 (Figure 2d). The marker resides in the exon and has a quite variable sequence even at the amino acid level. Because this marker has been genetically mapped in chromosome 3 in tomato and the e-value for the hit in chromosome 3 is lower, and thus more significant (Table 1), this is most likely its correct location. Of the three corresponding tomato coding sequences, two group with the chromosome 3 gene.
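The e-value comparisons used for the multi-copy markers above can be written as a small Python sketch. The gene identifiers, chromosomes and e-values are those quoted in the text; the tuple layout and the helper name `best_hit` are illustrative assumptions. A lower e-value indicates a more significant hit.

```python
# (gene, chromosome, e-value) for three multi-copy markers, as quoted in the text.
hits = {
    "At2g42620": [("PGSC0003DMG400007856", 12, 0.0),
                  ("PGSC0003DMG400035320", 7, 1e-112)],
    "T1511": [("PGSC0003DMG400018190", 3, 1e-160),
              ("PGSC0003DMG400041767", 6, 6e-63)],
    "At1g14980": [("PGSC0003DMG400028744", 7, 1e-110),
                  ("PGSC0003DMG402023448", 5, 1e-99)],
}


def best_hit(marker_hits):
    """Return the hit with the lowest (most significant) e-value."""
    return min(marker_hits, key=lambda hit: hit[2])


for marker, marker_hits in hits.items():
    gene, chromosome, evalue = best_hit(marker_hits)
    print(f"{marker}: best hit {gene} in chromosome {chromosome} (e-value {evalue:g})")
```

For At2g42620 and T1511 the best hit coincides with the tomato genetic location (chromosomes 12 and 3). For At1g14980 the best e-value points to chromosome 7 while genetic mapping in potato suggests chromosome 5, which illustrates why the e-value alone did not settle the location of that marker.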
A comparative summary of the maps is shown in Figure 1. Overall the alignment of COSII markers follows a sequential order. However, as described above several COSII markers show differences as indicated by crossing lines or lines indicating locations on different linkage groups or pseudomolecules.
There is a large overlap of QTL regions between the traits included, and based on this information alone the same markers may be considered candidates for disease resistance and for carotenoid or vitamin C biosynthesis (Additional file 1: Table S1 and Additional file 3: Table S3). Therefore, functional annotations of the matching DM genes (Additional file 3: Table S3) may help to suggest markers in candidate genes for the QTL traits and for further studies.
Significantly enriched terms in the biological process category of the gene ontology associated with COSII markers mapped onto the DM genome
Number in COSII-DM list
Number in TAIR9 list
coenzyme biosynthetic process
cofactor biosynthetic process
oxoacid metabolic process
cellular nitrogen compound metabolic process
response to abiotic stimulus
organic acid metabolic process
carboxylic acid metabolic process
cellular ketone metabolic process
cellular amino acid and derivative metabolic process
cofactor metabolic process
cellular amino acid metabolic process
cellular amine metabolic process
amine metabolic process
coenzyme metabolic process
response to salt stress
response to stimulus
response to inorganic substance
response to osmotic stress
response to cadmium ion
cellular metabolic process
response to metal ion
protein complex assembly
protein complex biogenesis
lipid metabolic process
organic acid catabolic process
carboxylic acid catabolic process
response to stress
sulfur compound biosynthetic process
COSII markers represent an important functional genomics resource that has greatly improved comparative mapping in Asterid species. They can be used to design primer sequences for cleaved amplified polymorphic sequence (CAPS) markers useful for genetic mapping across diverse taxa, including the Solanaceae. In genetic mapping, the number of markers placed on the map depends on the number of polymorphisms between the parents of the cross. Our initial goal, before the availability of the genome sequence, was to facilitate comparative mapping in the Solanaceae by mapping 300 single-copy COSII in potato, Solanum tuberosum, to a diploid mapping population. However, limitations mostly in the level of polymorphism resulted in the successful genetic mapping of only 208 markers using three different segregating populations. The availability of the potato genome sequence enabled another approach to be taken to investigate the genomic locations of these markers in potato. With the help of BLAST analysis we successfully mapped over 300 orthologous markers in silico and compared their physical location in the reference potato genome with their genetic location in a potato cross and in a previously published map of tomato. Because we utilized DNA sequences obtained from various Solanum species, we were able to sample some of the polymorphism present in these taxa and thereby detect markers that are potentially present in multiple copies. We found that most of the markers are present as single-copy in the reference genome. Low copy number is a required character for markers intended for comparative genetic mapping and phylogenetic analysis. Low-copy sequences generally evolve independently of paralogous sequences and tend to be stable in position and copy number. However, a potential problem is the existence of gene families producing paralogs that can evolve independently, and the fact that some genes characterized as low-copy in some groups can be multiple-copy in others.
We discovered that a very low number of the COS markers tested here (17 out of 354, 4.8%) were designed on genes that were present in multiple copies in potato, thus validating the low-copy number definition of these markers.
In silico mapping using the BLASTn algorithm seems to work well in mapping COS marker sequences onto the reference genome. This is because the COS primers have been designed to amplify a PCR fragment in a size range that is suitable for BLAST, and they have been tested through rigorous algorithms to target genes that are present in single or low copy numbers. The BLAST algorithm may result in the identification of paralogous sequences. This is a problem only in the case of an incomplete reference sequence dataset or when the target genes belong to gene families. Since our input database is the complete genome sequence of potato and most of the markers resulted in a single hit in the genome, it is likely that the genes identified are true orthologs. However, for the sequences resulting in multiple hits, it is necessary to make gene-level comparisons when attempting to distinguish paralogs from orthologs. For the markers that target intronic regions, this may be difficult.
The ontology enrichment analysis showed that no bias was introduced in the COSII-DM list as compared to the original COSII list. In general, both gene lists may have a slight overrepresentation of genes in cellular metabolic process and response to environmental stress, and may be related to QTLs and agronomic traits of interest like yield, quality and resistance. Considering COS markers that locate in previously published QTL as candidate genes for a given trait may be difficult because the QTL regions span large parts of the chromosome. However, functional annotations are helpful in narrowing down to some specific candidate genes. Some obviously interesting candidate markers for late blight resistance are C2_At5g51840 (Rar1) and C2_At4g36530 (Cinnamoyl-CoA reductase) in chromosome 11, as well as C2_At4g02600 (MLO1) in chromosome 9. RAR1 is required for the functionality of several R genes, while Cinnamoyl-CoA reductase is the first enzyme on the pathway leading to production of lignin, which is an important factor in plant defense responses, and MLO1 confers broad spectrum mildew resistance in barley. Obvious candidate markers for carotenoid and vitamin C biosynthesis are not that easy to identify from this study. However, the QTL regions for these traits contain a couple of photosynthesis and chloroplast related genes, which is to be expected since carotenoids function in photosynthesis, acting as pigments in the light harvesting complexes, and vitamin C is just a few biochemical steps away from ‘sugar’ produced by photosynthesis. Carotenoids have two key functions in plants: broadening the light spectrum for light harvesting and protecting the chlorophyll against oxidative damage or excess energy. Overlapping regions for QTL for vitamin C biosynthesis and disease resistance are not surprising since many biological processes are altered in the plant during the defense response.
For example, ascorbic acid content in leaves has been shown to modulate plant defense transcripts and has been suggested to protect the cells against oxidative stress arising from wounding.
We found only a few COS markers that mapped in unexpected chromosomes. In cases where one copy was detected in the same chromosome as in the genetic map and an additional copy in an alternative locus, it is possible that one of the markers detected originates from a paralog. Often these can be readily detected by choosing the gene hit with the best e-value. The single copy markers that have unexpected locations between physical and genetic maps may reflect true differences, as we are comparing different species (DM = phureja, BCT = berthaultii × tuberosum, PCC1 = paucissectum × chomatophilum, PD = phureja × tuberosum, and finally tomato). Tomato and potato are generally considered to be highly colinear in their gene order [13, 25, 26], and this is true for the majority of the RFLP markers shared by the tomato and potato maps at the SGN website. According to Tanksley et al., tomato and potato genomes differ by only five paracentric inversions, while these two species differ from pepper and eggplant by many more complex rearrangements, mainly paracentric inversions and translocations [27, 28]. According to the most recent tomato/potato comparison there are nine major inversions and several small ones. Significant conservation is found between distantly related species from the Asterid (Coffea canephora and Solanum sp.) and Rosid (Vitis vinifera) clades, at the genome macrostructure and microstructure levels. A minimum of three (and up to ten) inversions and 11 reciprocal translocations differentiate the tomato genome from that of the last common ancestor of Nicotiana tomentosiformis and N. acuminata.
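As an illustration of how single-copy and multi-copy markers might be separated from BLASTn results, the Python sketch below parses tabular output and counts the distinct subject sequences hit by each query. It assumes the standard 12-column BLAST+ tabular format (`-outfmt 6`); the e-value cutoff of 1e-50, the function name `classify_copy_number` and the toy records are hypothetical and are not the settings or data used in this study.

```python
import csv
import io
from collections import defaultdict

# Column order of the default BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def classify_copy_number(blast_tsv, max_evalue=1e-50):
    """Label each query 'single' or 'multiple' by the number of distinct
    subject sequences with a hit at or below max_evalue (a hypothetical cutoff)."""
    subjects = defaultdict(set)
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        record = dict(zip(FIELDS, row))
        if float(record["evalue"]) <= max_evalue:
            subjects[record["qseqid"]].add(record["sseqid"])
    return {query: "single" if len(found) == 1 else "multiple"
            for query, found in subjects.items()}


# Toy records for illustration only; they are not results from the paper.
toy_rows = [
    ["markerA", "chr01", "99.0", "500", "5", "0", "1", "500", "1000", "1499", "0.0", "900"],
    ["markerA", "chr05", "80.0", "60", "12", "0", "1", "60", "200", "259", "1e-5", "50"],
    ["markerB", "chr07", "98.0", "400", "8", "0", "1", "400", "3000", "3399", "1e-150", "700"],
    ["markerB", "chr12", "95.0", "400", "20", "0", "1", "400", "5000", "5399", "1e-112", "500"],
]
toy_tsv = "\n".join("\t".join(row) for row in toy_rows)
print(classify_copy_number(toy_tsv))
```

Counting distinct subject sequences is a simplification: two copies on the same scaffold would be counted once, and a marker spanning an intron can produce several alignments to a single locus, so a real analysis would also need to compare hit coordinates.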
It is possible that the potato reference sequence may contain small numbers of incorrectly oriented or misplaced scaffolds as well as genes that were not discovered by the gene prediction algorithm used. As seen in this work we found a number of markers that had a high confidence hit in the whole genome sequence, but no gene hit. We ran those genome regions through Softberry gene prediction and were able to identify genes matching the COS marker hit region (results not shown). Further work focusing on the genome regions that from this work show contradictory results may facilitate the refinement of the genome assembly and annotation.
The high degree of conservation of gene order (synteny) in the Solanaceae revealed by cross mapping of homologous gene sequences has provided insights into genome evolution and has enabled the cloning of genes for agronomically important traits [29, 30]. However, when comparing two genetic maps it is necessary to take into account that the number of markers shared by any two maps is rather small, and therefore allows only a limited resolution for comparison. Recent comparisons of physical maps between solanaceous species have allowed for a more detailed comparison of gene order and orientation [31, 32]. Comparison of orthologous regions shows general colinearity between solanaceous species, but also local breaks due to inversions and/or indels. Also, some of the inconsistencies in sequential ordering may well be artifacts, since both the potato and the tomato genome still contain scaffolds that could not be oriented. Our results may help to refine the assembly and annotation of the potato and tomato genomes.
The distances between markers on a genetic linkage map are based on the proportion of recombination events occurring within a given chromosome segment and are thus indicative of gene order at a much lower resolution than physical map distances, which are the actual nucleotide sequence based distances. The sequence-based physical map helps to identify markers near traits of interest, thereby reducing the number of markers to be tested in developing applications such as marker-assisted selection, diversity assessment, and phylogeny.
The COS markers studied are mostly present as single copies in the reference potato genome sequence, making them ideal for applications such as diversity and phylogenetic studies. In silico mapping is complementary to genetic mapping and facilitates detailed marker identification for traits of interest.
Parents of the BCT, PCC1, PD, DM/DI//DI (developed at CIP and contributed to the Potato Genome Sequencing Consortium for anchoring of the DM potato genome), and tomato mapping populations were subjected to COS marker amplification intended for DNA sequencing. The progeny from the BCT backcross population (M200-30 (USW2230 × PI473331) × HH1-9) involving Solanum berthaultii and S. tuberosum, PCC1 and PD were used for genetic mapping. In addition, COS were amplified from other asterid species, Ipomoea trifida genotypes M9 (CIP107665.9) and M19 (CIP 107665.19) and Daucus carota genotypes QAL and 0493B, for cross species comparisons. Leaf tissue was ground in liquid nitrogen and genomic DNA was extracted using a standard protocol.
COS markers were selected by comparing the published genetic maps with the tomato COS map and selecting markers located in the QTL intervals for late blight resistance and/or maturity [16, 17, 37–44], ascorbic acid biosynthesis and carotenoid biosynthesis (Additional file 1: Table S1). In addition, markers with annotations to genes known to function in abiotic and biotic stress were selected.
COS markers were amplified from genomic DNA and the optimal annealing temperature for each primer pair was determined using a temperature gradient. PCR reactions were conducted with 25 ng of DNA in a 1× PCR buffer (10 mM Tris-HCl, pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 0.1% Triton-X), 0.2 mM of each dNTP, 0.2 mM of each forward and reverse primer and 0.5 U of Taq polymerase. Reactions were set up in microplates and processed in an MJ Research model PTC-200 PCR thermocycler with the following cycles: 1 cycle at 94°C for 4 min, 35 cycles at 94°C for 1 min plus 55 or 60°C for 1 min plus 72°C for 1 min, and 1 cycle at 72°C for 5 min. The bands were separated by SSCP (single-stranded conformation polymorphism) electrophoresis using 6% denaturing (7 M urea) polyacrylamide (19:1) and visualized by silver staining. All well-separated bands were cut from the gels with a razor blade. The excised gel slices were placed in 96-well PCR plates, and the DNA was eluted in 40 µL of sterile nuclease-free water. This was used as a template in a new PCR reaction with the same primers in a 10 µL reaction.
One µL of this product was sequenced with the same primers in a 5 µL reaction using the ABI Big Dye dideoxynucleotide termination kit (Applied Biosystems, Foster City, California). Amplifications were carried out in an MJ Research DNA Engine Dyad® Peltier Thermal Cycler (Watertown, Massachusetts) using an initial denaturation at 95°C for 3 min, followed by 30 cycles of 96°C for 25 s, 50°C for 20 s, 60°C for 5 min and with a final elongation at 72°C for 7 min. Excess dye terminators were removed using the CleanSeq magnetic bead sequencing reaction clean up kit from Agencourt Biosciences (Beverly, MA). Sequences were resolved on an ABI 3730xl capillary-based automated DNA sequencer (Applied Biosystems) with 50 cm POP-7 polymer capillaries at the Biotechnology Center of the University of Wisconsin-Madison. Alternatively, for some of the markers the PCR products were isolated and purified with the Qiaquick Gel Extraction kit and sequenced without the previous re-amplification step.
Publicly available sequence files and other data of potato S. tuberosum Group Phureja DM1-3 516R44 (CIP801092) generated by the Potato Genome Sequencing Consortium were obtained from . We used the v3 superscaffold sequences, v2.1.10 AGP Pseudomolecule Sequences, 3 DM Pseudomolecule AGP data (v2.1.10), v3.4 gene sequences, and v3.4 cds. Tomato genome sequences were obtained from . We used the ITAG1 release cds and genomic sequences.
The names and accession codes of the COS marker DNA sequence libraries deposited in the NCBI GenBank GSS database
Daucus carota (0493B) genomic DNA extraction from leaf tissue
Daucus carota (QAL) genomic DNA extraction from leaf tissue
Ipomoea trifida (M19) genomic DNA extraction from leaf tissue
Ipomoea trifida (M9) genomic DNA extraction from leaf tissue
Solanum chomatophilum (PI310991-1) genomic DNA extraction from leaf tissue
Solanum hybrid (MP1-8) genomic DNA extraction from leaf tissue
Solanum hybrid (M200-30) genomic DNA extraction from leaf tissue
Solanum lycopersicoides (LA2951) genomic DNA extraction from leaf tissue
Solanum sitiens (LA1974) genomic DNA extraction from leaf tissue
Solanum tuberosum (PS-3) genomic DNA extraction from leaf tissue
Solanum tuberosum (HH1-9) genomic DNA extraction from leaf tissue
Solanum tuberosum (CHS-625) genomic DNA extraction from leaf tissue
Solanum tuberosum (DI) genomic DNA extraction from leaf tissue
Solanum tuberosum (DMDI) genomic DNA extraction from leaf tissue
Three diploid mapping populations, BCT, PCC1 and PD, were used for segregation analysis to locate COS in potato linkage groups. Polymorphisms were detected by high resolution melting (HRM), SSCP followed by silver staining, or agarose gel electrophoresis. For HRM the PCR amplification was performed with the fluorescent DNA-binding dye (LCGreen) and the DNA melting profiles were analyzed with a LightScanner instrument (Idaho Technologies). Melting curves were analyzed with the help of the LightScanner software and converted into appropriate segregation codes. For the gel separated markers, polymorphic marker alleles were recorded as present or absent.
The band and HRM records were compiled according to the genotype codes of population type CP described in the Joinmap® 4 manual. A consensus map was constructed with Kosambi’s mapping function following the Joinmap® 4 manual.
A comparative COSII map between the integrated potato genetic map, the potato physical map and the tomato genetic map was made as described in the legend to Figure 1. The figure was prepared using the genoPlotR library for the statistical software R.
We ran a BLASTn search against the DM genes and coding sequences provided by PGSC and the tomato genomic and coding sequences, using our marker DNA sequences as queries. The marker sequences and the corresponding gene or coding sequences were aligned as DNA or translated amino acid sequences, depending on whether the marker sequence obtained covered intron or exon regions of the genes analyzed. The alignments were made using ClustalW, and Neighbor Joining (NJ) trees were constructed using the Poisson correction method for amino acid sequences and the Maximum Composite Likelihood method for DNA sequences. Evolutionary analyses were conducted in MEGA5.
In the initial phase of the project the list of ontology terms associated with the 2868 COSII markers was manually reviewed and filtered for genes with gene ontology annotations that may have a role in traits of interest like stress tolerance and late blight resistance. For the final analysis, additional criteria were single-copy status and having been mapped in DM/DI//DI. This final list of 273 markers (further referred to as the COSII-DM list) was subjected to the ‘Singular Enrichment Analysis’ tool as available on the AgriGO web-site. The method tests if particular terms are over-represented or different in the set of interest against a reference list. We tested if the COSII-DM list was different from the original COSII list and from the Arabidopsis gene model (TAIR9) as available on the AgriGO web-site. The focus of interest for the term analysis was on GO terms within the ‘biological process’ category.
This work was supported by the USDA National Research Initiative (grant number 2008-35300-18669) to DS, MB, and LM.
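For reference, Kosambi's mapping function converts a recombination fraction r into a map distance d = (1/4) ln((1 + 2r)/(1 − 2r)) Morgans. The Python sketch below implements this formula and its inverse; it illustrates the function itself and is not a reproduction of the Joinmap® calculation, and the function names are chosen here for illustration.

```python
import math


def kosambi_cm(r):
    """Map distance in cM for a recombination fraction r, with 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    # 0.25 * ln((1 + 2r) / (1 - 2r)) equals 0.5 * atanh(2r); multiply by 100 for cM.
    return 50.0 * math.atanh(2.0 * r)


def kosambi_r(d_cm):
    """Inverse: recombination fraction for a map distance in cM."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


print(round(kosambi_cm(0.10), 2))
```

At small r the distance in cM is close to 100 × r, and it grows faster than r as r approaches 0.5, reflecting the correction for undetected double crossovers.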
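The Poisson correction used for the amino acid trees estimates the number of substitutions per site as d = −ln(1 − p), where p is the proportion of differing aligned sites. A minimal Python sketch follows; it assumes that positions with a gap in either sequence are skipped, which is an assumption made here because the gap treatment chosen in MEGA5 is not stated. The example peptides are invented for illustration.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites, skipping positions with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p) between two aligned protein sequences."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance undefined when every site differs")
    return -math.log(1.0 - p)


# Invented eight-residue peptides that differ at one site (p = 0.125).
print(round(poisson_distance("MKTAYIAK", "MKTAYIAR"), 4))
```

A matrix of such pairwise distances is the input that a Neighbor Joining algorithm works from.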
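Over-representation of a GO term in a gene list is commonly assessed with a hypergeometric upper-tail probability. The Python sketch below shows that calculation as one plausible form of such a test; the text does not say which statistical test or multiple-testing correction was selected in AgriGO, so this is an assumption, and the term counts in the usage line are hypothetical. Only the list sizes 273 and 2868 come from the text.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    of which K are annotated with the term of interest."""
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# Hypothetical counts: a term carried by 100 of the 2868 COSII genes
# and by 20 of the 273 genes in the COSII-DM list.
p_value = hypergeom_upper_tail(k=20, n=273, K=100, N=2868)
print(f"{p_value:.3g}")
```

In practice one such probability is computed per GO term and then adjusted for the number of terms tested.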
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.