In the present work, we studied the distribution of microsatellites (repetitive sequences with motifs of one or more base pairs and more than six repeat units) to develop a broad set of useful SSR markers for the common bean from a public EST database. The goal was to attribute a genetic value to the EST-SSRs to increase their potential use. In addition, the relatively low cost of obtaining these markers, compared with the development of genomic libraries, makes them an attractive choice, particularly in species with a narrow genetic base, for which a large number of SSRs is necessary to detect polymorphism. Currently, the Phaseolus EST data set contains 36,626 sequences, an increase of 31% in the last six months, and represents a new potential source of SSR markers for the common bean. Similar approaches for SSR mining in EST databases have been applied to the common bean [13, 15] and to other related species of the Legume family [6, 32]. In our data mining, the criteria established for SSR identification were less stringent than usual, which led to the identification of mononucleotide and compound repeats. As a consequence, mononucleotides were the most frequent, followed by di- and trinucleotides. This result is in contrast with the previous observations for several species , but it is consistent with the observations  for dicotyledons. The composite motifs accounted for 30% of the SSRs identified, considering dinucleotide repeats or higher motifs, which was similar to the findings reported by  that identified 34% of such motifs. Screening the non-redundant Phaseolus ESTs for SSRs with stricter parameters (dinucleotide or longer motifs only, a limit on the length of SSRs, and perfect repeats only) would yield a reduced set of available markers. Under those parameters, only 3.27% of the sequences containing SSRs would be adequate for primer design, which translates to 156 EST-SSRs.
Indeed, very close estimates were obtained in Arabidopsis and pepper when the stringency of the criteria was increased, resulting in 3% and 3.6% of ESTs containing SSRs, respectively [33, 35].
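The effect of search stringency on the number of SSRs recovered can be illustrated with a minimal sketch. The function name, the minimum repeat counts and the example sequence below are hypothetical and chosen only for illustration; they are not the parameters or the software used in this study, and only perfect (non-compound) repeats are detected.

```python
import re

# Hypothetical minimum repeat counts per motif length (1 = mono, 2 = di, ...).
# These thresholds are illustrative assumptions, not the study's parameters.
RELAXED = {1: 10, 2: 6, 3: 5, 4: 4}
STRICT = {2: 6, 3: 5, 4: 4}  # same thresholds, mononucleotides excluded


def find_perfect_ssrs(seq, min_repeats):
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # One motif of motif_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit ("AA", "ATAT", ...).
            is_redundant = any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            )
            if not is_redundant:
                hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


# Made-up sequence: a poly-A run of 12 bases and an (AG)7 dinucleotide repeat.
example = "GGC" + "A" * 12 + "CCGT" + "AG" * 7 + "TTC"
relaxed_hits = find_perfect_ssrs(example, RELAXED)
strict_hits = find_perfect_ssrs(example, STRICT)
```

With the relaxed settings both repeats in the example are reported, whereas the strict settings keep only the dinucleotide repeat, which mirrors the reduction in candidate markers described above.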
Although most markers were based on mononucleotide repeats, they showed satisfactory levels of PCR amplification that were even higher than those of markers derived from the genomic sequences of common beans. The search for these repeats increased the number of SSR markers available for genetic analysis with a broad spectrum of applications. The rates of SSR amplification normally show a wide range in plants, such as the 64% and 83% reported for barley and tomato, respectively [36, 37]. In the present work, 80% of the EST-SSRs were amplified. Similar levels of EST-SSR amplification have been reported for the common bean by  and . These reports also found successful amplification in 81% of the SSRs derived from the public database GenBank. A slightly higher value of 87% was reported by  using EST-SSRs derived from a private database. These values are comparable to those reported for SSRs derived from genomic libraries, which are described by  and , who found amplification rates of 86% and 81%, respectively. The high rate of success in the amplification of EST-SSRs in the common bean may be the result of several factors, such as the quality of the sequences from which the primers were derived, the appropriate criteria used for primer design and the use of the same species for the design and amplification of the primer set. Although EST-derived SSR markers are generally less polymorphic than genomic SSRs, their value relative to genomic SSRs is enhanced by several factors: their level of transferability, their potential to attribute function to genes affecting traits of interest and the ease of identifying SSRs by in silico data mining with reduced time, labor and cost.
As expected, the results presented in this work indicate that the EST-SSRs showed higher rates of transferability across the Legume species than the genomic SSRs. A total of 93.8% of the EST-SSRs were transferable to at least one species, compared with 74.5% of the genomic SSRs. The higher transferability rate of EST-SSRs can be explained by their correspondence to the transcribed component of a gene unit, which confers a high potential for inter-specific transferability . The recent increase in the availability of EST-SSRs derived from public databases can be expected to provide an additional source of transferable markers among less related species. Transferability of EST-SSRs has been reported for several species [37, 40]. Transferability across the Legume family was lower than within the Phaseolus genus, in which 64% of the markers produced amplification products in all tested species. These findings are in accordance with the report by , which described a high degree of genome conservation between the model legumes Medicago truncatula and Lotus japonicus. However, genome conservation tends to be reduced as we move to the Phaseoloid clades, such as soybean (G. max), common bean (P. vulgaris) and Vigna (including cowpea and Asian Vigna). Generally, successful cross-amplification between genera appears to be lower than within genera. In a study with soybean,  found an amplification rate that ranged from 3% to 13% among the Legume genera, whereas for species within the genus, the level of transferability was up to 65%. As for the Medicago genus, 81% of the SSR markers tested were found to be transferable between M. sativa and M. truncatula . Within the Vigna genus,  reported levels of SSR transferability that reached 90% between V. umbellata and V. angularis, whereas for the species V. mungo, V. radiata and V. aconitifolia, the amplification decreased to rates of 67% and 73%.
In our study, the transferability of SSR markers between P. vulgaris and G. max was 10%, and most of the transferred markers (70%) were EST-SSRs. Because these two species are considered the most important cultivated legumes in the world and have a large volume of available genetic linkage map information, the transferability of EST-SSRs may prove very useful for comparative genomics studies by enabling the exchange of information between them.
Although the conserved nature of EST-SSRs facilitated transferability, these markers are considered less polymorphic than other sources of SSRs. In the current study, the level of polymorphism of the EST-SSRs (0.47) was slightly lower than that of SSRs derived from genomic libraries (0.53), with a mean number of three and four alleles, respectively. Similar results were found by  who obtained a PIC value of 0.44 and an average number of 2.7 alleles for EST-SSRs and a PIC value of 0.45 and an average number of 2.4 alleles per locus for genomic markers. Higher levels of polymorphism for genomic and EST-SSRs have also been reported, with an average number of 6 and 9.2 alleles per primer pair, respectively . In the present study, 34 of the 167 loci had previously been characterized using a more diverse set of sampled individuals, including accessions from the Andean and Mesoamerican gene pools [14, 28], to determine genetic diversity. The average PIC values, which reflect the allelic diversity and frequencies among the sampled individuals, were higher for these markers in the earlier characterization (0.70) than in the present work (0.50); these data provide strong evidence that the number and genetic relationship of the individuals used to assess SSR genetic information can influence the estimates obtained. No correlation was observed between PIC values and the index of transferability of the SSR markers. The loci that showed a greater rate of transferability were monomorphic or had low PIC values. These findings are consistent with the theory that conserved genomic sequences are less variable.
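To make the dependence of PIC on the sampled individuals concrete, the sketch below computes PIC for a single locus from a list of allele calls. It assumes the commonly used formula of Botstein et al. (1980), PIC = 1 - sum(p_i^2) - sum over i < j of 2 p_i^2 p_j^2; the estimator actually used in this study is not stated in this section and may differ. The function name and the allele calls are hypothetical.

```python
from collections import Counter


def pic(allele_calls):
    """PIC for one locus from allele calls (assumed Botstein et al. 1980 formula)."""
    counts = Counter(allele_calls)
    total = sum(counts.values())
    freqs = [count / total for count in counts.values()]
    # Sum of squared allele frequencies (expected homozygosity).
    sum_sq = sum(p * p for p in freqs)
    # Cross term: sum over all allele pairs i < j of 2 * p_i^2 * p_j^2.
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - sum_sq - cross


# Made-up allele calls: a narrow panel dominated by one allele versus a
# more diverse panel carrying four alleles at similar frequencies.
narrow_panel = ["a"] * 9 + ["b"]
diverse_panel = ["a", "b", "c", "d"] * 3
pic_narrow = pic(narrow_panel)
pic_diverse = pic(diverse_panel)
```

In this toy example the diverse panel yields the larger PIC for the same locus, in line with the argument that estimates depend on the number and relatedness of the individuals sampled.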
Despite the reduced polymorphism found for the EST-SSRs, these markers were very useful for the genetic mapping of the BJ population, helping to increase the map coverage in the Phaseolus genome. Across the entire set of EST-SSRs tested for polymorphism in BAT93 and JALO EEP558, 24% segregated and were integrated into the BJ framework. Previous data indicated that genomic SSRs were almost two-fold more polymorphic than the EST-SSRs . In a similar study, also using the BJ population, the polymorphism level of the EST-SSRs was significantly higher (51%) than that of the EST-SSR markers developed in our study . The level of polymorphism of the EST-SSR markers could be attributed to the length and the number of repeat units and the SSR position inside the transcribed sequence. There is currently a great volume of genetic information derived from genome sequencing, high coverage genetic maps, BAC libraries and physical maps . This increase in genetic information has provided new insights into the mechanisms involved in the expression of target traits and has facilitated the transfer of knowledge from one species to another. Comparative genomics to assess synteny can facilitate the reciprocal use of genomics resources between different legume species, making the research cost-effective, efficient and useful for crop breeding .
Regarding segregation distortion, 12 markers (15.9%) out of the 72 polymorphic EST-SSRs showed a significant deviation from the expected Mendelian segregation as shown by the FDR analysis (p < 0.05). The nature of the EST-SSRs could explain the high level of segregation distortion found in our study. A more extreme deviation in coding regions is expected because they are more susceptible to evolutionary pressures than non-coding regions. These markers were preferentially mapped on chromosomes 1 and 7 with a clear distinction between the markers that showed segregation distortions towards BAT93 or Jalo EEP558. The markers that were skewed towards the parent BAT93 were mapped on chromosome 7, whereas the markers that were skewed towards Jalo EEP558 were mapped on chromosome 1. The same pattern was observed in two previous studies using the same BJ mapping population [29, 44]. The genes related to the domestication syndrome were mapped on chromosome 2 in two different studies [45, 46]. These observations support the hypothesis that the segregation distortion may have a biological basis on these chromosomes, and that the parents, BAT93 and Jalo EEP558, may have undergone preferential selection on the specific regions of chromosomes 1 and 7 that resulted in the observed segregation distortion .
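A test for segregation distortion of this kind can be sketched as follows. The sketch assumes a 1:1 expected ratio of parental alleles, as in a recombinant inbred line population, a chi-square test with one degree of freedom per marker, and the Benjamini-Hochberg procedure for FDR control; the section does not specify which FDR procedure was applied, so this is an illustration rather than the analysis performed. Function names and genotype counts are hypothetical.

```python
import math


def chi2_1to1(n_a, n_b):
    """Chi-square statistic and p-value (1 d.f.) against a 1:1 expectation."""
    expected = (n_a + n_b) / 2.0
    stat = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    # For 1 d.f., the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per marker: True if declared distorted at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    flagged = [False] * m
    for idx in order[:cutoff]:
        flagged[idx] = True
    return flagged


# Made-up counts of lines carrying each parental allele at three markers.
counts = [(38, 38), (60, 16), (40, 36)]
p_values = [chi2_1to1(a, b)[1] for a, b in counts]
distorted = benjamini_hochberg(p_values)
```

In this toy data set only the second marker, which is strongly skewed towards one parent, is flagged as distorted after the FDR correction.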
In our study, all markers were placed into 14 linkage groups, three more than the number of common bean chromosomes (n = 11). An increase in the number of markers and in the population size, which currently stands at 76 individuals, would allow these three small linkage groups to be integrated into their respective chromosomes. The total SSR map length (1,156.2 cM) was consistent with previously developed maps. Freyre et al. 1998  reported a total map distance of 1,226 cM using the same BJ mapping population, and  reported a map size of 1,401 cM using a mapping population derived from the parents "Carioca" and "Guanajuato 31". However, the SSR map was almost twice as long as the SSR-based map developed by  (606.8 cM). Because the sample size and mapping generation were the same, this variation in the estimated genome size can most plausibly be attributed to the higher marker density of the current map. In addition,  suggests that EST-SSR-based linkage maps are expected to be larger than those based solely on anonymous SSRs because recombination events may be more frequent in gene-rich regions than in non-coding regions. The genomic distribution of EST-SSR markers in this study was random and quite similar to that of the genomic SSR markers, providing good coverage of all linkage groups.
A total of 207 SSR markers have been mapped on the BJ population to date, including 15 mapped by , 22 by , 94 by  and a new set of 76 SSRs in the present study, of which 74 were EST-derived markers. Over the last two years, Embrapa Rice and Beans has made an additional effort to increase the number of RILs derived from the BJ population, which presently has 75 individuals , and to make them available, in order to achieve higher precision in the clustering and order analysis. In addition, the number of mapped markers tends to increase rapidly because of the continuously growing number of EST sequences that are becoming available for the common bean in public databases; this increased availability will contribute to the development of more EST-SSR-based markers and to the construction of more informative linkage maps. EST-SSR markers have the potential to greatly increase the degree of information provided by linkage maps because they can be readily associated with genes of known or putative functions, allowing for the direct association of markers in the map with quantitative traits of interest. Our results show that more than half (57.9%) of the EST-SSR sequences were associated with sequences of putative genes in GenBank, demonstrating the potential of these markers for the genomic exploitation of the common bean. In addition, we demonstrated that EST-SSRs are readily transferable among Legume genera with levels of genetic information content comparable to those of genomic SSRs, which will help to expand their use in genetic analyses of Phaseolus vulgaris.