In this study we report a set of 313 intron-flanking gene-based markers, derived mainly from genes involved in the nodulation pathway in legumes. These markers were evaluated using SSCPs and an allele-specific, high-throughput Sequenom platform, so the marker-assisted selection community now has two different technologies with which to exploit this new resource of molecular markers. Similar intron-flanking markers have been designed for comparative genomics in other legumes, based on conserved orthologous sequences (COS) [11, 34]. In grasses, intron-flanking markers have been evaluated in relation to inter-species diversity and candidate genes within QTLs [35, 36].
In terms of linkage analysis, 17% of the SNP markers were placed in the inter-gene pool population DOR364 × G19833. In order to identify the putative position of the other SNPs, the linkage map was merged into a consensus map following the methodology reported by Galeano et al. The synteny analysis allowed in silico mapping of the rest of the markers. Consensus maps typically present a high degree of co-linearity and synteny, and have therefore become a popular alternative for in silico mapping and for association studies in other species, such as Eucalyptus and wheat (Triticum spp.).
The diversity analysis using intron-based SNPs revealed different patterns of diversity compared with the ones described by Blair et al. using SSRs. This may be a consequence of the dissimilar mutation processes that are associated with each type of marker. Moreover, according to Laval et al., (k-1) times more biallelic markers are needed to achieve the same genetic distance accuracy as a set of SSRs with k alleles. In our case, the average number of alleles per SSR locus was about 10. Therefore, we would require [(10–1) * 37] = 333 SNP markers to achieve the same accuracy. In addition, the polymorphism within the intron-based markers could be constrained more extensively than the polymorphism within non-genic regions. Similar results were reported by Cortes et al., where the SNPs were able to differentiate between the Mesoamerican and the Andean gene pools, but the SSRs were more powerful for the identification of races within gene pools. It was therefore proposed to use SNP markers at the inter-gene pool level and SSR markers at the intra-gene pool scale in order to explore the diversification and domestication history of the species. In maize, Hamblin et al. reported that SSRs performed better at clustering germplasm and provided more resolution than SNPs, something that was also observed in this study for common bean. Additionally, Jones et al. compared SSRs and SNPs in maize and showed that SNPs can provide more high-quality markers. They suggested that the relative loss in polymorphism compared with SSRs may be compensated by increasing the number of SNPs and using SNP haplotypes. Our combination of multiple markers from the same gene and from different genes allowed us to detect the corresponding haplotype blocks, and therefore supports this thesis.
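The marker-number estimate above can be reproduced with a short calculation. The following is only a minimal sketch of the (k-1) rule of thumb attributed to Laval et al.; the function name and arguments are hypothetical, and the value 37 is read here as the number of SSR loci being matched, which is an assumption based on the arithmetic in the text.

```python
def biallelic_markers_needed(n_ssr_loci, mean_alleles_per_ssr):
    """Rule-of-thumb number of biallelic markers (e.g. SNPs) needed to match
    the genetic-distance accuracy of a set of SSR loci.

    Implements the (k - 1) multiplier: each SSR locus with k alleles is
    treated as equivalent to (k - 1) biallelic markers.
    """
    if n_ssr_loci < 0:
        raise ValueError("number of SSR loci cannot be negative")
    if mean_alleles_per_ssr < 2:
        raise ValueError("a polymorphic locus needs at least 2 alleles")
    return (mean_alleles_per_ssr - 1) * n_ssr_loci


# Values taken from the text: about 10 alleles per SSR locus and a factor of 37
# (assumed here to be the number of SSR loci).
print(biallelic_markers_needed(37, 10))
```

Because the estimate scales linearly with both inputs, a modest change in the average allele number per SSR locus shifts the required number of SNPs considerably.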
In short, our results are in line with previous evidence supporting the hypothesis that SNPs and SSRs are complementary, non-mutually exclusive markers that must be chosen based on the ultimate practical purpose. In this sense, we emphasize that the choice of one or the other marker depends not only on the level at which the comparisons will be made, but also on the nature of the comparisons.
Population structure analysis is a key factor for association analysis in plants, in order to minimize type I and II errors between candidate molecular markers and traits of interest. In common bean, the diversification across the Americas and the independent domestication of the wild relatives in two distinct centers gave rise to two main gene pools, the Andean and the Mesoamerican, with extensive race sub-division. Several studies have reported that the Andean beans are more diverse than the Mesoamerican ones [13, 32].
Similar trends are theoretically expected in terms of linkage disequilibrium (LD). In the current study, the level of LD in the Andean panel was slightly higher than what previous analyses revealed using AFLP screenings of wild and domesticated accessions. This difference is mainly due to the type of markers and the sample size that were used in each case. Rossi et al. additionally reported higher levels of LD in the Andean gene pool compared with the Mesoamerican, suggesting that the former originated prior to domestication. Analogous correlations between population sub-division and LD decay have been found between tropical and temperate germplasm in maize, between O. sativa ssp. indica and O. sativa ssp. japonica, and between two-row and six-row barley. In short, the Andean gene pool offers per se an interesting spectrum in which to look for adaptive variation, while the confounding effect of sub-structure is minimized.
A recurring issue with the use of QTL data is that different parental combinations and/or experiments conducted in distinct environments often result in the identification of partly or wholly non-overlapping sets of QTLs. Therefore, it is important to explore constitutive QTLs across different environments. In this sense, our field trials offered us the possibility to identify constitutive marker-trait associations because correlations were contrasted across two environments, drought and irrigation. This sort of design is particularly useful for marker-assisted selection (MAS), as was demonstrated in rice.
In terms of association mapping models, we used two approaches: GLM and MLM. The GLM yielded more significant p values and therefore more associations. However, after Bonferroni correction just two markers were detected in common with the MLM results. This finding is in accordance with the results of previous studies [49, 50] and indicates that the GLM approach is inappropriate for association mapping in the examined plant species, because the resulting proportion of spurious marker-phenotype associations is considerably higher than the nominal type I error rate. The MLM used here, with the kinship matrix (K) and the STRUCTURE membership matrix (Q) as co-factors, revealed interesting results. However, recent studies reported that new models combining K and 10 principal components (Q10) were the best approaches to control the rate of false positives [51, 52]. Additionally, although we found some significant associations using MLM, a multiple-testing correction needs to be applied to control the genome-wide type I error rate (GWER).
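The effect of a Bonferroni correction on a set of marker-trait p values can be illustrated with a minimal sketch. The marker names and p values below are hypothetical placeholders, not results from this study, and the significance level of 0.05 is an assumed default.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the markers that remain significant after Bonferroni correction.

    p_values: dict mapping marker name -> raw p value from an association test.
    alpha:    family-wise type I error rate to be controlled (assumed 0.05).
    """
    n_tests = len(p_values)
    if n_tests == 0:
        return {}
    # Bonferroni: each individual test is judged against alpha / number of tests.
    threshold = alpha / n_tests
    return {marker: p for marker, p in p_values.items() if p <= threshold}


# Hypothetical raw p values for four markers (illustration only).
raw = {
    "marker_1": 0.0001,
    "marker_2": 0.004,
    "marker_3": 0.03,
    "marker_4": 0.2,
}

# With 4 tests the per-test threshold is 0.05 / 4 = 0.0125, so marker_3,
# although nominally significant at 0.05, no longer passes.
print(sorted(bonferroni_significant(raw)))
```

The same pattern explains why a model that produces many nominally significant p values can retain only a few associations once the number of tests is taken into account.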
Interestingly, the markers BSn66_SNP2 and BM143 were located near QTLs previously reported for days to flowering and days to maturity in different bi-parental populations, nearby or flanking the same loci in the same linkage group [54–57]. Additionally, QTLs for yield components such as seed weight and seeds per pod have also been reported close to these loci [55, 58, 59]. In terms of functional genomics, the locus BSn66 is an auxin response factor 2 (ARF2), one member of the family of transcription factors that bind to auxin responsive elements (AuxREs) in the promoter sequences of auxin-regulated genes. The ARF gene family has been repeatedly associated with flower and fruit maturation and development [61–63]. For instance, arf2 mutants presented enlarged rosette leaves, thickened inflorescence stems, delayed flowering and senescence, reduced fertility and increased seed size [64, 65].
In a similar way, SNP marker BSn85_SNP2 on Pv8 is near QTLs for days to maturity, and QTLs for seed weight have also been reported near this locus [55, 56]. The locus BSn85 putatively encodes a basic helix-loop-helix (bHLH) transcription factor. Members of the bHLH gene family are particularly relevant because they interact with the light-activated phytochrome, and therefore control various facets of the photomorphogenic response, including seed germination, seedling de-etiolation, shade avoidance and photoperiodic control of plant growth [66, 67]. Recently, the interaction of ARF with bHLH transcription factors has been reported in the context of plant growth. These examples of functional congruence and co-localization of some of the associated loci with formerly identified QTLs validate our approach. Even more interesting, association studies in common bean, specifically within the Andean gene pool, are an excellent alternative for finding QTLs based on candidate genes. Pioneering association results in common bean were obtained for SNP markers associated with common bacterial blight (CBB) resistance.
Although the sampling in our study was not exhaustive, similar successful studies with small sample sizes have been reported extensively. For example, several SNP markers were associated with oleic acid using 94 genotypes of peanut (Arachis hypogaea) from 4 botanical varieties, and markers associated with malting quality were found in barley using germplasm sets of 85 genotypes on average. The main advantage of small, carefully chosen association mapping panels is the efficacy and affordability with which plant germplasm is used. In some other cases, as in barley, more individuals (approximately 300 lines) are desired [46, 70]. However, the final choice of population size depends on the relatedness of the individuals, the extent of linkage disequilibrium, the type of study, and the polymorphism of the markers. We have demonstrated that, because of its self-pollinating nature, common bean is not particularly demanding in this respect and allows working with medium-sized populations.
Additionally, considering the population size and low genome coverage, parental information on the lines would improve the accuracy of the results. This approach has been used particularly in livestock species, with models that integrate data on phenotypes, genotypes and pedigree information. Such information can be combined with genomic data for greater detection power and estimation precision through a properly scaled and augmented relationship matrix. Therefore, this parental information will be very important for association and genomic selection approaches in common bean. Unfortunately, at this stage parental information was not available for the materials considered in the present study because they were landrace genotypes collected from farmers and market places.