Peach genetic resources: diversity, population structure and linkage disequilibrium

Background Peach (Prunus persica (L.) Batsch) is one of the most important model fruits in the Rosaceae family. Native to the west of China, where peach has been domesticated for more than 4,000 years, its cultivation spread from China to Persia, Mediterranean countries and to America. Chinese peach has had a major impact on international peach breeding programs due to its high genetic diversity. In this research, we used 48 highly polymorphic SSRs, distributed over the peach genome, to investigate the difference in genetic diversity, and linkage disequilibrium (LD) among Chinese cultivars, and North American and European cultivars, and the evolution of current peach cultivars. Results In total, 588 alleles were obtained with 48 SSRs on 653 peach accessions, giving an average of 12.25 alleles per locus. In general, the average value of observed heterozygosity (0.47) was lower than the expected heterozygosity (0.60). The separate analysis of groups of accessions according to their origin or reproductive strategies showed greater variability in Oriental cultivars, mainly due to the high level of heterozygosity in Chinese landraces. Genetic distance analysis clustered the cultivars into two main groups: one included four wild related Prunus, and the other included most of the Oriental and Occidental landraces and breeding cultivars. STRUCTURE analysis assigned 469 accessions to three subpopulations: Oriental (234), Occidental (174), and Landraces (61). Nested STRUCTURE analysis divided the Oriental subpopulation into two different subpopulations: ‘Yu Lu’ and ‘Hakuho’. The Occidental breeding subpopulation was also subdivided into nectarine and peach subpopulations. Linkage disequilibrium (LD) analysis in each of these subpopulations showed that the percentage of linked (r2 > 0.1) intra-chromosome comparisons ranged between 14% and 47%. LD decayed faster in Oriental (1,196 Kbp) than in Occidental (2,687 Kbp) samples. 
In the ‘Yu Lu’ subpopulation there was considerable LD extension, while no variation of LD with physical distance was observed in the landraces. From the first STRUCTURE result, LG1 had the greatest proportion of alleles in LD within all three subpopulations. Conclusions Our study demonstrates a high level of genetic diversity and relatively fast decay of LD in the Oriental peach breeding program. Inclusion of Chinese landraces will have a greater effect on increasing genetic diversity in Occidental breeding programs. Fingerprinting with genotype data for all 658 cultivars will be used for accession management in different germplasms. A higher density of markers is needed for association mapping in Oriental germplasm due to the low extension of LD. Population structure and evaluation of LD provide valuable information for GWAS experiment design in peach.


Background
Peach (Prunus persica (L.) Batsch) is one of the most widely grown commercial stone fruits in the Rosaceae family, subfamily Spiroideae, because of its broad climate adaptation and high production in cultivation regions [1]. Its short juvenile period (2-3 years) and the ease of obtaining controlled crosses have made peach breeding programs quite successful: around 1,000 new cultivars were released during 1991-2001 [2]. In addition, because of its small genome size and the simple genetic basis of many morphological and economic traits [3], peach is a model fruit crop for traditional genetics and current genomics research, with subsequent applications in breeding and selection.
As the centre of origin of peach, China has the longest history of peach cultivation (more than 4,000 years), and its genetically diverse germplasm can provide useful genes to breed cultivars with enhanced resistance to pests and diseases, improved fruit size and quality, and a longer postharvest shelf-life. The ancestral form of peach, used as rootstock in south China, still exists. Other wild related species are present in the northwestern region of China: 'P. mira Koehne', 'P. kansuensis Rehd.', 'P. davidiana Franch.' and 'P. potaninii Batal.'. In China, the main peach germplasms are held in three national collections, but regional and local collections are also established around the country. The national collections preserve 2,000 accessions from China and foreign countries, with about 600 cultivars of local origin [4]. Based on genetic fingerprint data, Chinese peach cultivars have more genetic diversity than has been reported for other peach germplasm collections [5]. The Chinese peach germplasm has had a great impact on breeding research in other countries. After introducing 'Shanghai Shui Mi' as a parent in the early 20th century, Japan selected 'Hakuto' [6,7] and the USA released the famous cultivar 'Elberta'. Both 'Hakuto' and 'Elberta' were extensively used as parents for further breeding of modern cultivars [8,9]. Over the last few decades, considerable effort has been put into peach breeding in the USA, South Africa, Brazil, Argentina, Australia, China, Spain, Italy, France and Japan [10], producing almost 2,000 new cultivars; half of these have been registered in and come from the USA while only 5% are from China [11,12].
As a self-pollinated species, peach retains a high degree of self-compatibility and homozygosity [13]. During the decade 1991-2001, peach and nectarine cultivars were generated through controlled crosses (43-61%), open pollination (15-21%) and bud mutation (4-5%), and the outcrossing rate varied from 15 to 30% [14]. Most local Spanish varieties were self-propagated; melting cultivars were usually produced by crossing two individuals and selecting from their progeny, and non-melting peaches were selected from seed-propagated populations [15]. Chinese breeding cultivars were mainly released using 'Shanghai Shui Mi' ('Chinese Cling') and 'Bai Hua Shui Mi' as founders. 'Okubo' and 'Hakuto' from Japan were inter-crossed to produce white and low-acid peaches in the Nanjing and Beijing germplasms. 'NJN76', 'Mayfire' and 'Legrant' were also introduced and inter-crossed to produce nectarines. Chinese landrace reproduction was mainly based on seed propagation [4]. Most of the Japanese peaches were selections or mutations of 'Hakuto' and 'Hakuho' [6,7]. New genetic backgrounds should be explored and introduced in peach breeding programs to overcome the narrow genetic background resulting from the use of few founders [15][16][17][18][19][20].
Peach genetics and genomics studies have provided tools for marker-assisted selection (MAS). Microsatellites and simple sequence repeat markers (SSRs) have proved to be a very efficient way to evaluate genetic relationships between individuals, marker-assisted selections and for population genetics studies in Prunus species [15][16][17][18][19][20]. Today, approximately 500 SSRs have been mapped in the reference map (T × E), and more microsatellites are available from the complete peach genome sequence data produced by the International Peach Genome Initiative (IPGI) [21] (www.rosaceae.org/species/prunus_persica/genome_v1.0).
Linkage disequilibrium (LD) mapping (also known as association mapping) is a gene mapping tool which relies on the association between molecular markers and phenotypic traits in populations of unrelated individuals; it is therefore useful where crosses are not available or not easy to obtain. The extent of linkage disequilibrium around a gene has major implications in association mapping, since it determines the effectiveness of this approach [22]. Low LD implies that a high number of markers is required, whereas very high LD extension means low mapping resolution [23]. Detailed information on LD patterns in the working species, in our case peach, is needed for any further association mapping studies. Whole-genome LD can be misleading if the sample is structured into subpopulations (also known as population stratification), i.e. when two (or a group of) accessions have a higher probability of sharing the same allele due to their origin (geography, breeding program, etc.) [24]. Different factors can increase the level of LD: small population size, inbreeding, genetic isolation between lineages, population subdivision, low recombination rate, population admixture, genetic drift and epistasis. In contrast, outcrossing, high recombination rate, high mutation rate and gene conversion can decrease LD [22,24]. The amount, extent and distribution of LD have been well described for common human diseases. In plants, LD has also been investigated in maize, barley, ryegrass, wheat, soybean, sugarcane, grapevine and peach, to design association-mapping experiments and infer the evolution of species [25][26][27].
Up to now, variability and LD analyses have been reported separately in Occidental and Oriental peach collections [15,28]. High levels of LD, extending to 13-15 cM, have been reported in American and European peach accessions, and the development of a whole-genome scanning approach for genetic studies has also been proposed [15]. The LD level in Chinese landraces has been reported to span 6.01 cM [28]. So far there has been no report comparing genetic diversity, population structure and LD between Oriental and Occidental accessions using the same markers and analytical methods. Moreover, although several reports from China have addressed the topic using a limited number of accessions and SSR markers [18,19], a complete picture of Oriental peach genetic diversity and population structure is lacking.
In this research, we investigated the genetic diversity, population structure and linkage disequilibrium of a large group of heterogeneous samples of Oriental accessions and integrated the data with that obtained in [15] to analyse, jointly, 653 peach accessions. We used this comparison between Oriental and Occidental germplasms to infer how genetic diversity can be increased by combining both sets of collections, and provide guidance for introducing accessions into different germplasms.

Genetic diversity of the accessions
The 48 SSRs selected in this research were polymorphic in both Oriental and Occidental samples, amplifying a total of 588 alleles (Table 1), with an average of 12.25 alleles per locus. The frequency of most of the alleles (435, 73.9%) was less than 5%, and the frequency of 114 alleles was less than 1%. Low allele frequencies resulted in a low effective number of alleles (2.93). The observed heterozygosity (Ho) ranged from 0.13 (PMS02) to 0.63 (UDP96-005), with an average of 0.47. These were lower than the expected values (He), which ranged from 0.17 (PMS02) to 0.85 (BPPCT006), an average of 0.60. Consequently, Wright's fixation indices (F) were positive. As expected, loci were highly informative: the highest power of discrimination (PD) between two random cultivars was observed in BPPCT006 (PD = 0.95), and the lowest in PMS02 (PD = 0.28). The number of genotypes for each locus varied from seven (pchgms1) to 65 (BPPCT006).
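The per-locus parameters reported above can be illustrated with a minimal sketch. It assumes diploid genotypes stored as allele-size pairs, uses the uncorrected expected heterozygosity (1 minus the sum of squared allele frequencies), and takes the power of discrimination as 1 minus the sum of squared genotype frequencies; the function name and the example genotypes are hypothetical, and the software and exact estimators used by the authors are not stated in this section.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Summary statistics for one SSR locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid accession.
    Assumption: no missing data and no small-sample correction for He.
    """
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = [c / (2 * n) for c in allele_counts.values()]
    sum_p2 = sum(p * p for p in freqs)

    ho = sum(1 for a, b in genotypes if a != b) / n   # observed heterozygosity
    he = 1.0 - sum_p2                                  # expected heterozygosity
    ne = 1.0 / sum_p2                                  # effective number of alleles
    f = 1.0 - ho / he if he > 0 else 0.0               # fixation index

    # Power of discrimination from genotype frequencies (order of alleles ignored).
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)
    pd = 1.0 - sum((c / n) ** 2 for c in genotype_counts.values())

    return {"n_alleles": len(allele_counts), "n_genotypes": len(genotype_counts),
            "Ho": ho, "He": he, "Ne": ne, "F": f, "PD": pd}


# Hypothetical genotypes at one locus (allele sizes in bp).
example = [(150, 150), (150, 154), (154, 154), (150, 150)]
print(locus_diversity(example))
```

A positive F in this sketch corresponds to the situation described above, where observed heterozygosity falls below the expected value.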
The sample of accessions studied was highly heterogeneous, covering different geographic regions (roughly split as Oriental and Occidental) as well as different degrees of domestication (wild, landraces and breeding). In order to explore and compare the variability inherent in such heterogeneity, variability parameters were calculated in 12 sample subdivisions (Table 2). Twice the number of alleles (12 versus 6) was amplified in the Oriental group than in the Occidental group. Due to the large differences in sample size, the evaluation of the mean number of observed alleles in 4 groups was plotted, with increasing sample size. Figure 1 demonstrates that the Oriental accessions (with number of observed alleles close to the mean on the standard curve) contributed more than the Occidental accessions (with number of alleles below the 95% CI of the distribution) to the variability of the whole collection. The observed heterozygosity was higher in the Oriental group than that in the Occidental group (0.53 versus 0.35), while the deviation from the expected heterozygosity was lower than in the Occidental samples, within which most samples come from breeding programs and directed crosses, 0.53 vs 0.61 (Ho vs He) in the Oriental group and 0.35 vs 0.47 in the Occidental group.
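The comparison of allele numbers across groups of unequal size can be sketched as a resampling curve: accessions are drawn repeatedly at increasing sample sizes and the number of distinct alleles is averaged. This is a single-locus illustration under assumed inputs (hypothetical function name, replicate count and seed); it is not necessarily the procedure used to build Figure 1, which averages over all loci.

```python
import random


def mean_alleles_by_sample_size(genotypes, sizes, n_rep=200, seed=0):
    """Mean number of distinct alleles at one locus for each sample size.

    genotypes: list of (allele_a, allele_b) tuples.
    sizes: sample sizes to evaluate (each must not exceed len(genotypes)).
    """
    rng = random.Random(seed)
    curve = {}
    for size in sizes:
        counts = []
        for _ in range(n_rep):
            sample = rng.sample(genotypes, size)  # sampling without replacement
            counts.append(len({a for g in sample for a in g}))
        curve[size] = sum(counts) / n_rep
    return curve


# Hypothetical genotypes; alleles coded as integers.
genos = [(1, 1), (1, 2), (2, 3), (3, 3), (4, 1), (1, 1)]
print(mean_alleles_by_sample_size(genos, [1, 3, 6], n_rep=50, seed=1))
```

Comparing the observed allele number of a group against such a curve at the group's own sample size separates a genuine difference in variability from a sample-size effect.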
With respect to the genetic diversity in the subgroups, 11 alleles were obtained within 353 Chinese accessions, and 5 alleles in 64 non-Chinese (Japanese and Korean) accessions. Observed heterozygosity was similar in both groups. With a further subdivision of the Chinese collection, a higher number of alleles was identified in the landraces (146 accessions) than in cultivars developed in breeding programs (207 accessions). Observed heterozygosis was similar in both groups (0.52 and 0.53 respectively) and lower than the expected, yielding a positive value of F (0.20 and 0.10 respectively).
In the Occidental group, four alleles were amplified in 24 Occidental landraces, and six alleles in 212 Occidental breeding cultivars. Observed heterozygosity was lower in the landraces compared with that in the breeding cultivars.
Similarly, the genetic variability in the analysis was considerably increased after combining Chinese landraces with Occidental breeding cultivars, while the effect of adding Occidental landraces to the Oriental cultivars was not significant. The number of heterozygous loci was higher using Chinese breeding cultivars, despite the lower heterozygosity of the Occidental landraces.

Genetic relationship among the accessions
A phylogenetic dendrogram (Figure 2, Additional file 1: Figure S1) based on genetic distances clearly divided the 658 accessions into two main groups: G1 and G2. Four wild peach-related accessions fell into the G1 group as an outgroup, including the non-persica accessions 'Gan Su Tao' (P. kansuensis), 'Hong Hua Shan Tao', 'Bai Hua Shan Tao' and 'Shan Tao' (P. davidiana), whilst 'Guang He Tao' (P. mira Koehne) clustered with the remaining peach accessions in G2. G2 contained all the persica accessions. The most genetically distinct accession was 'Hong Ye Tao'. Seven major groups, comprising 587 accessions, were clustered in G2; the clustering was consistent not only with pedigree information and eco-geographical origin, but also with the structure-based membership assignment. The other accessions were clustered into several small groups. The founder cultivar 'Chinese Cling', widely used in European and American breeding programs,

Population structure
According to the Evanno method [29], the collections were mainly divided into three subpopulations (K = 3). CLUMPP alignment of ten independent solutions for K = 3 gave pairwise 'G' values around 0.99, indicating that the assignment of accessions to the subpopulation was well correlated among runs. Considering the membership coefficient Q ≥ 80%, 469 accessions were clustered into three subpopulations (Figure 3a), one with Oriental breeding cultivars (234), one with Occidental breeding cultivars (174) and the third including both Oriental and Occidental landraces (61). The remaining accessions (189, unstructured) could not be assigned under the 80% membership coefficient criteria; almost 75% of them were Oriental and Occidental cultivars, and four of the founders used in the earlier USA breeding programs (' Admiral Dewey', 'Early Crawford', 'Elberta', and 'Chinese Cling') also clustered within this admixed group. The Oriental subpopulation of breeding cultivars was further divided into two groups (Figure 3b), one group including 34 'Yu Lu' derived cultivars, of which 24 were from Zhejiang Province. Another group included 161 cultivars, of which 32 were Japanese cultivars, 59 were Chinese cultivars (all associated with one Japanese cultivar in their pedigree, principally the cultivars 'Okubo', 'Hakuho', 'Sunago wase'), 22 were from the Shanxi collection and the remaining 49 cultivars from different geographic areas. Two founder cultivars, 'Zao Shanghai Shui Mi' (commonly called 'Chinese Cling' in Europe and America) and 'Bai Hua Shui Mi' and its offspring 'Yu Hua Lu', were also assigned to this group.
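The assignment rule described above (membership coefficient Q ≥ 80%, otherwise unstructured) can be written as a short sketch. The accession names and Q values below are hypothetical, and the input is assumed to be a table of per-accession membership coefficients such as the one produced after aligning replicate runs.

```python
def assign_by_membership(q_matrix, threshold=0.80):
    """Assign each accession to the cluster with the largest Q, if Q >= threshold.

    q_matrix: dict mapping accession name -> list of membership coefficients.
    Returns a dict mapping name -> cluster index, or None for admixed accessions.
    """
    assignments = {}
    for name, q in q_matrix.items():
        best = max(range(len(q)), key=lambda k: q[k])
        assignments[name] = best if q[best] >= threshold else None
    return assignments


# Hypothetical Q values for K = 3.
q_values = {
    "acc1": [0.92, 0.05, 0.03],
    "acc2": [0.10, 0.85, 0.05],
    "acc3": [0.50, 0.30, 0.20],   # below threshold: left unstructured
}
print(assign_by_membership(q_values))
```

In a nested analysis, the accessions assigned to one cluster would be rerun on their own and the same rule applied to the new Q table.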
The Occidental breeding subpopulation (with only the three Chinese cultivars 'Ai Li Hong', 'Ai Li Mi', and 'Le
Based on AMOVA analysis, most variation (68.64%) was detected within individuals, while a smaller but significant part of the variation (27.24 6%) was attributed to variation among the five large subpopulations (Table 3). The overall Fst among the five subpopulations was 0.2723 (p value < 0.05). The pairwise Fst value in this study ranged from 0.20667 (between the 'Hakuho' and 'Yu Lu' subpopulations) to 0.44202 (between the Occidental 'nectarine' and 'Yu Lu' subpopulations). Pairwise Fst values between the two subpopulations within the Oriental and Occidental subpopulations were 0.20667 and 0.21877, respectively (Table 4). Genetic diversity of the 469 structured accessions was also confirmed by PCoA (Figure 4). The first three axes together accounted for 79.55% of the variation. The first and second coordinates accounted for 44.48% and 21.64% of the molecular variation, with the first coordinate separating Oriental accessions from Occidental accessions, and the second coordinate separating the landraces from breeding cultivars.
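As a rough illustration of how differentiation among subpopulations is quantified, the sketch below computes Nei's G_ST for one locus from allele frequencies. This is a simpler stand-in for the AMOVA-based Fst reported above, not the authors' method; it weights subpopulations equally, and the function name and frequencies are hypothetical.

```python
def nei_gst(pop_freqs):
    """Nei's G_ST for one locus.

    pop_freqs: list of dicts, one per subpopulation, mapping allele -> frequency.
    Assumption: subpopulations are weighted equally, with no sample-size correction.
    """
    alleles = set().union(*pop_freqs)
    k = len(pop_freqs)
    # Mean within-subpopulation expected heterozygosity.
    hs = sum(1.0 - sum(p * p for p in f.values()) for f in pop_freqs) / k
    # Expected heterozygosity of the pooled population.
    mean_p = {a: sum(f.get(a, 0.0) for f in pop_freqs) / k for a in alleles}
    ht = 1.0 - sum(p * p for p in mean_p.values())
    return (ht - hs) / ht if ht > 0 else 0.0


# Hypothetical allele frequencies in two subpopulations.
print(nei_gst([{"A": 0.8, "B": 0.2}, {"A": 0.2, "B": 0.8}]))
```

Values near 0 indicate that subpopulations share similar allele frequencies, while values near 1 indicate that they are fixed for different alleles.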

Linkage disequilibrium
The extent of linkage disequilibrium (LD) was evaluated in the seven subpopulations with sample size larger than 20: Oriental, Occidental, Oriental 'Yu Lu', Oriental 'Hakuho', Occidental peaches, Occidental nectarines and, the landraces subpopulations.
A total of 3,148, 2,435 and 5,122 allele pairs were compared in the three main subpopulations (Oriental, Occidental and landraces, respectively), of which 453 (14%), 333 (14%) and 805 (16%), respectively, were within the same linkage group (intra-chromosome) (Table 5). The percentage of intra-chromosome pair comparisons with significant LD (r2 > 0.1) was 17% in the Oriental subpopulation, 14% in the Occidental and 7% in the landraces subpopulation. In the Oriental and Occidental subpopulations, these percentages were considerably higher than those observed for inter-chromosome comparisons (6% and 4%, respectively), while landraces had the same proportion of inter- and intra-chromosome comparisons in LD.
The proportion of intra-chromosome comparisons in LD was higher in the nested than in the main subpopulations: 20% in Oriental-Hakuho, 47% in Oriental-Yu Lu, 17% in Occidental-nectarines and 29% in Occidental-peaches. The proportions of inter-chromosome comparisons in LD were 6%, 36%, 5% and 20%, respectively.
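The proportions above rest on a pairwise r2 between alleles and a threshold of 0.1. A minimal sketch follows, assuming unphased diploid genotypes and taking r2 as the squared correlation between allele dosages (0, 1 or 2 copies per accession); the authors' LD software and estimator are not named in this section, so the function names and the estimator choice are assumptions.

```python
from itertools import combinations


def dosage(genotypes, allele):
    """Copies of `allele` (0, 1 or 2) carried by each accession at one locus."""
    return [g.count(allele) for g in genotypes]


def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:      # monomorphic allele: r2 undefined, treated as 0
        return 0.0
    return sxy * sxy / (sxx * syy)


def fraction_in_ld(dosages, threshold=0.1):
    """Share of allele pairs whose r2 exceeds the threshold."""
    pairs = list(combinations(dosages, 2))
    hits = sum(1 for x, y in pairs if r_squared(x, y) > threshold)
    return hits / len(pairs)


# Hypothetical genotypes at two loci for four accessions.
locus_a = [(150, 150), (150, 154), (154, 154), (150, 154)]
locus_b = [(200, 200), (200, 204), (204, 204), (200, 204)]
print(r_squared(dosage(locus_a, 154), dosage(locus_b, 204)))
```

Running `fraction_in_ld` separately on intra-chromosome and inter-chromosome pairs gives the two kinds of percentage compared in the text.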
The decay of LD with genetic map distance (Additional file 5: Figure S3) and physical distance (Figure 5) was calculated for each subpopulation. The LD level was also compared among the eight linkage groups (LG) within the three large Oriental, Occidental and landrace subpopulations, as shown in Figure 6. In all subpopulations, linkage group LG1 had the highest proportion of alleles in LD. Breeding subpopulations (Oriental and Occidental) had the greatest proportion of pairs of alleles in LD in LG1, LG2, LG4 and LG7, while a low percentage of linked alleles was observed in LG3, LG5 and LG8. In landraces, linked alleles were observed mainly in LG1 and LG2, while the proportion on other linkage groups was practically negligible.
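One simple way to summarise LD decay with distance is to bin allele pairs by distance and report where the mean r2 first drops below a threshold. The sketch below assumes that approach; the bin width, the threshold and the input values are hypothetical, and the curve-fitting method behind the decay distances reported in this study is not described in this section.

```python
def ld_decay_distance(pairs, bin_width, threshold=0.1):
    """Lower edge of the first distance bin whose mean r2 is below the threshold.

    pairs: iterable of (distance, r2) for intra-chromosome allele pairs.
    Returns None if no bin falls below the threshold.
    """
    bins = {}
    for dist, r2 in pairs:
        bins.setdefault(int(dist // bin_width), []).append(r2)
    for idx in sorted(bins):
        mean_r2 = sum(bins[idx]) / len(bins[idx])
        if mean_r2 < threshold:
            return idx * bin_width
    return None


# Hypothetical (distance in Kbp, r2) observations.
observations = [(100, 0.50), (400, 0.40), (1200, 0.20),
                (1500, 0.15), (2300, 0.05), (2600, 0.08)]
print(ld_decay_distance(observations, bin_width=1000))
```

The result depends strongly on the bin width and on marker density, so with 48 SSRs such an estimate should be read as approximate.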

SSR polymorphism and genetic diversity
Here we studied the variability of a heterogeneous collection of 658 peach genotypes. Close to two thirds of them were of Oriental origin (China, Japan and Korea) and the remainder were from Occidental regions (Europe and USA). The sample included cultivars from both Oriental and Occidental breeding programs as well as landraces, wild peaches and other Prunus species closely related to peach. These accessions were analyzed with 48 SSRs in two different laboratories. The use of some common accessions as controls allowed the two datasets to be combined for joint analysis, in order to identify and compare the variability intrinsic to each collection.
In total, the 48 polymorphic SSRs used amplified an average of 12.25 alleles per locus, 19% of them rare alleles, which is higher than the values observed in previous studies on genetic diversity in peach [5,16,17,18,19,20,28,30,31]. These high values can be explained by the large sample set and high heterogeneity of the sample.
Figure 2 Neighbour-joining tree for the 658 Prunus accessions. The tree was rooted using one wild relative species 'Guang He Tao' (Prunus mira Koehne) as outgroup. Bootstrap support values greater than 80% are shown in blue on the branches. Circled numbers beside the tree nodes indicate the 8 major groups. The colored parentheses indicate the clusters inferred by STRUCTURE analysis of 5 populations. The population IDs are noted on the right. Accessions in different colors indicate they were assigned to corresponding populations. Unstructured accessions are in red.
The Oriental accessions contributed most to the variability of the sample, especially the landraces which, in general, are considered valuable in germplasm collections as sources of genetic diversity [5]. Some of the landraces studied here came from the north of China, especially the northwest, which is where peach originated. An average of 10 alleles per locus was amplified from 146 Chinese landraces. This is higher than the average of 6.4 previously reported in [28], where 104 landraces were analyzed with 53 SSRs. Some of the landraces have their own gene pools, especially those from Shanxi Province: 'Bai Lu Tao' 'Qiu Fen Tao' 'Taigu Rou Tao' 'Yangqu Bai Tao' 'Jin Qiu' 'Wu Yue Xian' and 'Taiyuan Shui Mi'. Heterozygosity was lower in the Occidental than in the Oriental accessions, in part due to the limited genetic background in American and European peach breeding programs and also the low heterozygosity of the Occidental landraces caused by their self-propagation. This high proportion of heterozygous loci is consistent with [10]. After studying 45 peach cultivars and rootstocks from the USA,  China, France, and Canada, these authors concluded that the highest levels of heterozygosity were detected in those cultivars from China. In summary, from our results, we deduce that it may be a desirable strategy to use Chinese germplasm to increase genetic diversity in Occidental cultivars, while any introgression of new alleles should be carried out without disrupting existing allele combinations associated with superior traits bred into these cultivars.

Phylogenetic clusters
Phylogenetic analysis grouped all accessions into 8 major clusters consistent with geographic origin, domestication history and mating system. The result was also in agreement with the PCoA and population structure analyses: (1) Oriental accessions were separated from Occidental accessions; (2) breeding cultivars were separated from old landraces; (3) the Oriental-Yu Lu group was separated from the Hakuho group; and (4) the Occidental-peach group was separated from the nectarine group.
Collections from Japan and China clustered together and could not be separated clearly, revealing similar origin and genetic background as well as the effect of breeding strategies at the whole genome level [4]. This effect has also been reported in Occidental breeding material [32]. 'Chinese Cling', used as a founder in western countries, where it is known as 'Shanghai Shui Mi', grouped with 'Qi Yuan Shui Mi' and 'Ren Pu Shui Mi', which originated from Zhejiang Province. In contrast, another founder cultivar, 'Zao Shanghai Shui Mi' (called 'Chinese Cling' in Japan), clustered within Japanese peaches. The relatively long genetic distance between 'Chinese Cling' and 'Zao Shanghai Shui Mi' is probably because 'Shanghai Shui Mi' was not a single cultivar but a group of cultivars [4]. Two Spanish landraces, the white non-melting peach 'Binaced' and the flat white peach 'Paraguayo Delfin', clustered with Chinese peaches. These two cultivars have been studied by different Spanish research groups [15,17,20] and were always kept separate from other Spanish materials in breeding. Our results suggest that these two cultivars were probably selected from Chinese material or obtained from seed or clonal propagation of a Chinese cultivar. Since the Occidental nectarine cultivars 'Mayfire' and 'NJN76' were widely used as parents in Chinese breeding programs, 52 Chinese nectarines clustered within the Occidental group [5].
Two large Oriental groups ('Yu Lu' and 'Hakuho') can be explained by their pedigree relationship. Mutation and seedling selection were the dominant ways of selecting new cultivars in the 'Yu Lu' subpopulation. Selfing and crossing with ideal parent materials were adopted in the 'Hakuho' subpopulation. Three Occidental cultivars, 'Dixon_SX', 'Fantasia_Or' and 'Flavortop_Or', from the Shanxi germplasm did not fit in the Occidental group. 'Hong Bao Shi' might have the same identity as 'Red Diamond', with a genetic distance of less than 0.05. This information will be useful in germplasm collections to efficiently preserve peach cultivars by identifying and removing redundancies, focusing resources on poorly represented groups or validating newly selected cultivars.
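Screening a collection for possible redundancies, as suggested above for pairs with a genetic distance below 0.05, can be sketched with a shared-allele distance over multilocus SSR profiles. The distance measure, function names and accession labels here are assumptions for illustration; the distance metric used for the dendrogram is not specified in this section.

```python
from collections import Counter
from itertools import combinations


def shared_allele_distance(profile_1, profile_2):
    """1 minus the proportion of shared alleles across loci.

    Each profile is a list of (allele_a, allele_b) tuples, one per locus,
    with loci in the same order in both profiles.
    """
    shared = 0
    for g1, g2 in zip(profile_1, profile_2):
        # Multiset intersection counts alleles shared at this locus (0, 1 or 2).
        shared += sum((Counter(g1) & Counter(g2)).values())
    return 1.0 - shared / (2 * len(profile_1))


def putative_duplicates(profiles, max_distance=0.05):
    """Pairs of accession names whose distance is below max_distance."""
    return [(n1, n2)
            for (n1, p1), (n2, p2) in combinations(profiles.items(), 2)
            if shared_allele_distance(p1, p2) < max_distance]


# Hypothetical two-locus profiles.
profiles = {
    "acc_A": [(1, 2), (3, 3)],
    "acc_B": [(2, 1), (3, 3)],   # same alleles as acc_A, different order
    "acc_C": [(1, 1), (4, 5)],
}
print(putative_duplicates(profiles))
```

Pairs flagged this way are only candidates; confirmation would normally rely on passport data and phenotype as well.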

Population structure
Here we applied a nested clustering strategy with the STRUCTURE software. This method has been previously used with large sample sizes, exhibiting a strong capability  to assign individuals into populations [33][34][35][36]. For example, 566 South-American Solanum section Petota produced an optimal partitioning into 44 groups with this two-step method [34]. In Pisum, three subpopulations, corresponding approximately to landrace, cultivar and wild Pisum, were obtained in the first STRUCTURE step, and 14 sub-subpopulations were obtained through the second STRUCTURE step, which correlated with the taxonomic sub-division of Pisum according to phenotypic traits and/or geographical origin [35]. Likewise, a deep division has been observed in Northeast Spanish apple accessions using this method, identifying two robust sub-groups [36]. In our study, a first run of STRUCTURE distinguished three subpopulations according to geographical location and domestication history, while 189 were not assigned to a subpopulation. In this first step, all landraces (Spanish and Chinese) clustered together while a further analysis separated Chinese from non-Chinese landraces. This may indicate that Spanish landraces are from one or a few common ancestors. Previous research on the evolutionary history of peach has also indicated a high probability that the Spanish non-melting peaches evolved from northwest Chinese peaches [4]. Still in the first step, a clear divergence between Oriental and Occidental commercial subpopulations reveals the existence of different breeding sources of germplasm in Chinese and western countries. In the Chinese breeding program, cultivars from Japan, especially 'Hakuho', 'Hakuto' and 'Okubo', have had a great impact on Chinese peach breeding. All are selected from 'Chinese Cling' ('Shanghai Shui Mi'). 
After being introduced into China, 'Hakuho' was mainly grown in the south of China as a good quality, soft-melting honey peach, and 'Okubo' was used in the north as a parent for both peach and nectarine selections. Because of the kinship between 'Hakuho' and 'Okubo' (both from 'Hakuto'), it is not possible to distinguish the breeding cultivars by where they are grown. In Occidental breeding programs, a few accessions were intensively used as founders in early breeding programs, producing a dramatic reduction of variability. One of the founders reported in the literature is 'Chinese Cling'; however, next-generation sequencing of this cultivar has revealed the low prevalence of its genome in current western commercial cultivars [21].

Linkage disequilibrium
Linkage disequilibrium (LD) has been especially used for marker-trait association in whole genome studies in plants. Knowing the level of LD is crucial in the design of association mapping studies. We found that LD decays approximately 43% faster in Oriental (3.85 cM) than in Occidental subpopulations (5.50 cM), while no decay of LD with genetic distance was observed in the landraces subpopulation. These three subpopulations are composed of nested subpopulations, which could lead to miscalculation of the LD. LD analysis in the nested subpopulations showed high LD extensions in the Oriental-Yu Lu and Occidental-peaches subpopulations. In both, the proportion of inter-chromosome pairwise allele comparisons of LD was much higher than that observed in the other subpopulations (20-36% compared to 4-7%). The large LD in the Yu Lu subpopulation is due to the intense pedigree relationship: most originated from the Fenghua Honey Peach Institute and 18 out of the 34 were considered to be mutants or seedlings selected from Yu Lu. The larger LD extension in Occidental peaches seems to be the result of the remaining subpopulation stratification in the LD estimation. In the Occidental subpopulations (Nectarines and Peaches), LD decayed at 6.3 cM and 24.9 cM, respectively, contrasting with the 13-15 cM previously reported in [15], using practically the same set of Occidental accessions. These discrepancies could be due to the inclusion of a few Chinese accessions in our subpopulation but also to the remaining population stratification. LD intensity in the 'landrace' subpopulation was low and we did not observe decay with genetic distance. A similar report on 104 Chinese landraces also found low levels of LD in northwest China and middle and lower reaches of the Changjiang River subpopulations [28]. 
Extremely low levels of LD in landrace and wild accessions have also been observed in other species, such as wild French grape (2.7 cM compared to 16.8 cM for cultivated French grape) [25,26] and maize (1 kb in landraces, 2 kb in diverse inbred lines and 100 kb in commercial elite inbred lines) [24]. A rapid decline of LD was observed in a wild, strictly self-incompatible, cherry subpopulation compared to a cultivated sweet cherry [37]. The absence of LD, which may suggest that no "phylogeny" of accessions exists, has been found in 76 Arabidopsis thaliana lines tested with 163 SNPs [38]. The high genotypic variation, low LD and phenotypic variation indicate that, with more markers, landrace subpopulations could be an ideal group for further association mapping.
These results demonstrate that peach germplasm is potentially a valuable resource for association genetics. The accessions obtained in breeding programs come from a limited number of progenitors and, consequently, have a reduced level of variability. This means that the genetic variants responsible for the observed phenotypic traits are reduced and fixed, trimming down the presence of minor-effect alleles which are difficult to detect through association mapping. If new sources of alleles are needed, a population of landraces is available.
The level of LD varies among genome regions because of differences in selection and recombination rate. The significant difference in LD among linkage groups provides more information to determine the marker density needed in association studies [38]. Knowledge of LD extension in, for example, linkage group 4 will be quite useful when choosing SSR or SNP markers to identify candidate genes and QTLs responsible for the synthesis of linalool and lactones, which contribute to peach and nectarine volatiles [39,40]. Flesh texture (melting/non-melting), correlated with the endo-polygalacturonase gene, could be another interesting trait in this linkage group [41]. The recent publication of the peach genome [21] will be of great help in exploring diversity on the whole-genome scale.

Conclusions
By jointly analysing Occidental (European and North American) and Chinese genotypic data, we were able to estimate the effect of using Chinese germplasm in Occidental breeding programs and vice versa. We demonstrate that Occidental elite lines could be a source of variability in Chinese breeding programs but, as Chinese landraces have higher levels of genetic variability, they would have a greater effect on increasing genetic diversity when used in breeding programs in western countries. The unambiguous distinction between Oriental and Occidental subpopulations indicates that quite different genetic backgrounds were used as breeding materials in China, Japan, Europe and America. In general, LD decays faster in Oriental germplasm, so a higher density of markers should be used in association mapping; however, prior knowledge of LD in the study population will always be required. Landraces have low levels of LD, making them a good tool for fine mapping of traits through LD mapping.

Plant material and DNA extraction
A total of 434 Prunus accessions were chosen: 429 Prunus persica (L.) Batsch and five accessions of peach-related species (one P. mira, three P. davidiana and one P. kansuensis). The accessions studied were highly heterogeneous, covering different geographic regions (split into Oriental and Occidental) as well as different domestication histories (wild, landraces and breeding cultivars). Among them, 353 accessions originated from China, 64 originated from Japan and Korea, including breeding cultivars and landraces, and twelve accessions were introduced from Western countries. Most accessions were collected from five germplasm collections: the National Peach Germplasm repository at Jiangsu Academy of Agriculture Sciences (Southeast Region), Fenghua Honey Peach Research Institute (Southeast Region), Shanxi Academy of Agriculture Sciences (Central Region), Southwest University (Southwest Region) and the Zhengzhou Fruit Research Institute (Chinese Academy of Agriculture Sciences, Central Region). Some old local cultivars were obtained from the northwest of China, as described in Additional file 6: Table S1.
Young terminal leaves were collected and frozen in liquid nitrogen. Total genomic DNA was extracted from 1 g of frozen leaf tissue with a modified CTAB procedure. The DNA concentration was quantified with an ultraviolet spectrophotometer (Beckman Coulter DU800), and samples were then diluted to 20 ng/μl for PCR amplification.

SSR amplification
The 434 Oriental accessions and six additional Occidental accessions were amplified with 48 SSR primer pairs (Table 6): 45 previously used by Aranzana [15], plus PTS1-SSR, EPPCU1775 and PceGA25, which are also highly polymorphic. All forward primers were fluorescently labelled, and PCR was carried out in an Eppendorf Mastercycler 5333/5331 thermocycler (Gradient No. 5331-41264, Germany), with amplification reactions and temperature cycles following the protocol used in [18]. Fluorescently labelled PCR fragments were separated by capillary electrophoresis in an ABI PRISM 3130 DNA Analyser (Applied Biosystems, Foster City, CA, USA).

Data collection and analysis of genetic diversity and population structure
The genotypic data obtained were added to the Occidental data matrix of 236 accessions previously genotyped at IRTA (Institut de Recerca i Tecnologia Agroalimentàries), using the data from six accessions as allele-size controls. The following parameters of variability were calculated with PowerMarker 3.25 [48]: number of observed alleles per locus (Ao), effective number of alleles (Ae), observed heterozygosity (Ho), expected heterozygosity (He), Wright's fixation index (f = 1 - Ho/He) and power of discrimination for each locus (PD). These parameters were calculated for the whole sample and for smaller subsets defined by geographic origin or breeding status. Based on the Nei and Li [49] genetic distance estimation method, a Neighbour-joining dendrogram (with 1,000 bootstraps) was constructed with TREECON 1.3b [50]. The tree was rooted using one wild related species, 'Guang He Tao' (Prunus mira Koehne), as the outgroup.
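As an illustration of how the per-locus variability parameters relate to one another, the following minimal Python sketch computes Ao, Ae, Ho, He and f for a single locus from diploid genotypes. It is not the PowerMarker implementation: the function name `locus_diversity` is hypothetical, He is taken as the simple uncorrected estimate 1 - Σp², and Ae is assumed to be 1/Σp². PowerMarker may apply sample-size corrections that this sketch omits.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Summarise one locus.

    genotypes: list of (allele_1, allele_2) tuples, one per diploid accession.
    Returns a dict with Ao, Ae, Ho, He and f = 1 - Ho/He.
    Assumption: uncorrected estimators (no small-sample correction).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Allele frequencies over the 2n sampled gene copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    sum_p2 = sum((c / total) ** 2 for c in counts.values())

    ho = sum(1 for a, b in genotypes if a != b) / n  # observed heterozygosity
    he = 1.0 - sum_p2                                # expected heterozygosity
    ae = 1.0 / sum_p2                                # effective number of alleles
    f = 1.0 - ho / he if he > 0 else float("nan")    # Wright's fixation index

    return {"Ao": len(counts), "Ae": ae, "Ho": ho, "He": he, "f": f}


# Made-up genotypes (allele sizes in bp) used only to show the call.
example = [(100, 100), (100, 102), (102, 102), (100, 102)]
stats = locus_diversity(example)
```

A positive f indicates a deficit of heterozygotes relative to Hardy-Weinberg expectation, which is the pattern reported for the whole sample (Ho lower than He).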

Number of alleles inference in different subsets
Rarefaction curves were drawn to compare variability among sample subsets of different sizes. For these, random groups of accessions with sample sizes from 2 to 657 were selected by bootstrap, and the mean value of Ao and its Chebyshev 95% confidence interval were calculated for each sample size.
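The resampling procedure can be sketched as follows for a single locus. This is a hedged illustration, not the authors' script: the function names are hypothetical, "bootstrap" is assumed to mean sampling accessions with replacement, and the Chebyshev interval is assumed to be mean ± k·sd of the resampled allele counts with k = √(1/0.05) ≈ 4.47, which follows from Chebyshev's inequality P(|X - μ| ≥ kσ) ≤ 1/k².

```python
import random
import statistics
from math import sqrt


def count_alleles(genotypes):
    """Number of distinct alleles (Ao) in a list of (allele_1, allele_2) tuples."""
    return len({allele for genotype in genotypes for allele in genotype})


def rarefaction_point(genotypes, size, n_boot=200, seed=0):
    """Mean Ao and Chebyshev 95% interval for random groups of `size` accessions.

    Assumption: groups are drawn with replacement (bootstrap resampling).
    Returns (mean, lower, upper).
    """
    rng = random.Random(seed)
    counts = [
        count_alleles(rng.choices(genotypes, k=size)) for _ in range(n_boot)
    ]
    mean = statistics.mean(counts)
    sd = statistics.pstdev(counts)
    k = sqrt(1 / 0.05)  # Chebyshev: at most 5% of values lie beyond k standard deviations
    return mean, mean - k * sd, mean + k * sd


def rarefaction_curve(genotypes, sizes, n_boot=200, seed=0):
    """Rarefaction points for several sample sizes, as (size, mean, lower, upper)."""
    return [(s, *rarefaction_point(genotypes, s, n_boot, seed)) for s in sizes]
```

Chebyshev intervals make no assumption about the shape of the distribution, so they are wider than normal-theory intervals. Curves from two subsets can then be compared at the same sample size.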

Population structure
Data from 25 SSR markers were selected to study population structure using STRUCTURE v.2.0 software [51], adopting an admixture model with correlated allele frequencies, a burn-in of 100,000 cycles and 1,000,000 MCMC cycles. The loci used in the STRUCTURE analysis were separated from one another by at least 15 cM. The number of assumed populations (K) was set from one to ten, with ten independent runs per K. The most likely number of subpopulations was calculated according to Evanno's method [29]. The average membership coefficient for each accession was calculated using CLUMPP [52,53]. Accessions were assigned to a subpopulation when their membership coefficient was Q ≥ 0.8. Results were plotted with DISTRUCT software [54]. A second-level (nested) analysis of population structure, with the same software and parameters, was carried out on each of the subpopulations detected in the first STRUCTURE run.
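The two decision rules in this step, choosing K with Evanno's ΔK and assigning accessions at Q ≥ 0.8, can be sketched in Python as below. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the input is assumed to be the ln P(D) values reported by each replicate run, and ΔK is computed from the replicate means as |L(K+1) - 2L(K) + L(K-1)| / sd(L(K)), a common simplification of the original definition. ΔK is undefined for the smallest and largest K tested.

```python
import statistics


def evanno_delta_k(ln_prob):
    """Delta K for each interior K.

    ln_prob: dict mapping K -> list of ln P(D) values from replicate runs
             (at least two runs per K, so a standard deviation exists).
    Returns a dict mapping K -> Delta K. K values whose replicates have
    zero standard deviation are skipped to avoid division by zero.
    """
    mean = {k: statistics.mean(v) for k, v in ln_prob.items()}
    sd = {k: statistics.stdev(v) for k, v in ln_prob.items()}
    delta = {}
    for k in sorted(ln_prob):
        if k - 1 in mean and k + 1 in mean and sd[k] > 0:
            second_diff = mean[k + 1] - 2 * mean[k] + mean[k - 1]
            delta[k] = abs(second_diff) / sd[k]
    return delta


def assign_subpopulation(q_values, threshold=0.8):
    """Index of the cluster with membership >= threshold, else None (admixed)."""
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return best if q_values[best] >= threshold else None
```

The K with the largest ΔK is taken as the most likely number of subpopulations. Accessions for which `assign_subpopulation` returns `None` are treated as admixed and left unassigned, which is consistent with only 469 accessions being assigned in the first STRUCTURE run.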

Analysis of molecular variance (AMOVA)
The genetic variation within and among subpopulations of the 469 Prunus accessions, together with pairwise Fst values, was measured by analysis of molecular variance (AMOVA) using Arlequin v3.5 software. The threshold for statistical significance was determined by running 1,000 permutations [55]. Principal coordinate analysis of all inferred subpopulations, based on a genetic distance matrix, was also carried out using the GenAlEx 6.5 software [56].
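The logic of a permutation test for pairwise differentiation can be illustrated with a deliberately simplified Python sketch. It does not reproduce Arlequin's AMOVA-based estimator: the statistic used here is a single-locus, Nei-style Fst = (HT - HS)/HT, and all function names are hypothetical. Accessions are shuffled between the two subpopulations, and the P-value is the proportion of permuted values at least as large as the observed one.

```python
import random
from collections import Counter


def expected_het(genotypes):
    """Expected heterozygosity 1 - sum(p^2) from (allele_1, allele_2) tuples."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def simple_fst(pop1, pop2):
    """Nei-style Fst for one locus: (HT - HS) / HT, with HS weighted by sample size."""
    n1, n2 = len(pop1), len(pop2)
    hs = (n1 * expected_het(pop1) + n2 * expected_het(pop2)) / (n1 + n2)
    ht = expected_het(pop1 + pop2)
    return 0.0 if ht == 0 else (ht - hs) / ht


def permutation_test(pop1, pop2, n_perm=1000, seed=0):
    """Observed Fst and permutation P-value (accessions shuffled between groups)."""
    rng = random.Random(seed)
    observed = simple_fst(pop1, pop2)
    pooled = pop1 + pop2
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = simple_fst(pooled[:len(pop1)], pooled[len(pop1):])
        if permuted >= observed:
            hits += 1
    # Adding 1 to numerator and denominator counts the observed data as one permutation.
    return observed, (hits + 1) / (n_perm + 1)
```

With 1,000 permutations the smallest attainable P-value is about 0.001, which sets the resolution of the significance threshold.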

Analysis of linkage disequilibrium
Linkage disequilibrium (LD) was calculated in all subpopulations obtained in the STRUCTURE analysis, including the three large subpopulations from the first STRUCTURE step as well as the four small subpopulations obtained by nested STRUCTURE analysis. Alleles with frequencies lower than 5% were removed, because rare alleles can inflate LD estimates and distort P-values. The correlation coefficient r2 between each pair of alleles among the 48 loci was calculated with the pairwise LD analysis in PowerMarker 3.25 software, considering unphased genotype data [48]. An exact test, run in permutation mode with a convergence bound of 0.05, was used to determine whether two loci were significantly correlated.
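The allele filtering and pairwise r2 steps can be sketched in Python as follows. This is an approximation under stated assumptions, not PowerMarker's algorithm: because the genotypes are unphased, the sketch uses the squared Pearson correlation between allele dosages (0, 1 or 2 copies per accession) as a stand-in for haplotype-based r2. The function names are hypothetical, and the two loci are assumed to list the same accessions in the same order.

```python
from collections import Counter


def common_alleles(genotypes, min_freq=0.05):
    """Alleles whose frequency at this locus is at least min_freq."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return sorted(a for a, c in counts.items() if c / total >= min_freq)


def dosage(genotypes, allele):
    """Copies of `allele` (0, 1 or 2) carried by each accession."""
    return [genotype.count(allele) for genotype in genotypes]


def r_squared(x, y):
    """Squared Pearson correlation; 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy ** 2 / (sxx * syy)


def allele_pair_r2(locus_a, locus_b, min_freq=0.05):
    """r2 for every pair of common alleles between two loci.

    locus_a, locus_b: lists of (allele_1, allele_2) tuples, same accession order.
    Returns a dict mapping (allele_a, allele_b) -> r2.
    """
    return {
        (a, b): r_squared(dosage(locus_a, a), dosage(locus_b, b))
        for a in common_alleles(locus_a, min_freq)
        for b in common_alleles(locus_b, min_freq)
    }
```

Filtering before the pairwise loop keeps rare alleles out of the comparison set, so a single shared carrier cannot produce a spuriously high r2.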
For each subpopulation, intra-chromosome LD was plotted against distance (measured in cM and Kbp). The curve of the variation of r2 values with physical distance was calculated using the average values for each of four subsets of an equal number of pair comparisons covering adjacent intervals over genetic distance. The threshold r2 = 0.1 was used for decay distance estimation, as suggested by [57].

Note: 1: annealing temperature. 2: linkage group. 3: position of the SSR marker in the T × E linkage map. 4: physical position of the SSR marker in the peach genome sequence v1.0 (http://www.rosaceae.org/peach/genome). 5: fluorescent dye added to the forward primer.
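The LD decay estimate described in this section can be sketched in Python. This is a hedged illustration with hypothetical function names: pair comparisons are sorted by distance and split into four subsets of equal size, each subset is summarised by its mean distance and mean r2, and the decay distance is assumed to be the point where the resulting curve first drops below r2 = 0.1, found by linear interpolation between adjacent subset means. The interpolation rule is an assumption; the text specifies only the threshold.

```python
def binned_curve(pairs, n_bins=4):
    """Average r2 per bin of equal pair count.

    pairs: list of (distance, r2) for intra-chromosome comparisons.
    Returns a list of (mean_distance, mean_r2), ordered by distance.
    Any remainder after integer division goes into the last bin.
    """
    if len(pairs) < n_bins:
        raise ValueError("need at least one pair per bin")
    ordered = sorted(pairs)
    size = len(ordered) // n_bins
    curve = []
    for i in range(n_bins):
        end = (i + 1) * size if i < n_bins - 1 else len(ordered)
        chunk = ordered[i * size:end]
        mean_d = sum(d for d, _ in chunk) / len(chunk)
        mean_r2 = sum(r for _, r in chunk) / len(chunk)
        curve.append((mean_d, mean_r2))
    return curve


def decay_distance(curve, threshold=0.1):
    """Distance at which the curve first falls below `threshold`, else None."""
    for (d0, r0), (d1, r1) in zip(curve, curve[1:]):
        if r0 >= threshold > r1:
            # Linear interpolation between the two neighbouring bin means.
            return d0 + (r0 - threshold) * (d1 - d0) / (r0 - r1)
    return None
```

`decay_distance` returns `None` when the averaged curve never crosses the threshold within the range of distances sampled. In that case the decay distance cannot be estimated from the data at hand.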