Using the proposed method, we can predict the segregation pattern of target traits in a segregating population from genomic prediction models, genome-wide marker genotype data, and linkage map data. From the predicted segregation pattern, we can calculate the probability of obtaining genotypes with the characteristics required for new cultivars. The prediction always involves uncertainty because of the limited number of samples in the training data and the environmental variation that masks the true genotypic values of the training samples. The degree of uncertainty differs among traits, depending on their heritability and genetic architecture, and among parental combinations, depending on the QTL and marker genotypes of the parental cultivars. Using Markov chain Monte Carlo (MCMC) sampling, we can estimate this uncertainty by calculating the posterior distribution of the proportion of progenies with the desired characteristics. This information is expected to help breeders choose parental combinations with a high probability of generating offspring with the desired characteristics.
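The calculation described above can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not the implementation used in this study. It assumes a single chromosome, marker positions in cM with crossovers placed by Haldane's mapping function (no interference), parents given as two phased haplotypes of integer allele codes, and an additive allelic model. The posterior draws passed as `samples` stand in for MCMC output, and all function names and toy inputs are hypothetical. For each draw, an F1 family is simulated and the fraction of progenies whose genotypic value reaches `threshold` is recorded, so the returned list approximates the posterior distribution of that proportion.

```python
import math
import random

def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane's mapping function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def make_gamete(haplotypes, positions, rng):
    """Sample one gamete from a parent's two phased haplotypes on one chromosome."""
    strand = rng.randrange(2)  # haplotype that supplies the first marker
    gamete = [haplotypes[strand][0]]
    for m in range(1, len(positions)):
        # A crossover between adjacent markers switches to the other haplotype.
        if rng.random() < haldane_r(positions[m] - positions[m - 1]):
            strand = 1 - strand
        gamete.append(haplotypes[strand][m])
    return gamete

def genotypic_value(g1, g2, mu, effects):
    """Additive allelic model: intercept plus the effects of both alleles at each marker."""
    return mu + sum(effects[m][g1[m]] + effects[m][g2[m]] for m in range(len(g1)))

def posterior_proportions(parent1, parent2, positions, samples, threshold,
                          n_progeny=1000, seed=1):
    """For each posterior draw (mu, effects), simulate an F1 family and return the
    fraction of progenies whose genotypic value is at least `threshold`."""
    rng = random.Random(seed)
    proportions = []
    for mu, effects in samples:
        hits = 0
        for _ in range(n_progeny):
            g1 = make_gamete(parent1, positions, rng)
            g2 = make_gamete(parent2, positions, rng)
            if genotypic_value(g1, g2, mu, effects) >= threshold:
                hits += 1
        proportions.append(hits / n_progeny)
    return proportions

# Toy usage with made-up haplotypes and draws (three alleles per marker).
positions = [0.0, 12.0, 30.0, 55.0]
parent1 = [[0, 1, 2, 0], [1, 0, 0, 2]]
parent2 = [[2, 2, 1, 0], [0, 1, 0, 1]]
draw_rng = random.Random(0)
samples = [(draw_rng.gauss(10.0, 0.3),
            [[draw_rng.gauss(0.5 * a, 0.2) for a in range(3)] for _ in positions])
           for _ in range(100)]
props = posterior_proportions(parent1, parent2, positions, samples,
                              threshold=13.0, n_progeny=300)
print(min(props), sum(props) / len(props), max(props))
```

With real MCMC output, the spread of the returned proportions reflects both the posterior uncertainty in the marker effects and the Monte Carlo error from the simulated family size `n_progeny`; a large `n_progeny` keeps the latter small.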
Various statistical methods have been proposed for selecting cross combinations. Most of them, however, predict the average potential of a progeny population (e.g., the potential of F1 hybrid lines generated from two inbred lines) rather than the segregation pattern within the population. Because the range of segregation of target traits differs among cross combinations, the average potential alone is not sufficient for selecting parental combinations. The method proposed in this study, in contrast, predicts the segregation pattern in a progeny population and thereby provides more detailed information about each cross combination. For example, with segregation pattern predictions, breeders can select parental combinations that are expected to show transgressive segregation in a segregating population. Breeders can also estimate the population size needed to obtain superior progenies, and the expected genetic gain can be calculated from the predicted segregation pattern. Such quantitative and objective information provides breeders with a rational basis for selecting parental combinations. In fruit tree breeding, asexual propagation is commonly used, so a superior F1 individual can be released directly as a cultivar. It is therefore more important to predict the range of segregation in an F1 progeny population than the average genetic potential of the progenies in that population.
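As a worked example of estimating the population size needed, suppose each progeny independently meets the selection criteria with probability p. The probability of obtaining at least one such progeny among n is then 1 - (1 - p)^n, and the smallest adequate n is ceil(log(1 - confidence) / log(1 - p)). The sketch below assumes independence and a 95% target. The second function is a hypothetical extension that averages over posterior draws of p so that the uncertainty of the prediction carries into the size estimate; both function names are illustrative.

```python
import math

def required_population_size(p, confidence=0.95):
    """Smallest n such that P(at least one of n progenies meets the criteria) >= confidence,
    assuming each progeny meets them independently with probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

def required_size_under_posterior(p_draws, confidence=0.95, n_max=100_000):
    """Hypothetical extension: smallest n for which the success probability,
    averaged over posterior draws of p, reaches `confidence` (None if n_max is exceeded)."""
    for n in range(1, n_max + 1):
        mean_success = sum(1.0 - (1.0 - p) ** n for p in p_draws) / len(p_draws)
        if mean_success >= confidence:
            return n
    return None

print(required_population_size(0.05))  # 59
```

For p = 0.05, about 59 progenies are needed. Because 1 - (1 - p)^n is concave in p, the posterior-averaged version never asks for fewer progenies than plugging in the posterior mean of p.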
In this study, we applied the proposed method to the segregation of harvest time and fruit weight in an actual breeding population derived from the cross ‘Akiakari’ × ‘Taihaku’. The segregation patterns observed in 2010 and 2011 agreed well with the predicted segregation pattern, suggesting that the proposed method can predict the segregation of target traits in a progeny population. The uncertainty of the predicted segregation was expressed as the posterior distribution of the proportion of progenies that fulfill the criteria and was compared with the observed proportions, which appeared consistent with the posterior distribution. In this breeding population, the posterior distribution for fruit weight had a broader peak than that for harvest time, suggesting that the predicted proportion was more uncertain for fruit weight than for harvest time, possibly because fruit weight has lower heritability than harvest time. As this example shows, the degree of uncertainty will differ among traits and cross combinations. It is therefore important to provide information about the uncertainty of the prediction for each trait and each cross combination. Especially when the numbers of markers and samples used to build a prediction model are small, the uncertainty of the prediction can be large and should be considered when breeders select cross combinations. Because the proposed method was validated in only one breeding population, additional studies will be necessary to evaluate its potential in other Japanese pear populations as well as in other plant species.
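One simple way to summarize this comparison is to report an equal-tailed credible interval of the posterior draws together with the fraction of draws lying at or below the observed proportion; a fraction near 0 or 1 indicates that the observation falls in a tail of the prediction. The sketch below is illustrative: it uses empirical order statistics rather than interpolated quantiles, and its function names are hypothetical.

```python
import math

def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval taken from the sorted posterior draws."""
    s = sorted(draws)
    n = len(s)
    lo_idx = math.floor((1.0 - level) / 2.0 * (n - 1))
    hi_idx = math.ceil((1.0 + level) / 2.0 * (n - 1))
    return s[lo_idx], s[hi_idx]

def tail_fraction(draws, observed):
    """Fraction of posterior draws at or below the observed proportion."""
    return sum(d <= observed for d in draws) / len(draws)

def summarize(draws, observed, level=0.95):
    """Interval, whether the observation falls inside it, and its tail position."""
    lo, hi = credible_interval(draws, level)
    return {"interval": (lo, hi),
            "observed_inside": lo <= observed <= hi,
            "tail_fraction": tail_fraction(draws, observed)}
```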
In this study, we used BayesA to build the prediction model and BEAGLE to estimate the phased genotypes of parental cultivars and lines. Several alternative methods exist for these calculations, however. For example, in our previous study we used BayesB, a model assuming that most markers have no effect on genotypic values, in addition to BayesA. In that study, BayesA performed better than BayesB for most traits, partly because of the low density of the markers used in the prediction. BayesB is known to perform better when linkage disequilibrium between QTL and markers is stronger. Therefore, the advantage of BayesB over BayesA might appear when markers are dense enough to ensure strong linkage disequilibrium between QTL and markers. When linkage disequilibrium between QTL and markers is weak, random regression best linear unbiased prediction (RR-BLUP) is expected to perform better than either BayesA or BayesB [29–31]. When applied to the present dataset, however, RR-BLUP was less accurate than BayesA (data not shown). This result suggests that the genotypic variation in the traits analyzed in this study can be explained by linkage disequilibrium between markers and QTL as well as by kinship among cultivars. Indeed, two markers were significantly associated with variation in harvest time in a genome-wide association study using 76 Japanese pear cultivars, suggesting linkage disequilibrium between these markers and QTL.
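For comparison with BayesA, RR-BLUP can be written as ridge regression of phenotypes on marker covariates in which all marker effects share one variance, so every effect is shrunk by the same parameter λ = σ²_e / σ²_β; BayesA instead gives each marker its own variance. The NumPy sketch below assumes that λ is known (in practice it is estimated, e.g., by REML) and that phenotypes are continuous, so it does not handle the ordinal scores used in this study. Function names are hypothetical.

```python
import numpy as np

def rr_blup(X, y, lam):
    """RR-BLUP marker effects as ridge regression with a fixed shrinkage lam
    (= sigma_e^2 / sigma_beta^2, assumed known). Returns (intercept, effects, column means)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar = X.mean(axis=0)
    Xc = X - xbar          # centering makes the intercept estimate equal to mean(y)
    mu = y.mean()
    p = Xc.shape[1]
    # Solve (Xc'Xc + lam * I) beta = Xc'(y - mu): one common shrinkage for every marker.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - mu))
    return mu, beta, xbar

def predict(model, X_new):
    """Predict genotypic values of new individuals from their marker covariates."""
    mu, beta, xbar = model
    return mu + (np.asarray(X_new, dtype=float) - xbar) @ beta
```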
Non-additive effects, i.e., dominance and epistasis, are also important for selecting a good parental combination. Because of their importance, Lü et al. proposed a method for predicting elite cross combinations that takes epistasis into account. In fruit tree breeding, breeders can exploit all genetic effects, i.e., additive and non-additive, as they are expressed in the phenotypes of individuals, because superior individuals can be propagated asexually. In this study, we applied the genotype effect model, which can include both additive and dominance effects as “genotype effects” of markers. The prediction accuracy of this model was equivalent to that of the additive allelic model, suggesting that dominance effects were small in the traits analyzed in this study. It is also possible, however, that the sample size was not large enough to estimate the numerous genotype effects of multi-allelic markers accurately. When the sample size is small, it is difficult to model dominance and epistatic effects explicitly because the number of possible models is too large. In such cases, nonlinear kernel methods might be a good alternative to models that include nonlinear effects explicitly. For instance, reproducing kernel Hilbert space regression [32, 33] is a promising nonlinear kernel regression method. To estimate the phased genotypes of parental cultivars, we used BEAGLE in this study because most markers were multi-allelic. When markers are bi-allelic, other algorithms, such as fastPHASE and MaCH, can be good alternatives to BEAGLE. The methods and algorithms underlying the proposed method are advancing rapidly, and novel methods and algorithms are expected to further improve its accuracy.
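To illustrate why the genotype effect model needs more parameters than the additive allelic model, the sketch below codes one multi-allelic marker both ways. This is an assumed, illustrative parameterization rather than the exact design matrices used in this study. Allele-dosage coding gives one column per allele (k columns for k alleles), whereas genotype coding gives one indicator per unordered genotype (k(k + 1)/2 columns), so each heterozygote receives its own effect and dominance is absorbed. With 10 alleles, that is 10 versus 55 columns for a single marker. Function names are hypothetical.

```python
from itertools import combinations_with_replacement

def additive_allelic_row(genotype, alleles):
    """Allele-dosage coding: one column per allele holding its copy number (0, 1 or 2)."""
    return [genotype.count(a) for a in alleles]

def genotype_effect_row(genotype, alleles):
    """Genotype coding: one indicator column per unordered genotype, so each
    heterozygote has its own effect and dominance deviations are absorbed."""
    levels = list(combinations_with_replacement(alleles, 2))
    key = tuple(sorted(genotype, key=alleles.index))  # order alleles as listed in `alleles`
    return [int(key == level) for level in levels]

alleles = ["a", "b", "c"]
for g in [("a", "a"), ("b", "a"), ("c", "b")]:
    print(g, additive_allelic_row(g, alleles), genotype_effect_row(g, alleles))
```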
In this study, we used Bayesian latent variable regression to estimate the genetic effects of multiple markers. The method is useful for analyzing data collected in breeding programs because field-testing data are often recorded as ordinal categorical or binary scores to save the labor of measuring traits. The Bayesian method has been applied to QTL analysis [37, 38] and genome-wide association studies [15, 26, 39] of binary and ordinal categorical traits. Iwata et al. first applied the method to the genome-wide prediction of breeding values for ordinal traits in the context of genomic selection. In the present study, we extended the method further to the prediction of a segregation pattern in a progeny population. As described above, the Bayesian approach enables us to calculate the posterior distribution of the proportion of acceptable progenies via MCMC sampling and thereby to estimate the uncertainty of the prediction. When only a point estimate of the proportion is needed, the fast algorithms proposed in earlier reports [37–39] may be useful, especially when markers are numerous. Although the Bayesian method is useful for estimating the genetic effects of markers from ordinal categorical data, it should be noted that ordinal categorical scoring can lose information needed to estimate small genetic effects, as described in an earlier report.
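The latent variable (threshold) model behind this approach can be sketched as follows, assuming a probit link: an unobserved liability equals the linear predictor η plus standard normal noise, and the observed score is the category whose thresholds bracket the liability. The first function gives the category probabilities. The second draws the liability given the observed category from a truncated normal distribution, the data-augmentation step commonly used in Gibbs samplers for ordinal probit models. This is a generic sketch with hypothetical names, not the sampler used in this study, and it requires Python 3.8+ for `statistics.NormalDist`.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: liability = eta + N(0, 1)

def category_probs(eta, thresholds):
    """P(observed category = c | eta) under a probit threshold model with
    increasing cut points tau_1 < ... < tau_{C-1}."""
    cuts = [float("-inf")] + list(thresholds) + [float("inf")]
    return [STD_NORMAL.cdf(cuts[c + 1] - eta) - STD_NORMAL.cdf(cuts[c] - eta)
            for c in range(len(cuts) - 1)]

def sample_liability(eta, category, thresholds, rng):
    """Draw the liability given its observed category from N(eta, 1) truncated
    to the category's interval, by inverse-CDF sampling."""
    cuts = [float("-inf")] + list(thresholds) + [float("inf")]
    lo = STD_NORMAL.cdf(cuts[category] - eta)
    hi = STD_NORMAL.cdf(cuts[category + 1] - eta)
    u = rng.uniform(lo, hi)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf needs an argument strictly inside (0, 1)
    return eta + STD_NORMAL.inv_cdf(u)

rng = random.Random(0)
print(category_probs(0.3, [-0.5, 0.5]))
print(sample_liability(0.3, 2, [-0.5, 0.5], rng))  # a draw above 0.5
```

In a full sampler, the sampled liabilities are treated as continuous phenotypes when updating the marker effects, and the thresholds are then updated in turn.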
In this study, we used data from 84 Japanese pear cultivars to demonstrate the potential of the proposed method. This number of cultivars is small and may be insufficient to build an accurate prediction model when the heritability of a target trait is low. The target traits analyzed in this study are thought to be highly heritable. For harvest time in particular, Iwata et al. detected two significant markers in a genome-wide association study using data from 76 Japanese pear cultivars and found that these markers colocalized with a known gene and with a QTL detected in a bi-parental F1 population. To predict the segregation pattern of a target trait accurately, however, a larger number of cultivars will be necessary.
The marker density in the 84-cultivar dataset is insufficient to ensure strong linkage disequilibrium between QTL and markers. Although linkage disequilibrium in a Japanese pear population extends to about 10 cM, a lower marker density increases the distance between a QTL and its nearest markers, and hence the frequency of recombination between them. This worsens the prediction accuracy of the segregation pattern, because the prediction model assigns genotypic effects to the markers rather than to the unobservable QTL.
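The effect of marker spacing can be quantified with Haldane's mapping function, r = (1 - exp(-2d/100)) / 2 for a map distance of d cM. Assuming evenly spaced markers and a QTL located uniformly at random between two of them, the QTL lies at most half the marker spacing from its nearest marker and, on average, a quarter of it. The sketch below, with hypothetical function names, prints r at both distances for a few spacings; wider spacing gives a larger r, i.e., more frequent marker-QTL recombination in each meiosis.

```python
import math

def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane's mapping function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def qtl_marker_recombination(spacing_cm):
    """For evenly spaced markers, a QTL lies at most spacing/2 from its nearest marker
    (spacing/4 on average for a uniformly placed QTL). Returns r at (mean, max) distance."""
    return haldane_r(spacing_cm / 4.0), haldane_r(spacing_cm / 2.0)

for s in (1, 5, 10, 20):
    mean_r, worst_r = qtl_marker_recombination(s)
    print(f"spacing {s:>2} cM: r at mean distance = {mean_r:.3f}, r at max distance = {worst_r:.3f}")
```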
Recently, high-throughput genotyping with genome-wide markers has become inexpensive and operationally straightforward [40–43]. This trend is expected to promote the practical use of the proposed method in the breeding of various crop plants. In this study, we used only the trait phenotypes and marker genotypes of the parental cultivars to build the prediction models. The accuracy of the prediction models is expected to increase if the trait phenotypes and marker genotypes of progenies in segregating breeding populations are also used.