Multi-population genomic prediction using a multi-task Bayesian learning model

Background Genomic prediction in multiple populations can be viewed as a multi-task learning problem where tasks are to derive prediction equations for each population and multi-task learning property can be improved by sharing information across populations. The goal of this study was to develop a multi-task Bayesian learning model for multi-population genomic prediction with a strategy to effectively share information across populations. Simulation studies and real data from Holstein and Ayrshire dairy breeds with phenotypes on five milk production traits were used to evaluate the proposed multi-task Bayesian learning model and compare with a single-task model and a simple data pooling method. Results A multi-task Bayesian learning model was proposed for multi-population genomic prediction. Information was shared across populations through a common set of latent indicator variables while SNP effects were allowed to vary in different populations. Both simulation studies and real data analysis showed the effectiveness of the multi-task model in improving genomic prediction accuracy for the smaller Ayshire breed. Simulation studies suggested that the multi-task model was most effective when the number of QTL was small (n = 20), with an increase of accuracy by up to 0.09 when QTL effects were lowly correlated between two populations (ρ = 0.2), and up to 0.16 when QTL effects were highly correlated (ρ = 0.8). When QTL genotypes were included for training and validation, the improvements were 0.16 and 0.22, respectively, for scenarios of the low and high correlation of QTL effects between two populations. When the number of QTL was large (n = 200), improvement was small with a maximum of 0.02 when QTL genotypes were not included for genomic prediction. Reduction in accuracy was observed for the simple pooling method when the number of QTL was small and correlation of QTL effects between the two populations was low. For the real data, the multi-task model achieved an increase of accuracy between 0 and 0.07 in the Ayrshire validation set when 28,206 SNPs were used, while the simple data pooling method resulted in a reduction of accuracy for all traits except for protein percentage. When 246,668 SNPs were used, the accuracy achieved from the multi-task model increased by 0 to 0.03, while using the pooling method resulted in a reduction of accuracy by 0.01 to 0.09. In the Holstein population, the three methods had similar performance. Conclusions Results in this study suggest that the proposed multi-task Bayesian learning model for multi-population genomic prediction is effective and has the potential to improve the accuracy of genomic prediction.


Background
Genomic prediction has become a new tool for selection of candidates based on genomic estimated breeding values (GEBV) through the use of dense markers covering the whole genome [1]. To predict GEBV, a training data set with genotypes and phenotypes is used to derive the prediction equations, where all marker effects are estimated simultaneously. GEBV for selection candidates that have genotypes are then predicted by summing up all the marker effects. The accuracy of GEBV is affected by several factors [2,3], of which the number of individuals in the training data set and the marker density are of crucial importance [2,3].
In Holstein dairy cattle, genomic prediction has been successfully applied using the Illumina BovineSNP50 single nucleotide polymorphism (SNP) panel [4,5]. For smaller populations such as Ayrshire in dairy cattle, acquisition of a large number of animals to be included in the training data set for genomic prediction still remains a challenge. One strategy is to combine data of the small populations with data from other populations to increase the size of the training set. However, simply pooling data from different populations may result in unfavorable accuracies if the marker density is low or the populations have diverged for a long time [6][7][8].
Increasing the marker density is one of the possible solutions because the linkage disequilibrium (LD) phase persistence between markers and quantitative trait loci (QTL) among different populations would likely to improve. However, a recent study in Jersey and Holstein dairy cattle reported a very limited advantage by using a very high density SNP panel [9]. Few studies have been dedicated to research new methods or strategies other than simply pooling data together for genomic prediction. Brondum et al. [10] proposed an approach called BayesRS for multi-population genomic prediction, where a location specific genetic variance derived in one population were used as priors for another population. They found that for some traits, BayesRS might be advantageous compared to the approach of simply pooling training data sets for distantly-related populations; but for closely related populations the method did not perform better than simply pooling data together.
Multi-task learning, the term first coined by R Caruana [11], aims to improve learning performance by simultaneously learning from related tasks. In text and speech recognition, image reconstruction and many other areas where data are collected from multiple sources, multi-task learning has been successfully applied [12][13][14]. Recently, the multi-task learning has attracted a growing interest in biological science for sequence and gene expression analyses [15][16][17] as well as genome-wide association studies (GWAS) [18]. To our knowledge, the application of multi-task learning in genomic selection has not been reported so far.
Bayesian learning models using stochastic search variable selection (SSVS) has been widely used and proven effective for genomic prediction in a single population [1,[19][20][21]. Normally, SSVS uses some types of spike and slab distributions as priors for SNP effects. A latent indicator variable (0 or 1) is associated with each SNP, with 0 indicating that the SNP is irrelevant to the trait and is excluded from the model, and 1 indicating that the SNP is associated to the trait phenotype and is included in the model. In this study, it was assumed that multiple populations share the same set of latent indicator variables which can be learned jointly. The goal was to develop a multi-task Bayesian learning model for multi-population genomic prediction and to evaluate its performance on both simulated and real data.

Methods
In this section, single-task Bayesian learning model, the simple data pooling method, and the multi-task Bayesian learning model were introduced. A Gibbs sampling algorithm was designed to implement the multi-task Bayesian learning model. Monte Carlo simulation studies were conducted to evaluate the performance of the proposed multi-task model. Real data on five milk production traits from Holstein and Ayrshire dairy breeds were also used to test the effectiveness of the multi-task model.

Single-task bayesian learning model
In a single reference population of n animals with genotypes on m SNP markers, the statistical model can be written as: Where y i is the phenotypic value for the i th animal (i = 1,…,n), μ is the intercept, x ij is the genotype for the j th SNP locus (j = 1,…,m) of the i th animal, which is coded 0, 1 or 2, depending on the number of copies from a specified allele, a j is the regression coefficient for the j th SNP (allele substitution effect), and e i is the random residual error.
A flat prior distribution is assigned to μ. a j is assumed a mixture of a normal distribution N 0; σ 2 a À Á and a point mass density at zero (denoted by a Dirac delta function δ 0 (a j ) hereinafter). The weights for the two distributions are (1-w) and w, respectively, so that a j jw; σ 2 a À Á e 1−w ð Þ N 0; σ 2 a À Á þ wδ 0 a j À Á : w follows a uniform prior distribution. A latent indicator variable γ i is introduced for each SNP, so that when γ i =1 , a j eN 0; σ 2 a À Á , and when γ i =0 , a j =0. Prior distribution for each γ i is assumed i.i.d. and follows Bernoulli distribution with probability (1-w). So the joint prior density for γ is f γjw Residual errors are assumed from a multivariate normal distribution N 0; Iσ 2 e À Á . The prior distribution for σ 2 a σ 2 e À Á is a scaled inverse Chi-square distribution with degree of freedom v a (v e ) and scale factor s 2 a s 2 e À Á .

Simple data pooling method
Suppose animals are from a number of c different populations. In a simple data pooling method, animals from multiple populations are pooled together to form a single training data set. It is assumed that the population origin for each individual is known prior to the analysis. Population origin is included as a fixed effect. The effect of each SNP is assumed to be the same across populations.

Multi-task Bayesian learning model
For c populations with n k animals in the k-th population, the statistical model can be written as: x ijk a jk þ e ik i ¼ 1; ⋯; n k and k ¼ 1; ⋯; c ð Þ ; In matrix notation, this can be written as: where y ik is the phenotypic value for the i th animal in the k th population; μ k is the general mean of population k; x ijk is the genotype for the j th SNP locus of the i th animal in the k th population; a jk is the j th SNP effect in population k, and e ik is the random residual effect; and in the matrix notation, y k , a k , and e k are vectors of phenotypic values, SNP effects, and residual effects, respectively; 1 is a vector with all elements set to 1; and X k is the design matrix relating y k to a k . In the model, a jk allows the j th SNP effect to have a different value in population k. To share information among different populations, a common latent indicator variable indicating whether SNP j is associated with a QTL is used across populations. Accommodating these features into a Bayesian model produces the multi-task Bayesian learning model. The following prior distributions for the unknown parameters and hyper-parameters are assumed in the multi-task Bayesian learning model: The likelihood function of the whole data given all the parameters in the model is: So the joint posterior density function is:

Gibbs sampling algorithm
A Gibbs sampling algorithm was designed to draw samples for unknown (hyper-) parameters from their full conditional posterior distributions. To avoid reducibility of Markov chain, γ j and a jk are jointly sampled by first sampling γ j from f ðγ j jθ j − ; yÞ followed by sampling a jk from f ða jk jγ j ; θ j − ; yÞ; where θ j − represents all parameters except γ j and a jk . Full conditional posterior distributions for μ k ; w; σ 2 ak and σ 2 ek can be derived by picking up the relevant parts from the joint posterior distribution. Derivation for the density function f ðγ j jθ j − ; yÞ and sampling of γ j are described in the Appendix. The Gibbs sampling steps are described as below: Step 1. Initialize the parameters w; γ; σ 2 ak ; σ 2 ek ; μ k and a k : Step 2. For j=1,···, m a. Sample γ j from Bernoulli distribution with probability 1/(1+q j ), b. For k=1,···c, sample a jk from Step 3. Sample w from Beta distribution Step 4. For k=1,···, c, sample σ 2 ak from scaled inverse Chi-square distribution Step 5. For k=1,···, c, sample σ 2 ek from scaled inverse Chi-square distribution Step 6. For k=1,···, c, Repeat Step 2 to 6 until a set number of iterations are reached.
It can be shown in step 2 that information from all populations are used to generate the latent variable γ j , and the SNP effect a jk is generated for each population.

Computer program
Computer programs were written in C language to implement the multi-task Bayesian learning model, singletask Bayesian learning and the simple data pooling method. Programs and source codes are available upon request.

Monte Carlo simulations
The aim of the simulation was to evaluate the performance of the proposed multi-task Bayesian learning model and to compare with the single-task model and the simple data pooling method. Different scenarios were considered that differ in number of QTL affecting the trait, correlations of the QTL effects between different populations, and the density of SNP panels used for genomic prediction.
Real genotypes from 458 Ayrshire and 2,298 Holstein bulls were used for simulations. All Ayrshire animals and 690 Holstein animals were genotyped on the Illumina BovineHD BeadChip (800 k) SNP panel, and the remaining 1,608 Holstein animals were genotyped on the Illumina BovineSNP50 BeadChip (50 k) SNP panel. SNPs meeting one of the following criteria were excluded: minor allele frequency (MAF) lower than 0.05, missing genotype rate greater than 0.10, highly correlated with any other SNP genotype (95% genotypes from two loci identical or in complementary). SNP locations were determined against Bovine genome assembly UMD3.1 [22]. SNPs with unknown locations or on sex chromosomes were discarded. SNPs filtered in one breed were also removed from the other breed. After editing, 246,668 SNPs from the 800 k panel and 28,206 SNPs from the 50 k panel were kept for analyses.
For ease of computation, only SNPs from the first 10 chromosomes were used. Four scenarios were considered with combinations of QTL number being either 20 or 200 and the correlation of QTL effects between the two populations being low (ρ = 0.2) or high (ρ = 0.8). For each scenario, QTL were randomly sampled from the 50 k panel. For each QTL j, allele substitution effects in the two populations, α j1 and α j2 , were sampled from a bivariate normal distribution with mean 0 and variancecovariance structure where X j is the QTL genotypes coded as 0, 1 or 2, number of copies on an arbitrarily chosen allele. Total additive genetic variance were calculated within each breed as For each animal i, a residual effect e i , was sampled from a normal distribution with mean 0 and variance σ 2 e , so that phenotypic value of the animal y i =a i +e i . σ 2 e was equal to σ 2 a to give a trait with heritability of 0.5. Each scenario was replicated 10 times.
Four SNP panels of different density, the original 800 k panel, and the mimicked 400 k, 200 k, 100 k by selecting every 2nd, 4th, and 8th SNP, respectively, from the 800 k panel were used for genomic prediction. For each scenario described above, the simulated QTL were removed from the SNP panels. A special scenario was designed to keep all QTL genotypes in the 800 k panel for genomic prediction. The 50 k genotypes of 1,608 Holstein animals were imputed to 800 k using genotype imputation software FImpute developed by Sargolzaei et al. [23]. Imputation accuracy was evaluated on a set of 126 animals that have been genotyped on both the 50 k and 800 k panel, and the ratio of the genotypes that were correctly imputed was 0.9930.
For training and validation purposes, 393 Ayrshire and 2,084 Holstein animals born before 2004 were used as training set, 65 Ayrshire and 214 Holstein animals born in and after 2004 were used for validation. The number of animals used for genomic prediction is shown in Table 1. The simulated phenotypic values for training animals were used to derive SNP effects using different models. Degree of freedom of the inverse chi-square distributions for variances of SNP effects and residual effects were set to 4 and 10, respectively. The scale parameter S 2 a was derived from the expected value of a a is the true additive genetic variance. S 2 e was derived similarly. The Gibbs chain was run for 50,000 cycles with the first 10,000 discarded as burn-in. Marker effects were estimated as averages from all post burn-in samples. Genomic breeding values for animals in the validation set were estimated as the sum of population mean and all marker effects. Accuracy was evaluated as Pearson's correlation coefficient between genomic estimated breeding values and true breeding values for validation animals, i.e. r (GEBV, TBV). Regression of true breeding value on genomic estimated breeding values, b(TBV, GEBV), was also calculated to evaluate the bias of the genomic estimated breeding values.

Real data
Five milk production traits including milk yield, fat yield, protein yield, fat percentage and protein percentage were used for the same set of animals in the simulation study. Bull proofs (estimated breeding values from progeny testing, EBV) from April 2008 were used as phenotypes for training animals, and bull proofs from December 2011 were used for validation animals. All proofs were provided by Canadian Dairy Network (CDN) and had reliability above 0.65.
Two SNP panels with 28,206 SNPs from the 50 k panel and 246,668 SNPs from the 800 k panel were used for genomic prediction. Degree of freedom of the inverse chi-square distributions for variances of SNP effects and residual effects were set to 4 and 10, respectively. Scale parameters for the two distributions were derived in the same way as described in the simulation study, but instead of using true additive genetic variance and residual variance, estimated variances were used. Estimated additive genetic variances and residual variance were obtained from ASReml [24] by fitting an animal model with a population mean, animal effect, and random residual effect. The pedigree contained 15,731 animals for the Holstein population and 4,926 animals for Ayrshire. For 28,206 SNPs, the Gibbs sampling procedure was run for 50,000 iterations with the first 10,000 discarded as burn-in; and for 246,668 SNPs, the Gibbs sampling procedure was run for 100,000 iterations with the first 50,000 discarded as burn-in. Burn-in period was determined by visually inspecting the Gibbs chain. All samples after the burn-in period were kept. SNP effects were estimated by averaging all samples after the burnin period. After the estimation of SNP effects, the GEBV was calculated for animals in the validation set by summing up the population mean and all the SNP effects. Accuracy was measured as Pearson's correlation coefficient between GEBV and the 2011 bull proofs for validation animals. Table 2 shows the accuracy of genomic prediction with simulated 20 QTL. In the Ayrshire validation set, multitask Bayesian learning model performed the best among the three methods within each SNP panel used under the scenario with either a low (ρ = 0.2) or high (ρ = 0.8) correlation of simulated QTL effects between Ayrshire and Holstein populations. The greatest increase of accuracy was observed when ρ was 0.8 and when density of the SNP panel was the highest (accuracy increased from 0.67 for the single-task method to 0.83 for the multi-task method). Simply pooling data together substantially reduced the prediction accuracy in Ayshire when ρ was 0.2. The greatest reduction of accuracy was observed when ρ was 0.2 and when density of the SNP panel was the highest (accuracy decreased from 0.71 for the single-task method to 0.56 for the pooling method). When ρ was 0.8, the pooling method had an increased accuracy in Ayrshire compared with the single-task method. The pooling method produced substantially lower accuracy than the multi-task model in the Ayrshire validation set, especially when QTL effects were lower correlated between the two populations. In the Holstein validation set, the multi-task model performed similar or slightly better than the single-task model. The pooling method had a slightly reduction of accuracy when ρ was 0.2, and had a similar accuracy compared with the singletask method when ρ was 0.8. Within each method, increasing density of the SNP panel generally improved prediction accuracy in both the Ayrshire and Holstein populations. Table 2 also shows the slopes of regression of true breeding values on the GEBV. Regression coefficients of true breeding values on GEBV were less than one for the pooling method indicating that the GEBV were inflated. Table 3 shows the accuracy of genomic prediction with simulated 200 QTL. In the Ayrshire validation set, the multi-task model had slightly higher prediction accuracy within each SNP panel compared with the single-task model under either scenario with low (ρ =0.2) or high (ρ =0.8) correlated QTL effects between the Ayrshire and Holstein populations. Pooling method performed similar or slightly worse compared with single-task model when ρ was 0.2, and generally performed better when ρ was 0.8. The pooling method also performed slightly better than the multi-task model when ρ was 0.8 and when density of the SNP panel was relatively high. When ρ was 0.2, regression coefficients of true breeding values on GEBV were less than one for the pooling method indicating that the GEBV were inflated. The three methods performed similar in the Holstein validation set. Overall, the accuracy was lower compared with scenarios when only 20 QTL were simulated regardless of the methods used.

Monte Carlo simulation study
To evaluate the performance of the three methods under situations where all QTL genotypes can be acquired and included for genomic prediction, Table 4 shows the accuracy of genomic prediction when QTL genotypes are included together with SNP marker genotypes for training and validation. When 20 QTL were simulated, using multi-task model improved the accuracy by 0.16 and 0.22 in the Ayrshire validation set for scenarios where correlation of QTL effects between the Ayrshire and Holstein populations was 0.2 and 0.8, respectively. Using the pooling method reduced the accuracy in Ayrshire by 0.12 when ρ was 0.2, but increased the accuracy by 0.17 when ρ was 0.8. When 200 QTL were simulated, using the multi-task model increased the prediction accuracy by 0.03 and 0.07, respectively, for ρ equal to 0.2 and 0.8. The pooling method reduced the accuracy by 0.04 when ρ was 0.2, and increased the accuracy by 0.11 when ρ was 0.8. The pooling method outperformed the multi-task model only for the scenario where 200 QTL were simulated with their effects highly correlated between the two populations. In the Holstein validation set, the multi-task model had similar or slightly higher accuracy compared with the single-task model. The pooling method had similar accuracy as the single-task model when ρ was 0.8. When ρ was 0.2, the pooling method had slightly lower accuracy compared with the single-task model. Regression coefficients of true breeding values on GEBV were less than one for the pooling method except for the scenario of 200 simulated QTL with effects highly correlated between the two populations, indicating that the GEBV predicted by the pooling method were inflated.
Real data analysis Table 5 shows the accuracy of genomic prediction for milk production traits using real data. For the Ayrshire validation set, the multi-task model increased the accuracy by up to 0.07 compared with the single-task model, while the simple data pooling method resulted in reduced accuracy in general. The greatest increase in accuracy using

Discussion
Traditionally, genomic prediction with data from multiple populations were implemented either by running genomic prediction within each population (single-task) or by simply pooling data together. Single-task genomic prediction cannot utilize information from other populations and therefore, the accuracy of genomic prediction is largely determined by the size of training data set [2,3]. For breeds with only a small number of animals having both DNA marker genotypes and phenotype data, the accuracy of genomic prediction can be low [25]. Combining the data with other breed populations has the potential to improve the prediction accuracy. It is however, difficult to effectively account for the differences of SNP effects among different populations by simply pooling data together. If the marker density is low or the populations are divergent from each other, simply pooling data together may result in unfavorable prediction accuracies [6]. The multi-task Bayesian learning model proposed in this study uses information from all populations simultaneously while allowing the SNP effects to vary in different populations. Different populations share information through a common set of latent indicator variables. When the target trait has a similar genetic background in related populations, it is reasonable to assume that some shared QTL affecting a common trait in different populations.
However, the linkage disequilibrium phase between SNP markers and QTL are likely to be inconsistent, especially when the marker density is low. Therefore, the multi-task Bayesian learning model is more flexible about the SNP effects and is likely to have better performance than a simple data pooling method. Results from simulation studies support the use of multi-task Bayesian model for multi-population genomic prediction especially when there are a few QTL affecting the trait. For the scenario where a few QTL affect the trait, the increase of accuracy by using multi-task model was greater when QTL effects had a higher correlation between two populations. The accuracy was further increased when QTL genotypes were included for training and validation. These results are expected as the higher the correlation of QTL effects between two populations, the more informative information the two populations can share. Including QTL genotypes also increased the amount of information to be shared between the two populations. Results suggest that the proposed multi-task Bayesian learning model is effective in combining information from multi-populations to improve accuracy of genomic prediction.
Results from simulation also showed that simply pooling data together may reduce the accuracy of genomic prediction if QTL effects were lowly correlated between two populations or if a relatively low density SNP panel was used. When QTL effects had a high correlation between two populations and the SNP panel was high, the pooling method increased the accuracy compared with single-task model. When the correlation of QTL effects between two populations was high, the pooling method was inferior to the multi-task model if the number of QTL was small, but This assumption will be violated if the correlation of QTL effects is low between two populations and may result in poor performance of the pooling method. The multi-task model proposes to share information through the same latent indicator variables. Correlations of QTL effects among different populations are not explicitly used and therefore, information across populations may not be fully utilized if such correlations are high. This might be why the pooling method still outperformed the multi-task model when the number of QTL is large and the correlation of QTL effects between two populations was high.
Results from real data analyses in this study showed that the multi-task Bayesian learning model produced a similar or higher accuracy compared to the single-task model, and that simply pooling data together resulted in a reduced accuracy when the marker density was low. SNP effects could be different across populations, especially when lower density markers were used [6]. These results are in agreement with results from simulation studies.
Gains of accuracy by using the multi-task model were higher for fat percentage and protein percentage traits than for other traits. This is likely due to that large QTL or genes such as DGAT1, have larger influence on the percentage traits than on the yield traits [26,27]. These results agreed with simulation studies which showed that the multi-task model performed better when there are fewer QTL affecting the trait.
In this study, only two dairy cattle breeds were considered, and the multi-task Bayesian learning model was shown to be effective and more beneficial to the population with a smaller data size. For the larger population, using multi-task model did not produce much improvement. The little improvement in the larger population could be due to that the smaller population is too small to be able to have a significant impact on the larger population. In practice, there are situations where many populations are to be combined for analysis with each one contributing a small amount of data, a typical example as in beef cattle production. It would be interesting to test the performance of the multi-task model under such scenarios.
The current multi-task model considered additive genetic effects only. Non-additive genetic effects, which have gained growing interests with availability of genomics information [28][29][30], can also be accommodated into the multi-task model. A similar strategy used in this study to sharing information across populations for additive genetic effects may also be applied to non-additive genetic effects. Inclusion of non-additive genetic effects in the multi-task model warrants further investigation. The proposed multi-task Bayesian learning model used a spike and slab mixture distribution to conduct variable selection and shrinkage for SNP effects. This prior setting is similar to that in the BayesCπ method proposed by Habier et al. [21]. Other types of mixture distributions currently being used for genomic prediction [1,19,20], can also be adapted to the multi-task model. The strategy used in the multi-task Bayesian learning model allows different populations to share information through a common latent variable assumed for each SNP. Such strategy has been shown effective in both simulation and real data studies; however, other strategies may also be exploited, for example, by modeling the joint distribution of the SNP effects among different populations. Further investigations are required to evaluate alternative strategies for sharing information across populations.
In this study, the Bayesian models were implemented via a Gibbs sampling algorithm adopting the residual-update computing strategy proposed by Legarra and Misztal [31]. Computing time for this algorithm is proportional to the number of animals and number of SNPs in the model. For the training sample size of 2,477 animals and genotypes of 246,668 SNPs, the proposed multi-task model requires 44 CPU hours to complete 150,000 Gibbs sampling cycles on a Linux cluster system with Intel X5675 3.07GHz CPU. With increased training sample size and increasing marker density to a very high density panel or even sequence data, computing burdens would become a concern. A recent study [32] has improved the residual-update algorithm resulting in the CPU time reduced by 35.3 to 43.3%. The authors in the same study [32] also proposed an alternative algorithm which reduced CPU time by 74.5 to 93.0%. Approximation algorithms based on expectationmaximization (EM) [33][34][35] and variational Bayes [36] have also been proposed to replace the time consuming Markov chain Monte Carlo (MCMC) sampling based algorithms. Adapting these algorithms to accommodate the multi-task Bayesian learning model would be of great interest.

Conclusions
A multi-task Bayesian learning model was proposed for multi-population genomic prediction. The multi-task model shares information across populations through a common set of latent indicator variables while allowing the SNP effects to vary in different populations. Simulation studies and real data analysis suggest that the proposed multi-task Bayesian learning model is effective and beneficial to populations where a small number of training animals are available. Accuracy of genomic prediction in small populations can be improved by using the multi-task model especially for traits affected by a few QTL with large effects. Similarly, Next, the conditional likelihoods f y k r j ¼ 0; θ j − k Þ À and f y k r j ¼ 1; θ j − k Þ À will be derived. Suppose one is at the (l+1) th Gibbs sampling iteration, and wants to sample γ j and a jk , the linear regression model given all parameters except γ j and a jk can be written as: Where y Ã k ¼ y k −μ k −X 1k:jk−1â lþ1 1k:jk−1 −X jkþ1:mkâ l jkþ1:mk : Given the priors that ða jk jγ j Þ e γ j N 0; σ 2 ak À Á þ 1−γ j δ 0 a jk À Á and ðe k jσ 2 ek Þ e N 0; Iσ 2 ek À Á ; the conditional likelihood of y k can be written as: ek ðA:3Þ And f y k jr j ¼ 1;

ðA:4Þ
Where V y Ã k ¼ x jk x 0 jk σ 2 ak þ Iσ 2 ek is the (co-) variance matrix of y Ã k given that r j =1. It can be easily verify that V −1 y Ã is: Substituting A.5 and A.6 into A.4, one has Finally, γ j can be drawn from a Bernoulli distribution with probability 1/(1+q j ).