Effect of family relatedness on characteristics of estimated IBD probabilities in relation to precision of QTL estimates
© Freyer et al; licensee BioMed Central Ltd. 2010
Received: 20 April 2010
Accepted: 26 September 2010
Published: 26 September 2010
A random QTL effects model uses a function of the probabilities that two alleles, in the same or in different animals, at a particular genomic position are identical by descent (IBD). Estimates of such IBD probabilities, and therefore the modeling and estimation of QTL variances, depend on marker polymorphism, on the strength of linkage and linkage disequilibrium between markers and QTL, and on the relatedness of animals in the pedigree. The effect of relatedness of animals in a pedigree on IBD probabilities and their characteristics was examined in a simulation study.
The study was based on nine multi-generational family structures, similar to the pedigree structure of a real dairy population, distinguished by an inbreeding level increasing from zero to 28% across the studied population. The highest inbreeding level in the pedigree, connected with the highest relatedness, was accompanied by the highest IBD probabilities of two alleles at the same locus and by lower relative variation coefficients. Profiles of correlation coefficients of IBD probabilities along the marked chromosomal segment with those at the true QTL position were steepest when the inbreeding coefficient in the pedigree was highest. Precision of the estimated QTL location increased with increasing inbreeding and pedigree relatedness. A method to assess the optimum level of inbreeding for QTL detection, depending on population parameters, is proposed.
An increased overall relationship in a QTL mapping design has positive effects on the precision of QTL position estimates. However, the relationship between the inbreeding level and the capacity for QTL detection, which depends on the recombination rate between the QTL and the adjacent informative marker, is not linear.
Studies on quantitative trait loci (QTL) in dairy cattle are performed almost exclusively on data from commercial populations. Setting up experimental populations is highly expensive and time consuming. Therefore, the simplest and most popular design for QTL mapping in dairy cattle has been the granddaughter design (GDD). In this design, single grandsires establish their "own families" with a number of sons (sires) genotyped for a marker panel, and phenotypic information on the quantitative trait is based on several hundred cows ((grand)daughters).
The methodology to detect QTL in general pedigrees exploiting polymorphism of genetic markers was proposed by Fernando and Grossman (1989), based on a model where both the allelic QTL effects and the polygenic component are assumed to be random normal deviates. The covariance between individuals for a putative QTL is modeled by the probabilities of sharing alleles identical by descent (IBD), based on linked marker genotypes. Such IBD scores are important prerequisites in a two-step procedure to compute variance components using ASReml [3, 4]. The major advantage of the variance components approach is the ability to account for relationships among individuals in different families. Pong-Wong et al. (2001) proposed a fast deterministic approach to estimate IBD probabilities by combining the methods of Wang et al. (1995) and Knott and Haley (1998) [5–7]. Consequently, managing inbreeding loops with minimal information loss became feasible. This is of interest, since pedigree patterns harbouring inbred individuals do occur in many animal species, even if inbreeding should be avoided under commercial breeding conditions.
A recent study, to our knowledge the first of its kind, has shown that using inbred sires in a pedigree has a positive effect on QTL detection. However, how this effect arises is not straightforward, as there was neither different phenotypic variance nor different (poly)genetic variance between the family structures in the simulation study cited. Sensitivity to environmental changes increases in inbred individuals due to loss of heterozygosity, accompanied by an impaired ability to react to changing or sub-optimal environmental conditions. Most likely, such inbreeding-specific environmental effects occur in dairy cattle as well. Moreover, accounting for negative inbreeding effects on fertility, health and economically important traits would be relevant. However, the molecular genetic basis of the inbreeding depression phenomenon is still being examined in model species. Therefore, a realistic basis for including these effects in a simulation model is missing.
The only source of inbreeding effects on QTL position estimates in the recent study was the IBD probability. Investigating IBD parameters in this sense remained an open task. The objective of the simulation study summarized in this paper was to provide more insight into the characteristics of IBD probability in relation to the inbreeding level. Therefore, IBD parameters were examined in pedigrees with four generations and stepwise increasing inbreeding level and slightly varying marker panels for QTL mapping. Moreover, an attempt was made to target the theoretical "optimum inbreeding level" for QTL mapping. An extremely high inbreeding level was necessarily considered in order to evaluate the resulting IBD parameters regarding the theoretical optimum inbreeding level.
Results and discussion
Simulation parameters in terms of marker map, number of marker alleles and information content did not affect IBD probability significantly (P-values ranged from 0.75 to 0.87). Family structure, characterized by its inbreeding level, had a significant effect on the IBD probability (P < 0.0001). The IBD parameters were examined in more detail in order to identify the reasons for the more successful estimation of QTL positions obtained by analysing more strongly inbred family structures.
Means and standard deviations of IBD probabilities
Mean ± standard deviation of IBD probability for all family structures at 0 cM and at the true QTL position for marker maps M1, M2, M3 and M4 and six marker alleles
Each row is one family structure; columns are M1 | M2 | M3 | M4
At 0 cM, the position most distant from the true QTL position:
0.0153 ± 0.0099 | 0.0155 ± 0.00994 | 0.0153 ± 0.00987 | 0.0133 ± 0.0099
0.0175 ± 0.0116 | 0.0176 ± 0.0116 | 0.0175 ± 0.0116 | 0.0172 ± 0.0116
0.0192 ± 0.0124 | 0.0193 ± 0.0116 | 0.0192 ± 0.0124 | 0.0187 ± 0.0124
0.0223 ± 0.0164 | 0.0224 ± 0.0166 | 0.0198 ± 0.0126 | 0.0206 ± 0.0133
0.0205 ± 0.0128 | 0.0140 ± 0.0112 | 0.0152 ± 0.0113 | 0.0252 ± 0.0165
0.0198 ± 0.0133 | 0.0212 ± 0.0139 | 0.0208 ± 0.0135 | 0.0227 ± 0.0154
0.0185 ± 0.0136 | 0.0196 ± 0.0132 | 0.0181 ± 0.0129 | 0.0188 ± 0.0089
0.0407 ± 0.0283 | 0.0367 ± 0.0220 | 0.0328 ± 0.0186 | 0.0379 ± 0.0230
0.30261 ± 0.0648 | 0.2785 ± 0.0768 | 0.3485 ± 0.1015 | 0.3081 ± 0.1154
At the true QTL position:
0.0128 ± 0.0106 | 0.0127 ± 0.0095 | 0.0127 ± 0.0104 | 0.0127 ± 0.0104
0.0170 ± 0.0127 | 0.0176 ± 0.0127 | 0.0170 ± 0.0122 | 0.0170 ± 0.0122
0.0192 ± 0.0136 | 0.0191 ± 0.0122 | 0.0189 ± 0.0132 | 0.0189 ± 0.0132
0.0223 ± 0.0177 | 0.0224 ± 0.0174 | 0.0191 ± 0.0169 | 0.0182 ± 0.0140
0.0222 ± 0.0156 | 0.0142 ± 0.0101 | 0.0149 ± 0.0126 | 0.0210 ± 0.0142
0.0218 ± 0.0159 | 0.0219 ± 0.0155 | 0.0200 ± 0.0165 | 0.0234 ± 0.0151
0.0214 ± 0.0154 | 0.0218 ± 0.0137 | 0.0213 ± 0.0143 | 0.0179 ± 0.0137
0.0414 ± 0.0277 | 0.0369 ± 0.0234 | 0.0287 ± 0.0227 | 0.0410 ± 0.0286
0.2881 ± 0.0919 | 0.2853 ± 0.0884 | 0.3392 ± 0.0993 | 0.3256 ± 0.0994
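The lower relative variation that accompanies the highest IBD probabilities can be checked directly from the values above. The following minimal sketch computes the coefficient of variation (standard deviation divided by mean) for two M1 entries at 0 cM: the first listed value (0.0153 ± 0.0099) and the first value of the final group of four (0.30261 ± 0.0648). The function name is illustrative, and reading these two entries as the least and the most inbred family structure is an assumption, since the row labels are not given in the table.

```python
def coefficient_of_variation(mean, sd):
    """Return the relative variation sd / mean of an IBD probability."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    return sd / mean


# (mean, sd) pairs for marker map M1 at 0 cM, copied from the table above.
# Assumption: first entry = least inbred structure, last = most inbred.
first_entry = (0.0153, 0.0099)
last_entry = (0.30261, 0.0648)

cv_first = coefficient_of_variation(*first_entry)
cv_last = coefficient_of_variation(*last_entry)

print(f"CV, first entry: {cv_first:.3f}")
print(f"CV, last entry:  {cv_last:.3f}")
```

Under this reading, the mean rises by more than an order of magnitude while the standard deviation rises far less, so the relative variation falls.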
Average relationship coefficient among various individuals in the pedigree for different family structures
Columns (average relationship coefficient among animals): all sires and offspring; GGS1 and his offspring; GGS2 and his offspring; GGS1 and offspring of GGS2; GGS2 and offspring of GGS1
IBD parameters and profiles
QTL position estimates and pedigree relatedness
Except for FS5, the frequency of correctly estimated QTL positions (within an interval of 41.5 ± 1.5 cM on the marked chromosome) increased significantly with stepwise increasing inbreeding level (Figure 1). Most outliers resulted from analysing FS0. FS99 yielded the best results, with the most correct QTL position estimates and the smallest deviations from the actual QTL position.
Parallel runs via GridQTL mimicking a combined LD/LA analysis yielded the same QTL position estimates as the linkage analyses above, confirming the robustness of the QTL position estimates. When 100 sires and 100 dams were set up for a historic population 100 generations back, the shape of the test statistic profile was similar to those obtained from the linkage analyses described above. The peak of the test statistic profiles became much sharper when only two sires were chosen in the historic population (test statistic profiles not shown). This is another indication of the impact of a historically more strongly related, and most likely more inbred, background when population history started with only two sires. In all cases, the pedigrees of FS0 to FS99 were exactly the same, whether 100 sires or two sires were chosen for the historical population. Hoffmann et al. (2000) stated that an older population with "reduced founder haplotypes by recombination" is more suited for fine mapping. Subsequent generations of inbreeding as in FS99 could be advantageous in this sense as well. Thus, our results support the conclusions from analysing human pedigrees.
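The criterion for a correctly estimated QTL position can be expressed as a simple hit rate. The sketch below is illustrative only: the function name and the list of position estimates are hypothetical and are not results from the study; only the interval 41.5 ± 1.5 cM is taken from the text.

```python
def fraction_within(estimates, true_pos=41.5, tol=1.5):
    """Share of QTL position estimates (cM) within true_pos +/- tol."""
    if not estimates:
        raise ValueError("no estimates given")
    hits = sum(1 for x in estimates if abs(x - true_pos) <= tol)
    return hits / len(estimates)


# Hypothetical position estimates in cM, NOT values from the study.
example_estimates = [41.0, 42.0, 43.0, 38.0, 40.0, 44.5, 41.5, 12.0]

hit_rate = fraction_within(example_estimates)
print(f"Fraction within 41.5 +/- 1.5 cM: {hit_rate}")
```

Computed per family structure over its 60 analyses, such a hit rate would allow the structures to be compared on a common scale.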
Approaching the optimum inbreeding level
Inbreeding coefficient Fx, allelic frequency (m) and number of generations (t) at maximum cov(IBS, IBD) for various effective population sizes (N) and for recombination rate c = 0.01 between QTL and adjacent marker
Our results are not meant to encourage inbreeding in practical breeding. Inbreeding depression effects have to be avoided. Nevertheless, there is still capacity for exploiting inbreeding in QTL study designs. However, as in each successful QTL analysis, the prerequisite is a QTL actually segregating in the pedigree to be studied. It should be mentioned that estimation of breeding values based on a relationship matrix incorporating both pedigree and genomic information is still a topic in the literature, even with respect to the advanced dense SNP technology for genomic selection.
In this study, cov(IBS, IBD) is based on one marker only. We used the recombination rate between the nearest informative marker and the QTL for calculating cov(IBS, IBD). The focus was on a practical pedigree as found in conventional dairy cattle breeding. The method can be extended to multiple markers. Further, a more general conclusion could be drawn by simulating pedigrees with random mating of diploid organisms with discrete generations and evaluating QTL estimates stepwise. Using a defined population history, such a design could reveal an even higher average level of inbreeding (co-ancestry) than assumed with the two founder sires in our study.
Our simulation study, carried out under realistic conditions for dairy cattle, revealed intrinsic relationships between the precision of estimated QTL positions and pedigree relatedness in the mapping population. IBD parameters obtained from analysing family structures with varying inbreeding load yielded conclusive results on the role of inbreeding in QTL estimation and its dependence on relatedness. Related pedigrees are necessary for linkage analyses, and the stronger the relatedness, the greater the success of such studies. Comparing two versions of historic populations in a GridQTL analysis that mimics a combined LD/LA analysis additionally underlined the advantage of inbreeding and increased relatedness. This leads us to the assumption that linkage disequilibrium between markers and QTL across several generations could be detected more easily in inbred pedigrees than in non-inbred or "less related" pedigrees. It must be noted that the relationship between the capacity for QTL detection (here expressed by cov(IBS, IBD)) and the average inbreeding level of a population is not linear. Finally, these results apply to the situation of one biallelic QTL actually segregating in the pedigree, marked by a defined chromosomal segment.
Characteristics of the inbred family structures (FS), average inbreeding coefficients (Fx) of sires of final offspring (SOF) and of the whole pedigree (Fx total)
S20 originated from an aunt-nephew mating with Fx = 0.0625
S20 originated from a half-sib mating with Fx = 0.125
based on the same structure as FS3, but it contained three inbred sires, Fx = 0.125 each, and the offspring number of one sire (S20) increased to 138, while the offspring number of GGS1 was simultaneously reduced by 60
based on the same structure as FS3, but the offspring number of S20 increased to 138, while the final offspring number of GGS1 was simultaneously reduced by 60
extension of FS3 by one additional strongly inbred sire from a full-sib mating, where one sib was already inbred, Fx = 0.375
contained two sires originating from a mother-son mating with Fx = 0.375 and a sire from a half-sib mating with Fx = 0.125; the pedigrees of GGS1 and GGS2 remained fully separated from each other (this missing link was a remarkable deviation from all other FS)
contained only inbred sires, with Fx ranging from 0.063 to 0.375
an extremely inbred design already starting with inbred grandsires, with Fx of sires ranging from 0.250 to 0.426
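Inbreeding coefficients such as those quoted for the family structures can be derived from a pedigree with the tabular method for the additive relationship matrix, where Fx of an animal equals its diagonal element minus one. The following is a minimal sketch under stated assumptions: the six-animal pedigree is hypothetical (it is not one of the study's family structures), animals are indexed so that parents precede their offspring, and all names are illustrative. For the offspring of a half-sib mating between non-inbred parents it yields Fx = 0.125, in agreement with the value given above.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix A by the tabular method.

    pedigree: list of (sire, dam) index pairs, one per animal, with None
    for an unknown parent. Parents must be listed before their offspring.
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (s, d) in enumerate(pedigree):
        # Relationship with every earlier animal: mean of the
        # relationships of that animal with the two parents of i.
        for j in range(i):
            a = 0.0
            if s is not None:
                a += 0.5 * A[j][s]
            if d is not None:
                a += 0.5 * A[j][d]
            A[i][j] = A[j][i] = a
        # Diagonal: 1 + F, with F = half the relationship between parents.
        both_known = s is not None and d is not None
        A[i][i] = 1.0 + (0.5 * A[s][d] if both_known else 0.0)
    return A


def inbreeding_coefficients(pedigree):
    """Return F for every animal as diagonal of A minus one."""
    A = relationship_matrix(pedigree)
    return [A[i][i] - 1.0 for i in range(len(pedigree))]


# Hypothetical pedigree: one sire (0) mated to two unrelated dams (1, 2),
# producing half-sibs 3 and 4, which are then mated to produce animal 5.
half_sib_pedigree = [
    (None, None),  # 0: founder sire
    (None, None),  # 1: founder dam
    (None, None),  # 2: founder dam
    (0, 1),        # 3: son of 0 and 1
    (0, 2),        # 4: daughter of 0 and 2
    (3, 4),        # 5: offspring of the half-sib mating
]

F = inbreeding_coefficients(half_sib_pedigree)
print(F)
```

The same function applies to deeper pedigrees with inbred parents, where values above 0.25 such as those listed for the most inbred structures can arise.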
Overview of simulation parameters and symbols as used in the manuscript
Sets of simulation parameters in detail (code in brackets)
Four different sets of marker maps:
Marker positions were given by four versions of the marker map (M1, M2, M3, M4), in which marker positions (and thus marker distances) varied slightly on the 55 cM long chromosomal segment
(M1) markers at 0, 13.7, 32.8, 35.7, 40.5, 42.5, 43.5, 44.5, 45.5, 48.5 and 53.2 cM
(M2) markers at 0, 10.9, 17.8, 25.9, 33.6, 39.8, 46.3, 48.3, 49.3, 51.2 and 54.4 cM
(M3) markers at 0, 13.7, 32.8, 35.5, 37.5, 39.3, 40.3, 42.4, 43.3, 44.3 and 48.4 cM
(M4) markers at 0, 13.7, 32.8, 35.7, 37.7, 39.7, 43.5, 44.5, 45.5, 46.5 and 49.4 cM
The marker bracket harbouring the QTL at 41.5 cM (set in bold script in the original table) is 40.5 to 42.5 cM in M1, 39.8 to 46.3 cM in M2, 40.3 to 42.4 cM in M3 and 39.7 to 43.5 cM in M4
Marker information (three different sets regarding the number of marker alleles):
(2_A) two marker alleles
(4_A) four marker alleles
(6_A) six marker alleles
Number of analyses (= repetitions per family structure):
4 versions of marker maps × 3 versions of marker allele numbers × 5 variations of individual missing values (in marker information and/or phenotypic values, combinations of 20% randomly missing values each) = 60 data sets per family structure (60 independent analyses per family structure, with all parameters equally distributed)
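The parameter grid above can be enumerated programmatically to confirm the count of 60 data sets and to locate the marker bracket around the QTL at 41.5 cM in each map. This is a minimal sketch, not the simulation code used in the study. Two points are assumptions: the M2 entry printed as "33,6 39.8" is read as the two positions 33.6 and 39.8 cM, and the labels 1 to 5 for the missing-value patterns are hypothetical placeholders.

```python
from bisect import bisect_right
from itertools import product

# Marker positions in cM, copied from the overview above.
# Assumption: the M2 entry "33,6 39.8" is read as 33.6 and 39.8.
MARKER_MAPS = {
    "M1": [0, 13.7, 32.8, 35.7, 40.5, 42.5, 43.5, 44.5, 45.5, 48.5, 53.2],
    "M2": [0, 10.9, 17.8, 25.9, 33.6, 39.8, 46.3, 48.3, 49.3, 51.2, 54.4],
    "M3": [0, 13.7, 32.8, 35.5, 37.5, 39.3, 40.3, 42.4, 43.3, 44.3, 48.4],
    "M4": [0, 13.7, 32.8, 35.7, 37.7, 39.7, 43.5, 44.5, 45.5, 46.5, 49.4],
}
ALLELE_SETS = ["2_A", "4_A", "6_A"]
MISSING_PATTERNS = [1, 2, 3, 4, 5]  # hypothetical labels for the 5 variants
QTL_POSITION = 41.5


def flanking_markers(positions, qtl=QTL_POSITION):
    """Return the adjacent marker pair (left, right) enclosing the QTL."""
    i = bisect_right(positions, qtl)
    if i == 0 or i == len(positions):
        raise ValueError("QTL lies outside the marker map")
    return positions[i - 1], positions[i]


# Full factorial combination: 4 maps x 3 allele sets x 5 patterns.
designs = list(product(MARKER_MAPS, ALLELE_SETS, MISSING_PATTERNS))
brackets = {name: flanking_markers(pos) for name, pos in MARKER_MAPS.items()}

print(len(designs))
for name, bracket in brackets.items():
    print(name, bracket, "width:", round(bracket[1] - bracket[0], 1), "cM")
```

The bracket widths differ markedly between maps, which matters later because the recombination rate between the QTL and the nearest informative marker depends on these distances.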
A random deviate e was normally distributed with mean zero and variance $\sigma^2_{res}$ (i.e. 60 percent of the total variance). Recombination events were simulated on the basis of a binomial map function. Trait values and marker genotypes were simulated in an identical manner for all family structures, applying the PEDSIM approach.
Sixty datasets were simulated for each family structure, based on variations in three simulation parameters (Table 5): (i) marker positions on the chromosomal segment (marker maps M1, M2, M3 and M4), (ii) number of marker alleles (2, 4 or 6), and (iii) five versions of individual information content in terms of the distribution of missing values. The 11 microsatellite markers were unevenly distributed, as in a real QTL mapping situation preceding the rapidly developing SNP technology and genomic selection. The pre-fine-mapping QTL study covered a 55 cM chromosomal segment, which was expected to harbor one QTL. The focus of this study was on detecting inbreeding effects on parameters of the IBD probability in connection with the estimated QTL map position. Thus, the principal conclusions do not depend on the kind of molecular markers or on the number of markers or marker alleles. The 60 data sets per family structure were treated as repetitions. This use of the term is comparable to successive collections of health data in the same patients and their families at different times of life or in different clinics, or to treatments affecting all families simultaneously in the same way. Statistical parameters were calculated with the SAS package, version 9.1 (SAS Institute, Inc., Cary, NC), and effects of simulation parameters on IBD scores were tested with proc GLM.
Calculating IBD probabilities and QTL analysis
The QTL effect was assumed to be random, with the covariance structure between individuals being a function of the IBD probabilities at a particular location. A Fortran 90 program was written to calculate IBD probabilities while exploiting as much of the available pedigree and marker information as possible. The kernel of the program package was the rapid deterministic recursive algorithm for calculating IBD probabilities between each pair of gametes, followed by transmission of marker alleles from parents to offspring. Further, a method by Knott and Haley (1998) was implemented to determine IBD probabilities among the gametes of (full) sibs in the second generation.
The model fitted at each position $p$ was

$$y = X\beta + Zu + H_p a_p + e,$$

where $y$ is an $(n_1 \times 1)$ vector of phenotypes, $n_1$ is the number of animals with phenotypes, and $n_0$ is the total number of animals in the pedigree. $X$ is an $(n_1 \times s)$ design matrix for $s$ fixed effects, $Z$ is an $(n_1 \times n_0)$ incidence matrix relating animals to their phenotypes, $H_p$ is an $(n_1 \times 2n_0)$ incidence matrix relating animals to paternal and maternal QTL alleles at position $p$, $\beta$ is an $(s \times 1)$ vector of fixed effects, $u$ is an $(n_0 \times 1)$ vector of random polygenic effects, $a_p$ is a $(2n_0 \times 1)$ vector of the effects of a QTL at position $p$, and $e$ is an $(n_1 \times 1)$ residual vector with expectation and covariance matrix $(0, E \otimes I)$, where $E$ is the unknown (co)variance matrix of the residual effects and $I$ denotes the identity matrix. Since all phenotypes were assumed to be pre-adjusted for non-genetic effects, $X$ is a vector of ones, and thus $s = 1$ and $\beta = \mu$. The random polygenic effects $u$ and the QTL effects $a_p$ are assumed to follow normal distributions with mean zero and variances $A\sigma^2_u$ and $G_p\sigma^2_p$, respectively. Matrix $A$ is the additive relationship matrix. Matrix $G_p$ contains the IBD probabilities at position $p$, obtained as described above.
The model was fitted for each single position $p$ (in steps of 1 cM) on the chromosomal segment. The data were analyzed using a random model variance component approach. The residual maximum likelihood (REML) procedure implemented in the ASReml software was used to maximize the likelihood under both $H_0$ and $H_A$. The likelihood ratio $LR = -2(\ln L_{H_0} - \ln L_{H_A}) \sim \chi^2_{df}$ was calculated at each position $p$ to find the most likely QTL position, where $\ln L_{H_0}$ is the logarithm of the likelihood computed for the purely polygenic model and $\ln L_{H_A}$ is the logarithm of the likelihood from the QTL model.
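The position scan described here reduces to computing the likelihood ratio at every tested position and taking the position with the largest value. The sketch below illustrates only this last step. It assumes that the log-likelihoods have already been obtained from a REML fit (for example from ASReml output); the numbers and function names are hypothetical and are not results or code from the study.

```python
def likelihood_ratio(lnl_h0, lnl_ha):
    """LR = -2 (lnL_H0 - lnL_HA) for one tested position."""
    return -2.0 * (lnl_h0 - lnl_ha)


def scan_positions(lnl_h0, lnl_ha_by_position):
    """Return (best position, its LR, full LR profile).

    lnl_h0: log-likelihood of the purely polygenic model.
    lnl_ha_by_position: dict mapping position (cM) to the log-likelihood
    of the QTL model fitted at that position.
    """
    profile = {
        pos: likelihood_ratio(lnl_h0, lnl_ha)
        for pos, lnl_ha in lnl_ha_by_position.items()
    }
    best = max(profile, key=profile.get)
    return best, profile[best], profile


# Hypothetical log-likelihoods, NOT values from the study.
lnl_polygenic = -1250.0
lnl_qtl = {39: -1248.5, 40: -1247.0, 41: -1245.5, 42: -1245.0, 43: -1246.5}

best_position, best_lr, lr_profile = scan_positions(lnl_polygenic, lnl_qtl)
print(best_position, best_lr)
```

The polygenic log-likelihood is the same at every position, so the shape of the test statistic profile is determined entirely by the QTL model fits.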
Parallel analyses assuming combined linkage disequilibrium and linkage
The QTL estimates obtained by linkage analysis (LA) as described above were compared with the results of independent analyses using GridQTL. For this purpose, the R-method, which is based on a regression model, was adapted. The advantage of this method is that it only requires genotypes instead of haplotypes to establish the "historical generation". Here, 100 historical generations back to the defined pedigree design were chosen to mimic linkage disequilibrium (LD). Two extreme versions (two and 100 sires for mating to 100 cows each) characterized the effective population size of the historical generation. This step enabled analysing the data in terms of combined LD/LA [20, 21].
Covariance of IBS and IBD and inbreeding level
where t denotes the generation number and X the probability of non-IBD (i.e. F = 1 - X). Here, the recombination rate c was taken between the nearest informative marker and the QTL, depending on the marker distances in each marker map (Table 5). We used cov(IBS, IBD) to determine the optimum level of inbreeding for QTL detection in the examined population.
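The relation F = 1 - X can be illustrated with the textbook recurrence for an idealized random-mating population of effective size N, in which the non-IBD probability after t generations is X_t = (1 - 1/(2N))^t. This is a sketch under that standard assumption only. It is not the cov(IBS, IBD) expression used in the study, which is not reproduced in the text, and the function names are illustrative.

```python
def non_ibd_probability(n_eff, t):
    """X_t = (1 - 1/(2N))^t in an idealized population (assumption)."""
    if n_eff <= 0 or t < 0:
        raise ValueError("n_eff must be positive and t non-negative")
    return (1.0 - 1.0 / (2.0 * n_eff)) ** t


def inbreeding_coefficient(n_eff, t):
    """F_t = 1 - X_t, the expected inbreeding after t generations."""
    return 1.0 - non_ibd_probability(n_eff, t)


# Effective sizes and generation numbers chosen for illustration only.
for n_eff in (10, 50, 100):
    row = [round(inbreeding_coefficient(n_eff, t), 4) for t in (1, 10, 100)]
    print(n_eff, row)
```

Under this recurrence, inbreeding accumulates faster in smaller populations, so a given target inbreeding level is reached after fewer generations when N is small.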
We thank Christian Stricker and Matthias Schelling for the opportunity to use PEDSIM and for fruitful discussions on QTL related topics.
- Weller JI, Kashi Y, Soller M: Power of daughter and granddaughter designs for determining linkage between marker loci and quantitative trait loci in dairy cattle. Journal of Dairy Science. 1990, 73: 2525-2537. 10.3168/jds.S0022-0302(90)78938-2.
- Fernando RL, Grossman M: Marker assisted selection using best linear unbiased prediction. Genetics Selection Evolution. 1989, 21: 467-477. 10.1186/1297-9686-21-4-467.
- George AW, Visscher PM, Haley CS: Mapping quantitative trait loci in complex pedigrees: a two-step variance component approach. Genetics. 2000, 156: 2081-2092.
- Gilmour AR, Gogel BJ, Cullis BR, Welham SJ, Thompson R: ASReml User Guide Release 1.0. 2002, VSN International Ltd., Hemel Hempstead, HP1 1ES, UK.
- Pong-Wong R, George AW, Woolliams JA, Haley CS: A simple and rapid method for calculating identity-by-descent matrices using multiple markers. Genetics Selection Evolution. 2001, 33: 453-471. 10.1186/1297-9686-33-5-453.
- Wang T, Fernando RL, van der Beek S, van Arendonk JAM: Covariance between relatives for a marked quantitative trait locus. Genetics Selection Evolution. 1995, 27: 251-274. 10.1186/1297-9686-27-3-251.
- Knott SA, Haley CS: Simple multiple-marker sib-pair analysis for mapping quantitative trait loci. Heredity. 1998, 81: 48-54. 10.1046/j.1365-2540.1998.00376.x.
- Freyer G, Vukasinovic N, Cassell BG: Impacts of using inbred animals in studies for detection of quantitative trait loci. J Dairy Sci. 2009, 92: 765-772. 10.3168/jds.2007-0470.
- Rokouel M, Vaez Torshizi R, Moradi Shahrbabak M, Sargolzaei M, Sorensen AC: Monitoring inbreeding trends and inbreeding depression for economically important traits of Holstein cattle in Iran. Journal of Dairy Science. 2010, 93: 3294-3302. 10.3168/jds.2009-2748.
- Kristensen TN, Sorensen AC: Inbreeding - lessons from animal breeding, evolutionary biology and conservation genetics. Animal Science. 2005, 80: 121-133.
- Vermeulen CJ, Bijlsma R, Loeschke V: A major QTL affects temperature sensitive adult lethality and inbreeding depression in life span in Drosophila melanogaster. BMC Evolutionary Biology. 2008.
- Grapes L, Firat MZ, Dekkers JCM, Rothschild MF, Fernando RL: Optimal haplotype structure for linkage disequilibrium-based fine mapping of quantitative trait loci using identity by descent. Genetics. 2006, 172: 1955-1965. 10.1534/genetics.105.048686.
- Hoffmann K, Stassen HH, Reis A: Genkartierung in Isolatpopulationen. Medgen. 2000, 12: 428-437.
- Cassell BG: Dairy Cattle Inbreeding. 2008, [cited: April 2010], [http://www.extension.org/pages/Dairy_Cattle_Inbreeding]
- Legarra A, Aguilar I, Misztal I: A relationship matrix including full pedigree and genomic information. J Dairy Sci. 2009, 92: 4656-4663. 10.3168/jds.2009-2061.
- Lund MS: Genomic prediction when some animals are not genotyped. Genet Sel Evol. 2010, 42: 2. 10.1186/1297-9686-42-2.
- Schelling M, Stricker C, Fernando RL, Künzi N: PEDSIM - A simulation program for pedigree data. 6th World Congress on Genetics Applied to Livestock Production, Armidale, Australia, Proc. 1998, 27: 475-476.
- Falconer DS: Introduction to Quantitative Genetics. 1989, John Wiley and Sons: New York, 3
- Vukasinovic N, Martinez ML: Mapping quantitative loci in general pedigrees using the random genetic model approach. Proceedings of the Joint Statistical Meeting, CD-ROM, New York, NY. 2002, American Statistical Association.
- Hernández-Sánchez J, Haley CS, Woolliams JA: Prediction of IBD based on population history for fine gene mapping. Genetics Selection Evolution. 2006, 38: 231-252. 10.1186/1297-9686-38-3-231.
- Hernández-Sánchez J, Grounchec JA, Knott S: A web application to perform linkage disequilibrium and linkage analyses on a computational grid. Bioinformatics. 2009, 25: 1377-1383. 10.1093/bioinformatics/btp171.
- Hill WG, Hernandez-Sanchez J: Prediction of Multilocus Identity-by-Descent. Genetics. 2007, 176: 2307-2315. 10.1534/genetics.107.074344.