Effect of family relatedness on characteristics of estimated IBD probabilities in relation to precision of QTL estimates.

BACKGROUND
A random QTL effects model uses a function of the probabilities that two alleles at a particular genomic position, in the same or in different animals, are identical by descent (IBD). Estimates of such IBD probabilities, and therefore the modeling and estimation of QTL variances, depend on marker polymorphism, the strength of linkage and linkage disequilibrium between markers and QTL, and the relatedness of animals in the pedigree. The effect of the relatedness of animals in a pedigree on IBD probabilities and their characteristics was examined in a simulation study.


RESULTS
The study was based on nine multi-generational family structures, similar to the pedigree structure of a real dairy population, which were distinguished by inbreeding levels increasing from zero to 28% across the studied population. The highest inbreeding level in the pedigree, associated with the highest relatedness, was accompanied by the highest IBD probabilities of two alleles at the same locus and by lower relative coefficients of variation. Profiles of correlation coefficients between IBD probabilities along the marked chromosomal segment and those at the true QTL position were steepest when the inbreeding coefficient in the pedigree was highest. The precision of the estimated QTL location increased with increasing inbreeding and pedigree relatedness. A method to assess the optimum level of inbreeding for QTL detection, depending on population parameters, is proposed.


CONCLUSIONS
An increased overall relationship in a QTL mapping design has positive effects on the precision of QTL position estimates. However, the relationship between the inbreeding level and the capacity for QTL detection, which depends on the recombination rate between the QTL and the adjacent informative marker, is not linear.


Background
Studies on quantitative trait loci (QTL) in dairy cattle are performed almost exclusively on data from commercial populations, because setting up experimental populations is highly expensive and time-consuming. Therefore, the simplest and most popular design for QTL mapping in dairy cattle has been the granddaughter design (GDD, [1]). In this design, single grandsires establish their "own families" with a number of sons (sires) genotyped for a marker panel, and the phenotypic information on the quantitative trait is based on several hundred cows ((grand)daughters).
The methodology for detecting QTL in general pedigrees by exploiting the polymorphism of genetic markers was proposed by Fernando et al. (1989), based on a model in which both the allelic QTL effects and the polygenic component are assumed to be random normal deviates [2]. The covariance between individuals for a putative QTL is modeled by the probabilities of sharing alleles identical by descent (IBD), based on linked marker genotypes. Such IBD scores are important prerequisites in a two-step procedure for computing variance components using ASReml [3,4]. The major advantage of the variance components approach is its ability to account for relationships among individuals in different families. Pong-Wong et al. (2001) proposed a fast deterministic approach to estimate IBD probabilities by combining the methods of Wang et al. (1995) and Knott and Haley (1998) [5-7]. Consequently, handling inbreeding loops with minimal loss of information became feasible. This is of interest, since pedigree patterns harbouring inbred individuals do occur in many animal species, even if inbreeding should be avoided under commercial breeding conditions.
A recent study, to our knowledge the first of its kind, has shown that using inbred sires in a pedigree has a positive effect on QTL detection [8]. However, the mechanism behind this effect is not straightforward, as the family structures in the cited simulation study differed neither in phenotypic variance nor in (poly)genetic variance. In inbred individuals, sensitivity to environmental changes increases because the loss of heterozygosity is accompanied by an impaired ability to react to changing or suboptimal environmental conditions [10]. Most likely, such inbreeding-specific environmental effects occur in dairy cattle as well. Moreover, accounting for the negative effects of inbreeding on fertility, health and economically important traits would be relevant [9]. However, the molecular genetic basis of inbreeding depression is still being examined in model species [11]. Therefore, a realistic basis for including these effects in a simulation model is missing.
In the recent study, the only source of inbreeding effects on QTL position estimates was the IBD probability, and investigating IBD parameters in this sense remained an open task. The objective of the simulation study summarized in this paper was to provide more insight into the characteristics of IBD probabilities in relation to the inbreeding level. Therefore, IBD parameters were examined in four-generation pedigrees with stepwise increasing inbreeding levels and slightly varying marker panels for QTL mapping. Moreover, an attempt was made to identify the theoretical "optimum inbreeding level" for QTL mapping. An extremely high inbreeding level was deliberately included in order to evaluate the resulting IBD parameters with respect to this theoretical optimum.

Results and discussion
Detailed QTL estimates obtained from "mildly inbred" family structures (average $F_x$ in sires of final offspring increasing from 0 to 0.042) were reported earlier [8]. The results obtained from the new family structures with higher inbreeding levels followed the same trend of positive inbreeding effects on estimated QTL positions, except for FS5 (Figure 1). Differences in IBD probabilities along the marked chromosome were mainly responsible for the accuracy of the QTL position estimates.
Simulation parameters in terms of marker map, number of marker alleles and information content did not significantly affect the IBD probability (P-values ranged from 0.75 to 0.87). The family structure, characterized by its inbreeding level, had a significant effect on the IBD probability (P < 0.0001). The IBD parameters were therefore examined in more detail to identify why QTL positions were estimated more successfully in the more strongly inbred family structures.

Means and standard deviations of IBD probabilities
The mean IBD probability (at the true QTL position, over all simulation parameters) increased slightly from FS0 to FS88 and was highest in FS99 (Table 1). A minor break in the general trend was caused by a zero relationship coefficient between the descendants of GGS1 and GGS2 in FS5 (Table 2); all other family structures were linked across both sub-pedigrees. Apart from FS5, the ancestral IBD sharing probability of the great grandsires and their offspring was greater for GGS1 (0.055 in FS0, 0.316 in FS99) than for GGS2 (0.034 in FS0, 0.135 in FS99). The highest IBD sharing probability was found in FS99, with both great grandsires equally related to the final offspring via the other great grandsire (see Table 2 for relationship coefficients).
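A short Python sketch of how summary statistics like those in Table 1 could be computed from pairwise IBD probabilities at one map position. It assumes that the relative coefficient of variation is the standard deviation divided by the mean; the function name and the input values are hypothetical and are not taken from Table 1.

```python
from statistics import mean, stdev


def ibd_summary(ibd_probs):
    """Mean, sample standard deviation and coefficient of variation (sd / mean)
    of pairwise IBD probabilities observed at one map position."""
    m = mean(ibd_probs)
    sd = stdev(ibd_probs)
    cv = sd / m if m > 0 else float("nan")
    return m, sd, cv


# Hypothetical pairwise IBD probabilities at the QTL position (illustrative only).
less_related = [0.02, 0.10, 0.00, 0.25, 0.05, 0.12]
more_related = [0.30, 0.45, 0.25, 0.50, 0.35, 0.40]

for label, probs in [("less related", less_related), ("more related", more_related)]:
    m, sd, cv = ibd_summary(probs)
    print(f"{label}: mean={m:.3f} sd={sd:.3f} cv={cv:.2f}")
```

With these made-up inputs the more related set has both the higher mean and the lower coefficient of variation, the same direction as reported for the most inbred family structure.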

IBD parameters and profiles
The shape of the profiles of IBD parameters along the marked chromosomal region indicates the precision of QTL mapping [12]. However, this statement by Grapes et al. (2006) was derived neither from analyzing inbred pedigree structures nor from analyzing practice-like mapping designs. Therefore, we first investigated the profiles of the means and standard deviations of IBD probabilities along the marker maps (Figure 2). The profiles of average IBD probabilities were flat in almost all family structures. However, FS99 showed a clear break between 34 and 38 cM in combination with marker maps M1 and M4. This suggests that a recombination that occurred in the base generation could be detected more precisely when it was followed by strong inbreeding and manifested in homozygous blocks. In addition, the mean IBD probabilities in FS99 show that even slightly differing marker distances (M1 compared to M4) affect the profiles of IBD parameters.
Different profiles of correlations between IBD scores at the true QTL position and IBD scores along the map (profcorrIBD) may indicate different responses to simulation parameters, e.g. the marker map (Figure 3). In general, correlations decreased with increasing distance from the true QTL position. The decrease was stronger in combinations with more than two marker alleles (individual plots not shown). All family structures showed some variation in the steepness of the profcorrIBD profiles (Figure 3). The overall message is clear: FS99, which had the highest levels of inbreeding, IBD probability and relatedness, had the steepest profcorrIBD in all combinations of simulation parameters. The profiles of FS99 became extremely steep when the QTL-flanking markers were more than 3.5 cM apart (as in M2 and M4). The reason for the large differences in profcorrIBD between FS99 and the other family structures is the same as for the profiles of average IBD probabilities and standard deviations (Figure 2). The parameters at the most distant map position (0 cM) and at the true QTL position (Table 1) essentially bounded the steepness of the profcorrIBD profiles (Figure 3).
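profcorrIBD can be illustrated with a Python sketch that correlates, position by position, the IBD probabilities of all gamete pairs with those at the QTL position. It assumes the correlation is taken across pairs, with the same pair order at every position; the helper names, map positions and values are hypothetical.

```python
from math import nan, sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences; nan if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return nan
    return sxy / sqrt(sxx * syy)


def corr_profile(ibd_by_position, qtl_pos):
    """Correlation of IBD probabilities at each position with those at qtl_pos.

    ibd_by_position maps a map position (cM) to a list of IBD probabilities,
    one entry per pair of gametes, in the same pair order at every position.
    """
    ref = ibd_by_position[qtl_pos]
    return {pos: pearson(ref, vals) for pos, vals in sorted(ibd_by_position.items())}


# Hypothetical IBD probabilities for four gamete pairs at three map positions.
ibd_by_position = {
    0: [0.5, 0.4, 0.6, 0.5],
    38: [0.2, 0.5, 0.8, 0.3],
    41: [0.1, 0.5, 0.9, 0.3],
}

for pos, r in corr_profile(ibd_by_position, 41).items():
    print(f"{pos:>3} cM: r = {r:.3f}")
```

A steep profile corresponds to correlations that fall off quickly as the position moves away from the reference position.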

QTL position estimates and pedigree relatedness
Except for FS5, the frequency of correctly estimated QTL positions (within an interval of 41.5 ± 1.5 cM on the marked chromosome) increased significantly with stepwise increasing inbreeding level (Figure 1). Most outliers resulted from analysing FS0. FS99 yielded the best results in terms of the most correct QTL position estimates and the smallest deviations from the actual QTL position. Parallel runs with GridQTL mimicking a combined LD/LA analysis yielded the same QTL position estimates as the linkage analyses above, confirming the robustness of the QTL position estimates. When 100 sires and 100 dams were set up for a historical population 100 generations back, the shape of the test statistic profile was similar to that obtained from the linkage analyses described. The peak of the test statistic profile became much sharper when only two sires were chosen for the historical population (test statistic profiles not shown). This is a further indication of the impact of a more strongly related, and most likely more inbred, historical background when population history started with only two sires. In both cases, the pedigrees of FS0 to FS99 were exactly the same, regardless of whether 100 sires or two sires were chosen for the historical population. Hoffmann et al. (2000) stated that an older population with "reduced founder haplotypes by recombination" is more suited for fine mapping [13]. Successive generations of inbreeding, as in FS99, could be advantageous in this sense as well. Thus, our results support the conclusions from analysing human pedigrees.
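The accuracy criterion used above can be expressed as a small Python helper. It counts estimates inside 41.5 ± 1.5 cM (boundaries treated as inclusive, an assumption) and reports the mean absolute deviation from the interval centre, which stands in here for the actual QTL position; the function name and the estimates are hypothetical, not results of this study.

```python
def position_accuracy(estimates_cm, centre=41.5, half_width=1.5):
    """Fraction of QTL position estimates within centre ± half_width (inclusive)
    and mean absolute deviation of the estimates from centre."""
    n = len(estimates_cm)
    hits = sum(1 for x in estimates_cm if abs(x - centre) <= half_width)
    mad = sum(abs(x - centre) for x in estimates_cm) / n
    return hits / n, mad


# Hypothetical estimated QTL positions (cM) from six replicates.
estimates = [41, 42, 40, 44, 38, 41]
frac, mad = position_accuracy(estimates)
print(f"correct: {frac:.2f}, mean |deviation|: {mad:.2f} cM")
```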

Approaching the optimum inbreeding level
The relationship between increasing inbreeding level $F_x$ and cov(IBS, IBD) is not linear (Figure 4). The optimum $F_x$ depends on the recombination rate between the QTL and the nearest informative marker, which was the same in marker maps M1 and M3 (c = 0.01). A larger effective population size N reduces cov(IBS, IBD) and thereby reduces the optimum inbreeding level. The lower the recombination rate, the higher the $F_x$ at which the maximum of cov(IBS, IBD) is reached (Figure 4). Table 3 shows the effect of effective population size, number of generations and allele frequency at the trait locus on the maximum $F_x$ and cov(IBS, IBD). In most cases (i.e. with recombination rate c = 0.01 between the adjacent marker and the QTL), the optimum in terms of maximum cov(IBS, IBD) was reached at $F_x$ = 0.35. Our family structures, except FS99 with $F_x$ = 0.28, were still far from this optimum. An inbreeding level as high as in FS99 is not assumed to be relevant for today's dairy practice, but an old example (the sire "Beltsville") demonstrates that very high levels of inbreeding have occurred.
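For rough orientation on how N and the number of generations t relate to an inbreeding target, the standard result for an idealised population, $F_t = 1 - (1 - 1/(2N))^t$, can be used. The Python sketch below estimates how many generations would be needed to reach $F_x$ = 0.35; it is an assumption that this formula matches how Table 3 was derived, it does not model cov(IBS, IBD) itself, and the function names are hypothetical.

```python
from math import ceil, log


def inbreeding_after(t, n_e):
    """Expected inbreeding after t generations in an idealised population of
    effective size n_e, starting from F = 0 (Wright's classical result)."""
    return 1.0 - (1.0 - 1.0 / (2.0 * n_e)) ** t


def generations_to_reach(f_target, n_e):
    """Smallest whole number of generations t with inbreeding_after(t, n_e) >= f_target."""
    return ceil(log(1.0 - f_target) / log(1.0 - 1.0 / (2.0 * n_e)))


for n_e in (25, 50, 100):
    t = generations_to_reach(0.35, n_e)
    print(f"N = {n_e}: F = 0.35 after about {t} generations")
```

Under this idealisation a larger N needs proportionally more generations to accumulate the same inbreeding.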

Outlook
Our results are not intended to encourage inbreeding in practical breeding; inbreeding depression effects have to be avoided. Nevertheless, there is still scope for exploiting inbreeding in QTL study designs [14]. However, as in every successful QTL analysis, the prerequisite is a QTL actually segregating in the pedigree under study. It should be mentioned that the estimation of breeding values based on a relationship matrix incorporating both pedigree and genomic information is still a topic in the literature, even with the advanced dense SNP technology for genomic selection [15,16].
In this study, cov(IBS, IBD) is based on one marker only: we used the recombination rate between the nearest informative marker and the QTL to calculate cov(IBS, IBD). The focus was on a practical pedigree as found in conventional dairy cattle breeding. The method can be extended to multiple markers. Further, more general conclusions could be drawn by simulating pedigrees with random mating of diploid organisms with discrete generations and evaluating QTL estimates stepwise. Using a defined population history, such a design could reveal an even higher average level of inbreeding (co-ancestry) than that assumed with the two founder sires in our study.

Conclusions
Our simulation study, carried out under realistic conditions for dairy cattle, revealed intrinsic relationships between the precision of estimated QTL positions and pedigree relatedness in the mapping population. IBD parameters obtained from analysing family structures with varying inbreeding load yielded conclusive results regarding the role of inbreeding in QTL estimation and its dependence on relatedness. Related pedigrees are necessary for linkage analyses, and the stronger the relatedness, the greater the success of such studies. Comparing two versions of historical populations in a GridQTL analysis that mimics a combined LD/LA analysis further underlined the advantage of inbreeding and increased relatedness. This leads us to assume that linkage disequilibrium between markers and QTL across several generations could be detected more easily than in non-inbred or "less related" pedigrees. It must be noted that the relationship between the capacity for QTL detection (here expressed by cov(IBS, IBD)) and the average inbreeding level of a population is not linear. Finally, these results apply to the situation of one biallelic QTL actually segregating in the pedigree, marked by a defined chromosomal segment.

Data simulation
The basis of the study design was a general pedigree structure comprising four generations with 850 individuals (Figure 5). GGS1 and GGS2 were the male founders, followed by four grandsires and nine sires with 42 to 78 offspring each (544 final offspring in total). Overlapping generations were also included, in that one great grandsire, GGS1, was the sire of 69 final offspring. Both male founders are referred to as 'great grandsires', regardless of the overlapping generations.
Nine family structures (FS) were created with different inbreeding levels (described in detail in Table 4). FS0 included no inbreeding, as no parents were related (Figure 5). All other family structures had the same numbers of generations and individuals as FS0 but contained matings between related animals. The inbreeding level increased from mild inbreeding in FS1 ($F_x$ = 0.0625 in one single sire) to a higher level in FS88 ($F_x$ = 0.15 averaged over all sires) and to the extremely inbred FS99 ($F_x$ = 0.28 in total, Table 4). The maternal structure of the final offspring was the same in all family structures (as in [8], Table 5). Because the parents of the four founder individuals (two great grandsires and two great granddams) and most of the female mates of the grandsires were unknown, their genotypes were sampled according to allele frequencies, assuming Hardy-Weinberg equilibrium (approach as in Schelling et al., 1998) [17]. Offspring genotypes were composed of one maternal and one paternal gamete, following Mendelian inheritance. Marker information and phenotypes of both great grandsires were identical in all family structures. Trait values (phenotypes) were generated as follows: the individual trait observation $y_i$ is the sum of a normally distributed QTL effect $q_i$, a polygenic effect $g_i$ (here, the sum of 20 single-gene effects) and a residual effect $e_i$, according to model (1):

$$y_i = q_i + g_i + e_i \qquad (1)$$

The QTL effect was assumed to be normally distributed with mean zero and QTL variance $\sigma_q^2$, contributing 15% of the total trait variance; $\sigma_q^2$ was based on the additive and dominance effects of the QTL (a and d) and the allele frequencies (m and n) [18]. The polygenic component, contributing 25% of the total variance, was assumed to be normally distributed with mean zero and an additive polygenic variance arising from single genes (g) with small effects of alleles l and k at each locus. The residual $e_i$ was normally distributed with mean zero and variance $\sigma_{res}^2$ (i.e. 60 percent of the total variance). Recombination events were simulated on the basis of a binomial map function. Trait values and marker genotypes were simulated in an identical manner for all family structures, applying the PEDSIM approach [17].
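The phenotype model (1) and the 15/25/60 variance split can be sketched in Python. The sketch simplifies under stated assumptions: effects are drawn independently per animal instead of being transmitted through the pedigree as in PEDSIM, the 20 polygenic gene effects are normal with equal variance, and qtl_variance uses the textbook single-locus decomposition into additive and dominance variance (Falconer and Mackay), which is assumed, not confirmed, to be the formula behind [18]. All function names are hypothetical.

```python
import random


def qtl_variance(a, d, m):
    """Single-locus genetic variance for a biallelic QTL with additive effect a,
    dominance effect d and allele frequencies m and n = 1 - m:
    2mn[a + d(n - m)]^2 (additive) + (2mnd)^2 (dominance)."""
    n = 1.0 - m
    additive = 2.0 * m * n * (a + d * (n - m)) ** 2
    dominance = (2.0 * m * n * d) ** 2
    return additive + dominance


def simulate_phenotypes(n_animals, total_var=1.0, n_genes=20, seed=1):
    """Draw y_i = q_i + g_i + e_i with 15% QTL, 25% polygenic and 60% residual
    variance. Effects are independent across animals (no pedigree transmission)."""
    rng = random.Random(seed)
    sd_q = (0.15 * total_var) ** 0.5
    sd_gene = (0.25 * total_var / n_genes) ** 0.5  # equal share per polygenic gene
    sd_e = (0.60 * total_var) ** 0.5
    records = []
    for _ in range(n_animals):
        q = rng.gauss(0.0, sd_q)
        g = sum(rng.gauss(0.0, sd_gene) for _ in range(n_genes))
        e = rng.gauss(0.0, sd_e)
        records.append((q + g + e, q, g, e))
    return records


for y, q, g, e in simulate_phenotypes(3):
    print(f"y={y:+.3f}  q={q:+.3f}  g={g:+.3f}  e={e:+.3f}")
```

In a pedigree-based simulation the QTL effect would instead follow from the two alleles each animal inherits, with their effects set so that qtl_variance equals the 15% share.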
Sixty datasets were simulated for each family structure, based on variations in three simulation parameters (Table 5): (i) marker positions on the chromosomal …

Table 3. Inbreeding coefficient $F_x$, allele frequency (m) and number of generations (t) at maximum cov(IBS, IBD) for various effective population sizes (N), with recombination rate c = 0.01 between the QTL and the adjacent marker.

The pre-fine-mapping QTL study covered a 55 cM chromosomal segment, which was expected to harbor one QTL. The focus of this study was clearly on detecting inbreeding effects on parameters of the IBD probability in relation to the estimated QTL map position. Thus, the principal conclusions do not depend on the type of molecular marker, the number of markers or the number of marker alleles. The 60 data sets per family structure were treated as repetitions, in a sense comparable to repeated health data collections in the same patients and their families at different times of life or in different clinics, or to treatments affecting all families simultaneously in the same way. Statistical parameters were calculated using the SAS package, version 9.1 (SAS Institute, Inc., Cary, NC), and effects of simulation parameters on IBD scores were tested with PROC GLM.

Calculating IBD probabilities and QTL analysis
The QTL effect was assumed to be random, with the covariance structure between individuals being a function of IBD probabilities at a particular location. A Fortran 90 program for calculating IBD probabilities was written that exploits as much of the available pedigree and marker information as possible [19]. The kernel of the program package was the rapid deterministic recursive algorithm for calculating IBD probabilities between each pair of gametes [5], followed by transmission of marker alleles from parents to offspring [2]. Further, the method of Knott and Haley (1998) was implemented to determine IBD probabilities among (full) sibs' gametes in the second generation [7]. IBD probabilities were calculated for each pair of gametes independently, to obtain a matrix $\mathbf{G}_p$ of gametic IBD probabilities at each position p in the chromosomal segment. Then, the following mixed model was applied:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{H}_p\mathbf{a}_p + \mathbf{e}$$

where $\mathbf{y}$ is an ($n_1 \times 1$) vector of phenotypes; $n_1$ refers to the number of animals with phenotypes, and $n_0$ is the total number of animals in the pedigree. $\mathbf{X}$ is an ($n_1 \times s$) design matrix for s fixed effects, $\mathbf{Z}$ is an ($n_1 \times n_0$) incidence matrix relating animals to their phenotypes, $\mathbf{H}_p$ is an ($n_1 \times 2n_0$) incidence matrix relating animals to their paternal and maternal QTL alleles at position p, $\boldsymbol{\beta}$ is an ($s \times 1$) vector of fixed effects, $\mathbf{u}$ is an ($n_0 \times 1$) vector of random polygenic effects, $\mathbf{a}_p$ is a ($2n_0 \times 1$) vector of the effects of a QTL at position p, and $\mathbf{e}$ is an ($n_1 \times 1$) residual vector with expectation and covariance matrix $(\mathbf{0}, \mathbf{E} \otimes \mathbf{I})$, where $\mathbf{E}$ is the unknown (co)variance matrix of the residual effects and $\mathbf{I}$ denotes the identity matrix. $\mathbf{X}$ is equal to $\mathbf{1}$, since all phenotypes were assumed to be pre-adjusted for non-genetic effects; thus s = 1 and $\boldsymbol{\beta} = \mu$. The random polygenic effects $\mathbf{u}$ and QTL effects $\mathbf{a}_p$ are assumed to follow normal distributions with mean zero and variances $\mathbf{A}\sigma_u^2$ and $\mathbf{G}_p\sigma_p^2$, respectively. Matrix $\mathbf{A}$ is the additive relationship matrix.
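As a worked consequence of the model, and assuming a single trait (so that $\mathbf{E} \otimes \mathbf{I}$ reduces to $\mathbf{I}\sigma_e^2$) with $\mathbf{u}$, $\mathbf{a}_p$ and $\mathbf{e}$ mutually uncorrelated (an assumption not spelled out in this section), the phenotypes have the following mean and covariance at position p. This shows where the IBD matrix $\mathbf{G}_p$ enters the likelihood; under $H_0$ the QTL term is simply dropped.

$$E(\mathbf{y}) = \mathbf{1}\mu, \qquad \operatorname{Var}(\mathbf{y}) = \mathbf{V}_p = \mathbf{Z}\mathbf{A}\mathbf{Z}'\sigma_u^2 + \mathbf{H}_p\mathbf{G}_p\mathbf{H}_p'\sigma_p^2 + \mathbf{I}\sigma_e^2, \qquad \mathbf{V}_0 = \mathbf{Z}\mathbf{A}\mathbf{Z}'\sigma_u^2 + \mathbf{I}\sigma_e^2$$

If each row of $\mathbf{H}_p$ has a 1 in the columns of the animal's paternal and maternal gametes (again an assumption about its coding), the QTL covariance between phenotyped animals i and j is $\sigma_p^2$ times the sum of their four gametic IBD probabilities:

$$\left(\mathbf{H}_p\mathbf{G}_p\mathbf{H}_p'\right)_{ij} = \sum_{k \in \{i^{pat},\, i^{mat}\}} \; \sum_{l \in \{j^{pat},\, j^{mat}\}} \left(\mathbf{G}_p\right)_{kl}$$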
Matrix $\mathbf{G}_p$ contains IBD probabilities at position p, obtained as described above. The model was fitted at each single position p (in steps of 1 cM) on the chromosomal segment. The data were analyzed using a random-model variance component approach. The residual maximum likelihood (REML) procedure implemented in the ASReml software [3] was used to maximize the likelihood under both $H_0$ and $H_A$. The likelihood ratio

$$LR = -2\left(\ln L_{H_0} - \ln L_{H_A}\right) \sim \chi^2_{df}$$

was calculated at each position p to find the most likely QTL position, where $\ln L_{H_0}$ is the logarithm of the likelihood computed for the pure polygenic model and $\ln L_{H_A}$ is the logarithm of the likelihood from the QTL model.
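To make the gametic recursion concrete, the following Python sketch implements the basic single-locus rule in the spirit of Fernando and Grossman (1989): a new gamete's IBD probability with any earlier gamete is a weighted average of the corresponding probabilities of the two parental gametes it may have copied. It is not the Pong-Wong et al. algorithm or the authors' Fortran 90 program. The weight w (the probability that the QTL allele was copied from the first parental gamete, e.g. 1 − c when one informative marker shows that origin, 0.5 without marker information) is supplied directly rather than inferred from marker genotypes, and add_founder and add_gamete are hypothetical names.

```python
def add_founder(G):
    """Append the two gametes of an unrelated, non-inbred founder to the
    gametic IBD matrix G (a list of lists, extended in place)."""
    idx = []
    for _ in range(2):
        k = len(G)
        for x in range(k):
            G[x].append(0.0)          # founder gametes are IBD with nothing earlier
        G.append([0.0] * k + [1.0])   # ... but IBD with themselves
        idx.append(k)
    return tuple(idx)


def add_gamete(G, parent_a, parent_b, w):
    """Append a gamete transmitted by a parent whose gametes are parent_a and
    parent_b; w = probability that the QTL allele came from parent_a."""
    k = len(G)
    row = [w * G[parent_a][x] + (1.0 - w) * G[parent_b][x] for x in range(k)]
    for x in range(k):
        G[x].append(row[x])           # keep G symmetric
    row.append(1.0)
    G.append(row)
    return k


G = []
sire = add_founder(G)                # gametes 0, 1
dam = add_founder(G)                 # gametes 2, 3
# Two full sibs, no marker information (w = 0.5).
x_pat, x_mat = add_gamete(G, *sire, 0.5), add_gamete(G, *dam, 0.5)
y_pat, y_mat = add_gamete(G, *sire, 0.5), add_gamete(G, *dam, 0.5)
# Offspring of the full-sib mating: its two gametes come from the two sibs.
z_pat = add_gamete(G, x_pat, x_mat, 0.5)
z_mat = add_gamete(G, y_pat, y_mat, 0.5)
print("IBD of the full sibs' paternal gametes:", G[x_pat][y_pat])
print("inbreeding of the offspring at the locus:", G[z_pat][z_mat])
```

The IBD probability between an animal's own two gametes is its inbreeding coefficient at the locus; for the offspring of a full-sib mating, this sketch reproduces the textbook value of 0.25. Replacing 0.5 by 1 − c or c for marker-informative transmissions makes the values position-specific, which is what gives $\mathbf{G}_p$ its profile along the chromosome.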

Parallel analyses assuming combined linkage disequilibrium and linkage
The QTL estimates obtained by linkage analysis (LA) as described above were compared with the results of independent analyses using GridQTL. For this purpose, the R-method, which is based on a regression model, was adapted [20]. The advantage of this method is that it requires only genotypes, rather than haplotypes, to establish the "historical generation" [21]. Here, 100 historical generations preceding the defined pedigree design were chosen to mimic linkage disequilibrium (LD). Two extreme versions (two or 100 sires, mated to 100 cows in each version) characterized the effective population size of the historical generation. This step enabled analysing the data in terms of combined LD/LA [20,21].
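How strongly the two historical versions differ can be gauged with Wright's formula for the effective size of a population with $N_m$ breeding males and $N_f$ breeding females, $N_e = 4N_mN_f/(N_m + N_f)$. The Python sketch assumes random mating and Poisson family sizes; GridQTL is not claimed to use this formula internally, and effective_size is a hypothetical name.

```python
def effective_size(n_sires, n_dams):
    """Wright's effective population size for unequal numbers of breeding
    males and females (random mating, Poisson family sizes assumed)."""
    return 4.0 * n_sires * n_dams / (n_sires + n_dams)


for sires in (2, 100):
    print(f"{sires} sires x 100 dams: Ne = {effective_size(sires, 100):.2f}")
```

With two sires the historical effective size is only about 7.8, against 200 with 100 sires, which is consistent with the more strongly related historical background described for the two-sire version.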

Covariance of IBS and IBD and inbreeding level
The IBD matrix at the hypothetical QTL position is constructed using identity-by-state (IBS) information from nearby markers. Thus, the covariance between IBS at a marker and IBD at the QTL, cov(IBS, IBD), is a suitable parameter for studying the relationship between inbreeding and the ease of QTL detection. This parameter was defined as