- Research article
- Open Access
Extent of third-order linkage disequilibrium in a composite line of Iberian pigs
- Luis Gomez-Raya^{1},
- Luis Silio^{1},
- Wendy M. Rauw^{1},
- Luis Alberto Gracia-Cortés^{1} and
- Carmen Rodríguez^{1}
- Received: 15 March 2018
- Accepted: 26 July 2018
- Published: 17 August 2018
Abstract
Background
Previous studies on linkage disequilibrium have investigated second order linkage disequilibrium in animal and plant populations. The objective of this paper was to investigate the genome-wide levels of third order linkage disequilibrium in a composite line founded by admixture of four Iberian pig strains. A model for the generation of third order linkage disequilibrium by population admixture is proposed. A computer Expectation-Maximization algorithm is developed and applied to the estimation of third order linkage disequilibrium at inter- and intra-chromosomal level using 26,347 SNPs typed in 306 sows. The relationship of third order linkage disequilibrium with physical distance was investigated over 35 million triplets in SSC12. Basic and normalized estimates of inter and intra-chromosomal third order linkage disequilibrium are reported.
Results
Genome-wide analyses revealed that third order linkage disequilibrium is rather common among linked loci in this Iberian pig line. It is shown that population admixture of multiple populations may explain the observed levels of third order linkage disequilibrium although it could be generated by genetic drift. Third order linkage disequilibrium decreases rapidly up to 4 Mb and then declines slowly. The short distances between consecutive markers explain the maintenance of the observed third order linkage disequilibria levels when using a model incorporating the break-up of disequilibrium by recombination. Genome-wide testing also revealed that only 3.6% of the normalized estimates were different from 1, − 1, 0, or from a not well-defined situation in which there is only one possible value for the third order linkage disequilibrium parameter, given allele frequencies and pairwise linkage disequilibria parameters.
Conclusions
Third order linkage disequilibrium is common among linked markers in the analyzed pig line and may have been generated by population admixture of multiple populations or by genetic drift. As with second order linkage disequilibrium, the absolute value of the third order linkage disequilibrium decreases with physical distance. Normalization of third order linkage disequilibrium should be avoided for closely linked bi-allelic loci.
Keywords
- Third order linkage disequilibrium
- Iberian pigs
- High order linkage disequilibrium
- Linkage disequilibrium
Background
Linkage disequilibrium is defined as the non-random association of alleles at two or more loci. In many instances it is due to the physical organization of DNA sequences, in which each nucleotide follows another in a single chain. It can also be due to genetic drift or selection. Linkage disequilibrium is a key parameter for understanding evolution and, as stated by Slatkin, it is an indicator of the population genetic forces that structure a genome [1, 2]. In recent decades, research on linkage disequilibrium has intensified as an aid to mapping genes affecting diseases or quantitative traits in so-called Genome-Wide Association Studies (GWAS; [3]), and in genomic selection, which aims to exploit associations between alleles affecting production traits and single nucleotide polymorphisms scattered over the entire genome [4]. There is a large body of literature on these topics, but the great majority of studies consider just two loci at a time, although associations between alleles can occur for any number of loci. Third and higher order linkage disequilibrium is related to the association of alleles at several loci. Because of its complexity, high order linkage disequilibrium has been little investigated in animal or plant populations, or in applications that could help to understand its role in the expression of phenotypic traits. High order linkage disequilibrium might be related to epistasis because it involves multiple loci. However, it is not a simple association of one allele with several alleles at other loci, but the association of one allele with several haplotypes at the other loci. That is, third order linkage disequilibrium arises from the association between alleles that is not explained by second order linkage disequilibrium.
Components of high order linkage disequilibrium
Number of loci | Maximum Order LD | Number of haplotypes | Max value of highest order disequilibrium | Number allele freq. to estimate | LD parameters to estimate |
---|---|---|---|---|---|
2 | 2 | 4 | 0.25 | 2 | 1 |
3 | 3 | 8 | 0.125 | 3 | 4 |
4 | 4 | 16 | 0.0625 | 4 | 11 |
n | n | 2^{n} | \( \frac{1}{2^n} \) | n | 2^{n}-n-1 |
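The last row of the table can be checked with a few lines of Python. This is only an illustrative sketch for biallelic loci; the function name `ld_components` is hypothetical and not part of the original study. The count of disequilibrium parameters follows because the 2^{n} haplotype frequencies sum to one, leaving 2^{n} - 1 free parameters, of which n are allele frequencies.

```python
def ld_components(n_loci: int) -> dict:
    """Bookkeeping for n biallelic loci, mirroring the columns of the table."""
    n_haplotypes = 2 ** n_loci
    return {
        "max_order": n_loci,
        "n_haplotypes": n_haplotypes,
        "max_highest_order_ld": 1 / n_haplotypes,
        "n_allele_freqs": n_loci,
        # free haplotype frequencies (2^n - 1) minus the n allele frequencies
        "n_ld_parameters": n_haplotypes - n_loci - 1,
    }


for n in (2, 3, 4):
    print(n, ld_components(n))
```

For three loci this gives eight haplotypes, three allele frequencies and four disequilibrium parameters (three of second order and one of third order).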
All this research was carried out in the eighties and nineties when the development of genetic markers was still in its infancy (with microsatellites at their peak use) and with little coverage of animal genomes by today’s standards. The development of array technologies incorporating from thousands to hundreds of thousands of Single Nucleotide Polymorphisms (SNPs) has provided new tools to uncover the associations between alleles at different loci located elsewhere in the genome. Kim et al. [11] proposed a measure of multi-locus high order linkage disequilibrium based on a multiple order Markov chain model. Berg et al. [12] developed a method for the estimation of multi-allelic third order linkage disequilibrium. Nevertheless, no publication exists on the genome-wide levels of third order linkage disequilibrium present in animal populations.
The objective of this paper is to investigate third order linkage disequilibria using an Illumina 60 K SNP array in a closed population of Iberian pigs. This is the first report on the extent of third order linkage disequilibria in animal genomes. In order to carry out extensive third order linkage disequilibrium estimation, a simple and efficient EM algorithm for the estimation of third order linkage disequilibrium of biallelic markers was also developed. In addition, the way in which third order linkage disequilibrium is generated by population admixture was investigated.
Third order linkage disequilibrium theory
Generation of third order linkage disequilibrium by population admixture
It is well established that the crossing between two populations differing in allele frequencies at two loci may generate second order linkage disequilibrium [13]. In this section, the generation of third order disequilibrium by admixture of two populations is shown. Let Z be the resulting cross from two populations, X and Y. Let three loci T/t, M/m, and K/k be located in that order on a chromosome. We will assume that these loci are not affected by selection.
where \( {\delta}_{MK}^X \), \( {\delta}_{TK}^X \), and \( {\delta}_{TM}^X \) are second order linkage disequilibrium parameters for loci MK, TK, and TM, respectively, in population X; \( {\delta}_{TMK}^X \) is the third order linkage disequilibrium coefficient in population X. The same coefficients but with superscript Y are for population Y.
where γ_{T}, γ_{M}, and γ_{K} represent the difference in allele frequency in the two populations at crossing for loci T/t, M/m, and K/k, respectively. A full derivation of eq. (5) is given in the Additional file 1: Appendix 1.
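The admixture model can be illustrated numerically. The Python sketch below is not the authors' code and is not a transcription of eq. (5). It assumes the standard three-locus parameterization in which \( {f}_{TMK}={f}_T{f}_M{f}_K+{f}_T{\delta}_{MK}+{f}_M{\delta}_{TK}+{f}_K{\delta}_{TM}+{\delta}_{TMK} \), and it assumes that population Z receives a proportion τ of its gametes from X and 1 - τ from Y. The closed form in `delta3_closed_form` was derived under those assumptions, with γ taken as the frequency in X minus the frequency in Y (an assumed sign convention). All function names are hypothetical.

```python
from itertools import product


def haplotype_freqs(f, d2, d3=0.0):
    """Haplotype frequencies for loci T, M, K (1 = upper-case allele, 0 = lower-case).

    f = (f_T, f_M, f_K); d2 = (delta_MK, delta_TK, delta_TM); d3 = delta_TMK.
    """
    d_mk, d_tk, d_tm = d2
    freqs = {}
    for hap in product((1, 0), repeat=3):
        s = [1 if a else -1 for a in hap]                    # sign per locus
        p = [fa if a else 1 - fa for fa, a in zip(f, hap)]   # allele frequency per locus
        freqs[hap] = (p[0] * p[1] * p[2]
                      + p[0] * s[1] * s[2] * d_mk
                      + p[1] * s[0] * s[2] * d_tk
                      + p[2] * s[0] * s[1] * d_tm
                      + s[0] * s[1] * s[2] * d3)
    return freqs


def delta_tmk(freqs):
    """Third order disequilibrium recovered from haplotype frequencies."""
    def marg(loci):
        # frequency of haplotypes carrying the upper-case allele at every listed locus
        return sum(v for hap, v in freqs.items() if all(hap[locus] for locus in loci))

    f_t, f_m, f_k = marg([0]), marg([1]), marg([2])
    d_mk = marg([1, 2]) - f_m * f_k
    d_tk = marg([0, 2]) - f_t * f_k
    d_tm = marg([0, 1]) - f_t * f_m
    return marg([0, 1, 2]) - f_t * f_m * f_k - f_t * d_mk - f_m * d_tk - f_k * d_tm


def admix(freqs_x, freqs_y, tau=0.5):
    """Haplotype frequencies in Z as a mixture of X (weight tau) and Y (weight 1 - tau)."""
    return {h: tau * freqs_x[h] + (1 - tau) * freqs_y[h] for h in freqs_x}


def delta3_closed_form(fx, fy, d2x, d2y, d3x=0.0, d3y=0.0, tau=0.5):
    """Closed form for delta_TMK in Z under the mixture assumption stated above."""
    g = [a - b for a, b in zip(fx, fy)]                      # gamma_T, gamma_M, gamma_K
    pair = sum(gi * (dx - dy) for gi, dx, dy in zip(g, d2x, d2y))
    return (tau * d3x + (1 - tau) * d3y
            + tau * (1 - tau) * pair
            + tau * (1 - tau) * (1 - 2 * tau) * g[0] * g[1] * g[2])


# Example: X at intermediate frequencies with maximum pairwise LD, Y in equilibrium.
fx, fy = (0.5, 0.5, 0.5), (0.2, 0.2, 0.2)
d2x, d2y = (0.25, 0.25, 0.25), (0.0, 0.0, 0.0)
z = admix(haplotype_freqs(fx, d2x), haplotype_freqs(fy, d2y), tau=0.5)
print(delta_tmk(z), delta3_closed_form(fx, fy, d2x, d2y, tau=0.5))
```

Under these assumptions, equal mixing (τ = 0.5) of populations that differ only in allele frequencies produces no third order disequilibrium, because the term in γ_{T}γ_{M}γ_{K} carries the factor (1 - 2τ). Differences in pairwise disequilibria combined with differences in allele frequencies do produce it.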
Break-up of third order linkage disequilibrium by recombination
where c_{TM} and c_{MK} are the recombination fractions between T/t and M/m, and between M/m and K/k, respectively; \( {f}_{ijk}^{t-1} \) represents the frequency of haplotype ijk in generation t-1. Dots in the subscripts of this equation denote either allele frequencies (two dots) or two-locus haplotype frequencies (one dot) corresponding to haplotype ijk. For example, f_{.jk} is the frequency of haplotypes carrying alleles j and k at the last two loci, M/m and K/k. This model assumes no interference.
The maximum third order linkage disequilibrium occurs when the three loci are at intermediate allele frequencies, and second order disequilibria are zero (δ_{TM} = δ_{TK} = δ_{MK} = 0). In this situation, δ_{TMK} must range between \( -\frac{1}{8} \) and \( \frac{1}{8} \), which represents the limits of this parameter. If \( {\delta}_{TMK}=\frac{1}{8} \) then only four haplotypes are segregating (TMK, tmK, Tmk, tMk), none of them with alleles complementary to each other. We investigated the break-up of linkage disequilibrium in this situation, in which third order linkage disequilibrium is the highest possible. The break-up of third order linkage disequilibrium was computed with the equation of third order linkage disequilibrium (\( {\delta}_{TMK}^t \)) after substituting haplotype frequencies \( {f}_{ijk}^t \).
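The break-up of disequilibrium in this extreme case can be sketched in Python. The recursion below is the standard three-locus recombination model under random mating and no interference, written with the symbols defined above. It is assumed, not confirmed, to match the equation used by the authors, and the function names are hypothetical. Under this recursion the third order coefficient is multiplied by (1 - c_{TM})(1 - c_{MK}) each generation.

```python
from itertools import product

HAPS = list(product((1, 0), repeat=3))  # (T, M, K); 1 = upper-case allele


def marginal(f, i=None, j=None, k=None):
    """Sum haplotype frequencies over the loci left as None (the 'dots')."""
    return sum(v for (a, b, c), v in f.items()
               if (i is None or a == i)
               and (j is None or b == j)
               and (k is None or c == k))


def next_generation(f, c_tm, c_mk):
    """One generation of recombination for loci ordered T-M-K, no interference."""
    new = {}
    for i, j, k in HAPS:
        new[(i, j, k)] = (
            (1 - c_tm) * (1 - c_mk) * f[(i, j, k)]                              # no crossover
            + c_tm * (1 - c_mk) * marginal(f, i=i) * marginal(f, j=j, k=k)      # crossover T-M
            + (1 - c_tm) * c_mk * marginal(f, i=i, j=j) * marginal(f, k=k)      # crossover M-K
            + c_tm * c_mk * marginal(f, i=i, k=k) * marginal(f, j=j)            # both
        )
    return new


def delta_tmk(f):
    """Third order disequilibrium from haplotype frequencies."""
    f_t, f_m, f_k = marginal(f, i=1), marginal(f, j=1), marginal(f, k=1)
    d_mk = marginal(f, j=1, k=1) - f_m * f_k
    d_tk = marginal(f, i=1, k=1) - f_t * f_k
    d_tm = marginal(f, i=1, j=1) - f_t * f_m
    return f[(1, 1, 1)] - f_t * f_m * f_k - f_t * d_mk - f_m * d_tk - f_k * d_tm


# Start from the maximum: only TMK, tmK, Tmk and tMk segregate, each at 1/4.
freqs = {h: 0.0 for h in HAPS}
for h in [(1, 1, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0)]:
    freqs[h] = 0.25

c = 0.001           # recombination fraction between consecutive loci (illustrative)
generations = 50    # illustrative number of generations
for _ in range(generations):
    freqs = next_generation(freqs, c, c)

print(delta_tmk(freqs), 0.125 * ((1 - c) ** 2) ** generations)
```

With these illustrative values the two printed numbers agree, and about 90% of the initial disequilibrium remains after 50 generations.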
Normalization of the third order linkage disequilibrium parameter
This equation simply provides information on how important third order linkage disequilibrium is relative to second order linkage disequilibrium in a context where mutual constraints exist between these coefficients. This parameter takes values between 0 and 1, with 1 meaning that all disequilibrium is third order, and 0 that all disequilibrium is second order. Values over 0.5 indicate that most of the disequilibrium is of third order.
A computer algorithm for the estimation of third order linkage disequilibrium
- a) Intermediate allele frequencies at the three loci (f_{T} = f_{M} = f_{K} = 0.5) and zero second order disequilibria between all pairs (δ_{TM} = δ_{TK} = δ_{MK} = 0). After using equation (6), the haplotype frequencies are: \( {f}_{TMK}=\frac{1}{8}+{\delta}_{TMK} \), \( {f}_{TmK}=\frac{1}{8}-{\delta}_{TMK} \), \( {f}_{tMK}=\frac{1}{8}-{\delta}_{TMK} \), \( {f}_{tmK}=\frac{1}{8}+{\delta}_{TMK} \), \( {f}_{TMk}=\frac{1}{8}-{\delta}_{TMK} \), \( {f}_{Tmk}=\frac{1}{8}+{\delta}_{TMK} \), \( {f}_{tMk}=\frac{1}{8}+{\delta}_{TMK} \), and \( {f}_{tmk}=\frac{1}{8}-{\delta}_{TMK} \). Consequently, δ_{TMK} must range between \( -\frac{1}{8} \) and \( \frac{1}{8} \), because all haplotype frequencies must be zero or positive. If \( {\delta}_{TMK}=\frac{1}{8} \) then \( {f}_{TMK}={f}_{tmK}={f}_{Tmk}={f}_{tMk}=\frac{1}{4} \), and f_{TmK} = f_{tMK} = f_{TMk} = f_{tmk} = 0. Then, only four haplotypes are segregating (TMK, tmK, Tmk, tMk), none of them with alleles complementary to each other.
- b) Intermediate allele frequencies for the three loci (f_{T} = f_{M} = f_{K} = 0.5), and maximum second order LD (δ_{TM} = δ_{TK} = δ_{MK} = 0.25). Then, the haplotype frequencies are \( {f}_{TMK}=\frac{1}{2}+{\delta}_{TMK} \), f_{TmK} = − δ_{TMK}, f_{tMK} = − δ_{TMK}, f_{tmK} = δ_{TMK}, f_{TMk} = − δ_{TMK}, f_{Tmk} = δ_{TMK}, f_{tMk} = δ_{TMK}, and \( {f}_{tmk}=\frac{1}{2}-{\delta}_{TMK} \). Because all haplotype frequencies must be zero or positive, it follows that δ_{TMK} = 0, \( {f}_{TMK}=\frac{1}{2} \) and \( {f}_{tmk}=\frac{1}{2} \).
In summary, the possible values of the third order linkage disequilibrium range between − 0.125 and + 0.125, and the parameter equals zero when second order linkage disequilibria are at their maximum values. Only haplotypes TMK, tmK, Tmk, and tMk will be segregating for full third order linkage disequilibrium (δ_{TMK} = 1/8) at intermediate allele frequencies and in the absence of any second order linkage disequilibrium. That is, allele K will be associated with haplotypes TM and tm, and allele k with haplotypes Tm and tM. A similar argument can be made for δ_{TMK} = − 1/8, with allele k associated with haplotypes TM and tm, and allele K with haplotypes Tm and tM.
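The two cases above follow from requiring that no haplotype frequency is negative. A minimal Python sketch of that constraint is given below. It assumes the same parameterization as cases a) and b), in which each haplotype frequency equals a term fixed by the allele frequencies and pairwise disequilibria plus or minus δ_{TMK}. The function name `delta3_bounds` is hypothetical.

```python
from itertools import product


def delta3_bounds(f, d2):
    """Range of delta_TMK that keeps all eight haplotype frequencies >= 0.

    f = (f_T, f_M, f_K); d2 = (delta_MK, delta_TK, delta_TM).
    Returns (lower, upper); lower > upper means the inputs are inconsistent.
    """
    d_mk, d_tk, d_tm = d2
    lower, upper = float("-inf"), float("inf")
    for hap in product((1, 0), repeat=3):                    # 1 = upper-case allele
        s = [1 if a else -1 for a in hap]
        p = [fa if a else 1 - fa for fa, a in zip(f, hap)]
        # haplotype frequency when delta_TMK = 0
        base = (p[0] * p[1] * p[2]
                + p[0] * s[1] * s[2] * d_mk
                + p[1] * s[0] * s[2] * d_tk
                + p[2] * s[0] * s[1] * d_tm)
        if s[0] * s[1] * s[2] > 0:
            lower = max(lower, -base)    # frequency is base + delta_TMK
        else:
            upper = min(upper, base)     # frequency is base - delta_TMK
    return lower, upper


print(delta3_bounds((0.5, 0.5, 0.5), (0.0, 0.0, 0.0)))       # case a)
print(delta3_bounds((0.5, 0.5, 0.5), (0.25, 0.25, 0.25)))    # case b)
```

When the lower and upper bounds coincide, the third order parameter can take only one value given the allele frequencies and pairwise disequilibria.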
Frequencies of sire and dam gametes for all possible combinations of alleles at three SNPs (with alleles T/t M/m and K/k) to produce the 27 genotypes
| | Sire | TMK | TmK | tMK | tmK | TMk | Tmk | tMk | tmk |
---|---|---|---|---|---|---|---|---|---|
Dam | Freq | f _{TMK} | f _{TmK} | f _{tMK} | f _{tmK} | f _{TMk} | f _{Tmk} | f _{tMk} | f _{tmk} |
TMK | f _{TMK} | TTMMKK f_{TMK} f_{TMK} | TTMmKK f _{TMK} f _{TmK} | TtMMKK f _{TMK} f _{tMK} | TtMmKK f_{TMK}f_{tmK} | TTMMKk f_{TMK}f_{TMk} | TTMmKk f_{TMK}f_{Tmk} | TtMMKk f_{TMK} f_{tMk} | TtMmKk f_{TMK}f_{tmk} |
TmK | f _{TmK} | TTMmKK f_{TmK} f_{TMK} | TTmmKK f _{TmK} f _{TmK} | TtMmKK f_{TmK}f_{tMK} | TtmmKK f_{TmK}f_{tmK} | TTMmKk f_{TmK}f_{TMk} | TTmmKk f_{TmK}f_{Tmk} | TtMmKk f_{TmK}f_{tMk} | TtmmKk f_{TmK}f_{tmk} |
tMK | f _{tMK} | TtMMKK f_{tMK} f_{TMK} | TtMmKK f_{tMK}f_{TmK} | ttMMKK f_{tMK} f_{tMK} | ttMmKK f_{tMK}f_{tmK} | TtMMKk f_{tMK}f_{TMk} | TtMmKk f_{tMK}f_{Tmk} | ttMMKk f_{tMK}f_{tMk} | ttMmKk f _{tMK} f _{tmk} |
tmK | f _{tmK} | TtMmKK f_{tmK} f_{TMK} | TtmmKK f_{tmK}f_{TmK} | ttMmKK f_{tmK}f_{tMK} | ttmmKK f_{tmK}f_{tmK} | TtMmKk f_{tmK}f_{TMk} | TtmmKk f_{tmK}f_{Tmk} | ttMmKk f_{tmK}f_{tMk} | ttmmKk f_{tmK}f_{tmk} |
TMk | f _{TMk} | TTMMKk f_{TMk} f_{TMK} | TTMmKk f_{TMk}f_{TmK} | TtMMKk f_{TMk}f_{tMK} | TtMmKk f_{TMk}f_{tmK} | TTMMkk f_{TMk}f_{TMk} | TTMmkk f_{TMk}f_{Tmk} | TtMMkk f_{TMk}f_{tMk} | TtMmkk f_{TMk}f_{tmk} |
Tmk | f _{Tmk} | TTMmKk f_{Tmk} f_{TMK} | TTmmKk f_{Tmk}f_{TmK} | TtMmKk f_{Tmk}f_{tMK} | TtmmKk f_{Tmk}f_{tmK} | TTMmkk f_{Tmk}f_{TMk} | TTmmkk f_{Tmk} f_{Tmk} | TtMmkk f_{Tmk}f_{tMk} | Ttmmkk f_{Tmk}f_{tmk} |
tMk | f _{tMk} | TtMMKk f_{tMk} f_{TMK} | TtMmKk f_{tMk}f_{TmK} | ttMMKk f_{tMk}f_{tMK} | ttMmKk f_{tMk}f_{tmK} | TtMMkk f_{tMk}f_{TMk} | TtMmkk f_{tMk}f_{Tmk} | ttMMkk f_{tMk}f_{tMk} | ttMmkk f _{tMk} f _{tmk} |
tmk | f _{tmk} | TtMmKk f_{tmk} f_{TMK} | TtmmKk f_{tmk}f_{TmK} | ttMmKk f_{tmk}f_{tMK} | ttmmKk f_{tmk}f_{tmK} | TtMmkk f_{tmk}f_{TMk} | Ttmmkk f_{tmk}f_{Tmk} | ttMmkk f_{tmk}f_{tMk} | ttmmkk f _{tmk} f _{tmk} |
- I) Set initial haplotype frequencies (arbitrarily).
- II) Expectation step, in which genotype frequencies are estimated based on the haplotype frequencies from step I. In order to resolve which haplotypes may correspond to the observed genotype counts, the proportion of double or triple heterozygotes in coupling or repulsion for each haplotype needs to be computed. In the Fortran program we used code 3 for individuals with the heterozygote genotype. For example, for haplotype “111”, the proportion of individuals with a heterozygous genotype at the first two loci and homozygous for allele 1 at the third locus is:
- III) The maximization step consists of estimating haplotype frequencies using genotype counts, either observed or as estimated in step II. For example, for the haplotype “111”, the frequency to be estimated is:
- IV) Go to step II until convergence of haplotype frequencies is reached.
This algorithm is simple and suitable for fast computing when implemented in a computer language such as Fortran90. In this implementation, the number of individuals for each triple genotype is stored in an array with three dimensions. Each one corresponds to one locus and there are three alternatives: “1” and “2” are used for homozygotes, and “3” for heterozygotes. Source code for estimating haplotype frequencies in Fortran90 is provided in the Additional file 2: Appendix 2. Linkage disequilibrium parameters can be easily estimated from haplotype frequencies using equation (6).
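Steps I to IV can be sketched as follows. This is an independent Python illustration of a standard EM for three biallelic loci under random mating, not the Fortran90 program of Additional file 2. The genotype coding (1 and 2 for homozygotes, 3 for heterozygotes) follows the text; the function names, the uniform starting values, the convergence tolerance, and the toy genotype counts are assumptions. In this sketch the expectation step splits each ambiguous genotype among its compatible haplotype pairs in proportion to the product of the current haplotype frequencies. For a genotype coded (3, 3, 1), for instance, the share assigned to the pair 111/221 is f_{111}f_{221} / (f_{111}f_{221} + f_{121}f_{211}).

```python
from itertools import product

HAPS = list(product((1, 2), repeat=3))  # alleles coded 1 or 2 at each locus


def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with a genotype coded 1, 2 (homozygotes) or 3."""
    options = [[(g, g)] if g in (1, 2) else [(1, 2), (2, 1)] for g in genotype]
    pairs = set()
    for combo in product(*options):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def em_haplotypes(counts, max_iter=1000, tol=1e-10):
    """Estimate haplotype frequencies from three-locus genotype counts."""
    n_ind = sum(counts.values())
    freqs = {h: 1 / len(HAPS) for h in HAPS}                 # step I: starting values
    for _ in range(max_iter):
        expected = {h: 0.0 for h in HAPS}
        for genotype, n in counts.items():                   # step II: expectation
            pairs = compatible_pairs(genotype)
            weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
            total = sum(weights)
            if total == 0:
                continue                                     # genotype impossible under current values
            for (h1, h2), w in zip(pairs, weights):
                expected[h1] += n * w / total
                expected[h2] += n * w / total
        new = {h: expected[h] / (2 * n_ind) for h in HAPS}   # step III: maximization
        change = max(abs(new[h] - freqs[h]) for h in HAPS)
        freqs = new
        if change < tol:                                     # step IV: convergence check
            break
    return freqs


# Toy data (hypothetical): 30 individuals 1/1 at all loci, 10 individuals 2/2, 20 triple heterozygotes.
counts = {(1, 1, 1): 30, (2, 2, 2): 10, (3, 3, 3): 20}
estimates = em_haplotypes(counts)
for hap, value in estimates.items():
    print(hap, round(value, 4))
```

In this toy example the triple heterozygotes are resolved almost entirely as the pair 111/222, because the homozygous genotypes show that only those two haplotypes are frequent. The disequilibrium parameters can then be computed from the estimated haplotype frequencies.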
Analysis of third order linkage disequilibrium in Iberian pigs
Animal material
Genotypes from 306 sows belonging to a composite line (Torbiscal) genetically isolated between 1963 and 2013 and resulting from the blending of four ancient Spanish and Portuguese Iberian breed strains [17], were used in this study.
Genotyping of SNPs
DNA samples were isolated from blood using a standard phenol/chloroform protocol and genotyped with the Illumina Porcine SNP60 BeadChip [18] and the Infinium HD Assay Ultra protocol (Illumina Inc.). Genotypes of 62,163 SNP probes were analyzed with the Genome Studio software (Illumina). Data quality control was performed according to the following criteria: call rate of the sample > 0.96; SNPs with a call rate > 0.99; GenTrain Score > 0.70; AB R Mean > 0.35; and MAF > 0.05. SNPs located on sex chromosomes, those not mapped in the Sscrofa10.2 assembly, or those with inconsistent inheritance from dam to daughter were also removed. Only 26,347 polymorphic SNPs were retained and used for further analyses.
Inter and intra-chromosomal third order linkage disequilibrium
Estimation of the third order linkage disequilibrium parameter was carried out for all triplets of three consecutive SNPs for each of the autosomal chromosomes. This will be referred to as short range intra-chromosomal third order linkage disequilibrium. The total number of triplets was 26,311. In addition, inter-chromosomal third order linkage disequilibrium was estimated by randomly drawing three out of the 18 autosomal chromosomes and by selecting randomly one SNP within each chromosome. In order to make an easier comparison between inter and intra-chromosomal third order linkage disequilibrium, the process was repeated 26,311 times, the same number as for the intra-chromosomal third order linkage disequilibrium. Similar to the second order linkage disequilibrium, the third order linkage disequilibrium is expected to be negligible in most inter-chromosomal situations.
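The two sampling schemes can be sketched in Python as follows. The SNP identifiers, the number of SNPs per chromosome in the toy map, the random seed, and the function names are all hypothetical. A chromosome with n SNPs yields n - 2 triplets of consecutive SNPs, so 26,347 SNPs on 18 autosomes give 26,347 - 2 × 18 = 26,311 triplets, which agrees with the number reported above.

```python
import random


def consecutive_triplets(snps_by_chrom):
    """Triplets of three consecutive SNPs within each chromosome (map order assumed)."""
    triplets = []
    for snps in snps_by_chrom.values():
        triplets.extend(zip(snps, snps[1:], snps[2:]))
    return triplets


def interchromosomal_triplets(snps_by_chrom, n_triplets, seed=0):
    """Draw three distinct chromosomes at random, then one random SNP within each."""
    rng = random.Random(seed)
    chroms = list(snps_by_chrom)
    triplets = []
    for _ in range(n_triplets):
        picked = rng.sample(chroms, 3)                       # three different chromosomes
        triplets.append(tuple(rng.choice(snps_by_chrom[c]) for c in picked))
    return triplets


# Toy map (hypothetical): 18 autosomes with 10 ordered SNPs each.
toy_map = {c: [(c, i) for i in range(10)] for c in range(1, 19)}
intra = consecutive_triplets(toy_map)
inter = interchromosomal_triplets(toy_map, n_triplets=len(intra))
print(len(intra), len(inter))
```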
Third order linkage disequilibrium and physical distance
An extensive analysis in the smaller chromosome (SSC12) was carried out in order to investigate the relationship between third order linkage disequilibrium and physical distance. Third order linkage disequilibria were estimated among all possible triplets in a rolling chromosomal fragment of 400 SNPs. The total number of analyses was 35,784,200. In addition, the relationship of the proposed normalized third order linkage disequilibrium β_{TMK}, with the number of haplotypes and a likelihood ratio test was investigated. The likelihood ratio test was computed using the full model, i.e., estimating all parameters versus a model with only allele frequencies (model M15 of Long et al. [9]).
Results
Third order linkage disequilibrium generated in population Z (\( {\delta}_{TMK}^Z \)) after admixture of populations X and Y with alternative allele frequencies and second order linkage disequilibrium parameters. It assumes τ=0.5 and \( {\delta}_{TMK}^X=0 \), \( {\delta}_{TMK}^Y=0 \)
\( {f}_T^X \) | \( {f}_M^X \) | \( {f}_K^X \) | \( {f}_T^Y \) | \( {f}_M^Y \) | \( {f}_K^Y \) | \( {\delta}_{MK}^X \) | \( {\delta}_{TK}^X \) | \( {\delta}_{TM}^X \) | \( {\delta}_{MK}^Y \) | \( {\delta}_{TK}^Y \) | \( {\delta}_{TM}^Y \) | \( {\delta}_{TMK}^Z \) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.8 | 0.8 | 0.8 | 0.2 | 0.2 | 0.2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0.8 | 0.2 | 0.8 | 0.2 | 0.8 | 0.2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 |
0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 |
0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 |
0.5 | 0.5 | 0.5 | 0.2 | 0.2 | 0.2 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0.056 |
0.5 | 0.5 | 0.5 | 0.2 | 0.2 | 0.2 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0.038 |
0.5 | 0.5 | 0.5 | 0.1 | 0.1 | 0.1 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0.075 |
0.5 | 0.5 | 0.5 | 0.01 | 0.01 | 0.01 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0.092 |
Following Robinson et al. [10], the results of the estimates of normalized third order linkage disequilibrium were 0 (14.9%), ± 1 (20%), or an undefined situation in which the third order disequilibrium parameter is restricted to one single value given allele frequencies and second order disequilibrium parameters (61.5%). Only 3.6% of all the analyses resulted in a third order parameter different from 0, 1, − 1 or the undefined situation. This may be related to the number of haplotypes (and/or constraints imposed by allele frequencies and second order linkage disequilibria) because there was not a single estimate of the undefined situation, 1, − 1 or 0 among the 309 triplets with eight haplotypes. Among the 1648 triplets with just two haplotypes, there were only seven for which the estimates of normalized third order linkage disequilibrium were different from the undefined situation, 1, − 1 or 0. Therefore, normalizing third order linkage disequilibrium as proposed by Robinson et al. [10] is not useful for deciphering situations involving closely linked markers.
Discussion
The results obtained in this study are based on the Expectation Maximization algorithm [8, 21], which assumes that the population is under random mating, and consequently, in Hardy-Weinberg equilibrium. The estimates of third order linkage disequilibrium may be affected by the fact that this population was created 50 years ago by the crossing of four strains of Iberian pigs. Our results, displayed graphically, support that third order linkage disequilibrium is not uncommon for linked loci in this composite line of Iberian pigs. In addition, the magnitude of the disequilibrium decreases with the distance between loci as supported by the negative correlation between the distance and the third order linkage disequilibrium (− 0.115). Although the magnitude of this correlation is not high, the correlation was calculated with triplets of adjacent SNPs in which the distances are small. By using over 35 million triplets on SSC12 but allowing for all possible distances between markers this correlation became − 0.23. In conclusion, third order linkage disequilibrium declines with distance between markers.
Normalization of third order linkage disequilibrium [10] for closely linked loci is not generally useful because a majority of the estimates were either − 1, 1 or undefined (only one possible value for given allele frequencies and pairwise disequilibria). This is attributable to the observed low number of haplotypes for consecutive SNPs, which ultimately originates from the joint action of mutation and recombination. A low number of haplotypes also reduces the number of parameters to be estimated and may impose additional constraints on pairwise and third order linkage disequilibria. Under these circumstances, the value of third order linkage disequilibria should not be normalized, and estimates should be used directly, bearing in mind that their value is constrained by allele frequencies and pairwise linkage disequilibria. An alternative to normalizing is the use of the proportion of third order linkage disequilibria versus all disequilibria. It provides information on the relative proportions of third order and pairwise linkage disequilibria. This proportion appears to be related to the number of haplotypes. Triplets with much third order linkage disequilibrium tend to show a larger number of haplotypes. Additionally, they tend to have a lower likelihood ratio test. More research is needed to evaluate the usefulness of this parameter.
It has been proposed that the difference between normalized pairwise disequilibria estimated from analyses of third and second order disequilibria should shed light on hitchhiking selection in a method called “constrained disequilibrium values” [22, 23]. Robinson et al. [10] showed that a third locus imposes further bounds on second order linkage disequilibria coefficients. Differences in pairwise disequilibria, normalized in the two different ways, highlight the influence that a third locus may exert on the pairwise measure. These authors proposed that differences in the normalized measures could indicate which of the three loci has the selected mutant. This method could be applied to populations typed with SNP arrays. The problem is that, generally speaking, all typed SNPs are neutral and none of them could be assigned as a locus under selection. Also, the method of Robinson et al. [22] requires normalization of pairwise linkage disequilibrium with and without the constraint of a third locus. As discussed above, normalization of pairwise linkage disequilibrium may also be problematic for biallelic loci when constrained by a third locus.
If interested in testing the third order linkage disequilibrium at specific locations, one would carry out hypothesis testing by means of a likelihood ratio test within the maximum likelihood framework. This is not straightforward because the expected haplotype frequencies in the reduced model (M7 in Long et al. [9]) are not estimated directly using the maximum likelihood approach. Long et al. [9] suggested using an iterative proportional fitting approach [24] but this is not optimal in the sense that haplotype frequencies are forced to be between 0 and 1. This situation is much aggravated by the fact that in many cases the third order linkage disequilibrium parameter can just take one possible value when constrained by second order disequilibria and/or allele frequencies. Consequently, testing at specific locations whether the third order linkage disequilibrium is different from zero is challenging.
Third order linkage disequilibrium can be generated by population admixture. It follows a pattern similar to that of second order linkage disequilibrium but requires, in addition to a difference in allele frequencies between the two populations at crossing, differences in their pairwise linkage disequilibrium parameters. The observed levels of third order linkage disequilibrium in the analyzed Torbiscal line led us to conclude that this disequilibrium might have been generated after the crossing of multiple strains of Iberian pigs some 50 years ago. Alternatively, genetic drift may also have had a role given the small population size of this strain. More research using other animal, plant or human populations with different population histories may help to understand whether high levels of third order linkage disequilibrium are related to the crossing history of the populations in question.
Once population admixture has generated third order linkage disequilibrium, the three-locus disequilibrium declines exponentially over time by recombination [14]. However, it can persist for long periods of time if the distances are small and recombination infrequent. Our calculations for a recombination fraction of 0.001 (~ 100 kb) between each two consecutive SNPs in a triplet would allow the maintenance of third order linkage disequilibrium for long periods of time. The analyses of the Torbiscal data revealed an average of 92.6 kb for consecutive SNPs, and therefore, could explain well the observed levels of third order linkage disequilibrium in this strain.
The use of composite lines in pig breeding schemes is becoming quite popular for both sire and dam lines [25]. The levels of third order linkage disequilibria in these composite populations may be similar to those observed in our study or higher in the dam lines coming from Chinese-European origins, with larger differences in allele frequencies. More research is needed to understand its implications for gene mapping and/or as an aid to trace haplotypes to ancestor’s origins.
Conclusions
The main conclusions of this paper are: a) third order linkage disequilibrium is substantial in a composite Iberian pig line, b) third order linkage disequilibrium in this strain might have been generated by admixture of four strains of Iberian pigs, c) the absolute value of the third order linkage disequilibrium decreases rapidly with physical distance up to 4 Mb and slowly thereafter, d) the number of haplotypes is much reduced for linked loci due to the mutual constraints of pairwise and third order linkage disequilibrium parameters, and e) normalization of third order disequilibria is not advised for closely linked biallelic loci. High order linkage disequilibrium might shed light on our understanding of the complex metabolic pathways in which multiple loci are involved. Much of the actual variation in quantitative traits might go unnoticed when analyzing just one or two loci at a time.
Declarations
Acknowledgments
We are thankful to reviewer #1 for her/his contributions to the manuscript. Financial support was provided by AGL2016-75942-R and ERA-NET SUSPIG projects. We acknowledge the effort of the late Jaime Rodrigáñez and all the staff at the Iberian pig farm ‘Dehesón del Encinar’ for maintaining the Torbiscal pigs and their ancestors since 1944.
Funding
L. Gomez-Raya, W. Rauw and L. Garcia-Cortes received financial support from the AGL2016–75942-R and ERA-NET SUSPIG projects. L. Silio and MC Valdovinos received financial support from the ERA-NET SUSPIG project. Genotyping was financed by the AGL2016–75942-R project.
Availability of data and materials
The original genotype files generated and/or analyzed during the current study are not publicly available because they do not belong to the authors. However, the counts for the triplet genotypes used to estimate third order linkage disequilibrium are available from the corresponding author upon request. Software developed in this study is available to readers upon request.
Authors’ contributions
LS and CR conceived and carried out the genotyping experiments, and contributed to the writing of the manuscript; LGR wrote the algorithm for the estimation of third order linkage disequilibrium, analyzed the genotypic data and wrote the first version of the manuscript; WR and LGC contributed to the analyses of the genotypic data and to the writing of the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
The current study was carried out using blood samples stored from a conservation herd of Iberian pigs located in the CIA Dehesón del Encinar (Toledo, Spain). The blood samples were obtained under a Project License from the INIA Scientific Ethic Committee. Animal manipulations were performed according to the Spanish Policy for Animal Protection RD1201/05, which meets the European Union Directive 86/609 on protection of animals used in experimentation.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
- Smith JM, Haigh J. The hitch-hiking effect of a favourable gene. Genet Res. 2007;89:391–403.
- Slatkin M. Linkage disequilibrium--understanding the evolutionary past and mapping the medical future. Nat Rev Genet. 2008;9:477–85.
- Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science. 1996;273:1516–7.
- Meuwissen TH, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–29.
- Geiringer H. On the probability theory of linkage in Mendelian heredity. Ann Math Stat. 1944;15:33.
- Bennett JH. On the theory of random mating. Ann Eugenics. 1954;18:311–7.
- Thomson G, Baur MP. Third order linkage disequilibrium. Tissue Antigens. 1984;24:250–5.
- Excoffier L, Slatkin M. Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol. 1995;12:7.
- Long JC, Williams RC, Urbanek M. An E-M algorithm and testing strategy for multiple-locus haplotypes. Am J Hum Genet. 1995;56:799–810.
- Robinson WP, Asmussen MA, Thomson G. Three-locus systems impose additional constraints on pairwise disequilibria. Genetics. 1991;129:925–30.
- Kim Y, Feng S, Zeng ZB. Measuring and partitioning the high-order linkage disequilibrium by multiple order Markov chains. Genet Epidemiol. 2008;32:301–12.
- Berg A, He Q, Shen Y, Chen Y, Huang M, Wu R. Trilocus disequilibrium analysis of multiallelic markers in outcrossing populations. Stat Appl Genet Mol Biol. 2010;9:Article 16.
- Chakraborty R, Smouse PE. Recombination of haplotypes leads to biased estimates of admixture proportions in human populations. Proc Natl Acad Sci U S A. 1988;85:3071–4.
- Hill WG. Disequilibrium among several linked neutral genes in finite population. I. Mean changes in disequilibrium. Theor Popul Biol. 1974;5:366–92.
- Hill WG. Disequilibrium among several linked neutral genes in finite population. II. Variances and covariances of disequilibria. Theor Popul Biol. 1974;6:184–98.
- Lewontin RC. The Interaction of Selection and Linkage. I. General Considerations; Heterotic Models. Genetics. 1964;49:49–67.
- Silio L, Barragan C, Fernandez AI, Garcia-Casco J, Rodriguez MC. Assessing effective population size, coancestry and inbreeding effects on litter size using the pedigree and SNP data in closed lines of the Iberian pig breed. J Anim Breed Genet. 2016;133:145–54.
- Ramos AM, Crooijmans RP, Affara NA, Amaral AJ, Archibald AL, Beever JE, Bendixen C, Churcher C, Clark R, Dehais P, Hansen MS, Hedegaard J, Hu ZL, Kerstens HH, Law AS, Megens HJ, Milan D, Nonneman DJ, Rohrer GA, Rothschild MF, Smith TP, Schnabel RD, Van Tassell CP, Taylor JF, Wiedmann RT, Schook LB, Groenen MA. Design of a high density SNP genotyping assay in the pig using SNPs identified and characterized by next generation sequencing technology. PLoS One. 2009;4:e6524.
- Phillips PC. Epistasis--the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 2008;9:855–67.
- Stich B, Yu J, Melchinger AE, Piepho HP, Utz HF, Maurer HP, Buckler ES. Power to detect higher-order epistatic interactions in a metabolic pathway using a new mapping strategy. Genetics. 2007;176:563–70.
- Hill WG. Tests for association of gene frequencies at several loci in random mating diploid populations. Biometrics. 1975;31:881–8.
- Robinson WP, Cambon-Thomsen A, Borot N, Klitz W, Thomson G. Selection, hitchhiking and disequilibrium analysis at three linked loci with application to HLA data. Genetics. 1991;129:931–48.
- Grote MN, Klitz W, Thomson G. Constrained disequilibrium values and hitchhiking in a three-locus system. Genetics. 1998;150:1295–307.
- Deming WE, Stephan FF. On least square adjustment of sampled frequency tables when the expected marginal totals are known. Ann Math Stat. 1954;6:18.
- Boitard S, Chevalet C, Mercat MJ, Meriaux JC, Sanchez A, Tibau J, Sancristobal M. Genetic variability, structure and assignment of Spanish and French pig populations based on a large sampling. Anim Genet. 2010;41:608–18.