The SSC6 region associated with androstenone level was reduced from 3.75 Mbp to 1.94 Mbp, and the association of haplotypes in the region with androstenone was replicated in independent populations. Haplotype 1 reduces the androstenone level across populations and could potentially be implemented in marker-assisted selection by pig breeding companies. Selection for haplotype 1 would speed up the genetic response for lower androstenone level, which would reduce the incidence of boar taint, countering the effects of international policies regarding castration of piglets. The association of SULT2A1 expression in the testis with the level of androstenone [13–15] was confirmed by sequence analysis of RNA pools. Validation of differential expression showed that, for a SNP located within exon 2 of SULT2A1, the C allele was expressed at a higher level than the T allele, confirming the result from the RNA-seq analysis and suggesting allelic-imbalanced expression of the two alleles. This difference in the C:T ratio is, however, not associated with the haplotypes. A thorough search for functional SNP variation was carried out and yielded only a limited number of non-synonymous variants, despite the very high density of genes in the region.
Region of interest and haplotypes
The number of SNPs within the region of interest is higher in our study than in Duijvesteijn et al., owing to the improved assembly of the reference genome Sscrofa10.2. Non-associated SNPs that were previously located within the associated region were moved elsewhere in the genome; at the same time, additional non-associated SNPs are now included in the associated region.
Predicted haplotypes varied in number and frequency among the 11 populations. A greater number of haplotypes were found in those populations that represent crosses of purebred populations (populations 7 to 11). This was expected as crosses are made between divergent purebred populations that have different frequencies of haplotypes. Crosses will therefore combine haplotypes present in the purebred populations.
Effects and association analysis of haplotypes
Across all populations, haplotype 1 was consistently associated with lower levels of androstenone, and haplotype 2 with higher levels. The haplotype tree showed two very distinct groups of haplotypes. When this tree was used to detect associations between (groups of) haplotypes and phenotypes, the estimated effects from regression analyses were in good agreement with the evolutionary history of the haplotypes. Haplotypes similar in sequence to haplotype 1 also have similar effects, decreasing androstenone, and haplotypes similar to haplotype 2 have effects that increase androstenone.
After confirming that, in general, haplotypes similar to 1 are associated with low and haplotypes similar to 2 with high androstenone level, a posterior analysis using these two haplotypes together with the recombinant haplotype 7 placed haplotype 7 in the high-androstenone group. This placement was important because the haplotype 7 sequence is a recombination between haplotypes 1 and 2. From this result it was possible to deduce that the region from SNP 1 to 13 harbors the genetic variation responsible for the QTL for androstenone level in boars. Because it is unknown where the recombination took place, the region was defined to include the flanking intervals, 3′ up to SNP 14, and 5′ up to the next SNP outside the LD block (SSC6: 48,317,509 bp – 50,259,057 bp, between genes SAE1 and SLC17A7). The assignment of haplotype 7 allowed us to narrow down the associated region from 3.75 Mbp to 1.94 Mbp.
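As a quick arithmetic check of the figures quoted above, the size of the refined region can be recomputed from its boundary coordinates. This is a minimal sketch; the helper name `span_mbp` is hypothetical, and the length is taken as a simple end-minus-start difference.

```python
# Boundary coordinates of the refined region as quoted in the text (SSC6), in bp.
REGION_START_BP = 48_317_509
REGION_END_BP = 50_259_057
# Size of the associated region before refinement, as quoted in the text.
PREVIOUS_REGION_MBP = 3.75


def span_mbp(start_bp: int, end_bp: int) -> float:
    """Length of a genomic interval in Mbp (hypothetical helper: end - start)."""
    return (end_bp - start_bp) / 1_000_000


refined_mbp = span_mbp(REGION_START_BP, REGION_END_BP)
fraction_remaining = refined_mbp / PREVIOUS_REGION_MBP

print(f"refined region: {refined_mbp:.2f} Mbp")
print(f"fraction of the previous region retained: {fraction_remaining:.0%}")
```

The coordinates give 1,941,548 bp, which rounds to the 1.94 Mbp stated in the text, or roughly half of the original 3.75 Mbp interval.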
This region is very gene-dense and contains several candidate genes for androstenone-level QTL [9, 13]: SULT2A1, SULT2B1, HSD17B14, LHB, and FTL. The region is only ~0.3 cM long and has a low recombination rate (Additional file 8). This is consistent with the low number of haplotypes identified within this region, even when using multiple populations. Across all 11 populations the same small set of haplotypes was found with consistently replicated effects of the haplotypes on androstenone, making the results very robust and useful for breeding programs selecting animals with reduced androstenone level.
Of the six genes that were differentially expressed in the liver and testis, SULT2A1 is an obvious candidate gene as it is involved in the metabolism of steroids. This gene encodes a sulfotransferase enzyme that sulfoconjugates 5α-androstenone. Increased expression of SULT2A1 in the testis was found in the pool of animals with high-androstenone haplotype 2 (Table 1).
The higher level of SULT2A1 in the testis was associated with higher androstenone level in fat tissue. This was unexpected given the prediction by Sinclair & Squires that animals with low ability to sulfoconjugate 5α-androstenone in the testis would have higher accumulation of this hormone in fat tissue. Nevertheless, three other studies on different breeds (Duroc, Norwegian Landrace, and Yorkshire) [13–15] are in accordance with our results, showing up-regulation of SULT2A1 in the testis of high-androstenone animals. Androstenone is known to be sulfoconjugated in the testis, presumably to facilitate excretion and subsequent transport as androstenone sulfate in the blood. As suggested by Moe et al., high androstenone levels might induce an increase in SULT2A1 expression in the testis. Recent results suggest, however, that SULT2A1 might not be involved in the sulfoconjugation of androstenone and that another sulfotransferase is involved in this step, or that it is involved only in combination with enolase. Moe et al. also studied gene expression in the liver and found many genes to be differentially expressed but not SULT2A1, similar to our observation for the liver.
Another candidate gene that was differentially expressed in the testis is FTL. The FTL gene codes for the ferritin light chain, an iron storage protein involved in numerous essential cellular functions. Although the function of FTL in the synthesis of androstenone has not been investigated, it was suggested by Moe et al. that FTL may influence androstenone level by interaction with CYB5A, which may affect the CYB5/CYP450 electron transfer. As the role of FTL in affecting androstenone has not been investigated in more detail, and in our study we did not find any variants that could explain a difference in expression, it remains unclear whether it has a direct effect on androstenone level. It was, therefore, not considered to be a strong candidate gene. Our expression data for FTL are consistent with the findings of three other studies [13–15], where FTL was up-regulated in Duroc, Norwegian Landrace, and Yorkshire boars with high androstenone levels.
Functional analysis using DNA sequence data
The only gene within the 1.94 Mbp region for which a non-synonymous variation was identified between haplotypes 1 and 2 that might have an impact on protein function was FUT1. FUT1 has been identified as a candidate gene controlling the adhesion of enterotoxigenic Escherichia coli (ETEC) F18 to the F18 receptor. However, FUT1 is not known to have an influence on androstenone level, and based on the functions of the protein encoded by this gene, it is unlikely that it affects androstenone level.
We studied the regulatory regions of SULT2A1 and FTL because they were the two candidate genes that were differentially expressed according to the RNA-seq analysis. We checked potential TFBSs and CpG islands, and only one variation (C/G, 49,110,873 bp) was found within a CpG island (49,110,687 bp - 49,110,889 bp) predicted for SULT2A1.
CpG islands are known to play a role in regulating gene expression where, in general, higher methylation levels are related to repression of gene expression. The one variation found within the CpG island could in principle explain a haplotype-dependent difference in SULT2A1 expression. However, the difference in expression between haplotypes identified by RNA-seq could not be validated subsequently, making it very unlikely that this variation plays a role in gene regulation.
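The position check behind the CpG island observation is a simple interval-containment test, sketched below. The `Interval` type is hypothetical, and treating both island coordinates as inclusive is an assumption, since the text does not state the coordinate convention.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical closed genomic interval [start, end] in bp (inclusive ends assumed)."""
    start: int
    end: int

    def contains(self, pos: int) -> bool:
        return self.start <= pos <= self.end

    def length(self) -> int:
        return self.end - self.start + 1


# Coordinates as quoted in the text for the CpG island predicted for SULT2A1.
CPG_ISLAND = Interval(49_110_687, 49_110_889)
# Position of the C/G variation as quoted in the text.
VARIANT_POS = 49_110_873

inside = CPG_ISLAND.contains(VARIANT_POS)
print(f"island length: {CPG_ISLAND.length()} bp")
print(f"variant inside island: {inside}")
print(f"distance from island end: {CPG_ISLAND.end - VARIANT_POS} bp")
```

Under these assumptions the variation lies near the 3′ coordinate boundary of the predicted island, so a small shift in the predicted island boundaries could change whether it is counted as inside.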
Validation of differential SULT2A1 expression
Allele-specific expression analysis was a follow-up step to the RNA-seq experiment, used to test whether the haplotypes are associated with a difference in the C:T expression ratio of SULT2A1 within heterozygous animals. The quantitative difference in relative expression between RNA-seq (2.5:1) and allele-specific expression analysis (1.5:1) may simply be due to random error in the RNA-seq estimate, which was based on only two pooled samples. Other possible reasons include systematic or technical differences that affect amplification in the RNA-seq assay. There may also be other biological mechanisms that trigger a higher expression of SULT2A1 allele C and that cannot be captured by allele-specific expression analysis. Unraveling such a mechanism cannot, however, be achieved using our data. Surprisingly, in the allele-specific expression analysis we did not observe differential expression between heterozygous animals (C/T) with either low/low or high/low androstenone diplotypes (Figure 4). We concluded that the difference in SULT2A1 expression was not regulated by the haplotypes surrounding the SULT2A1 gene. Instead, an increase in expression of allele C over allele T in SULT2A1 was observed, indicating haplotype-independent allelic-imbalanced expression between these two alleles. One possible cause of this allelic-imbalanced expression is a regulatory SNP variant in LD with the SULT2A1 SNP that affects expression. Other possibilities are transcriptional regulation of the two alleles, for example by an enhancer element that resides outside the investigated region, or differences in RNA decay between the two alleles. It is known that RNA folding structures play a role in the degree of RNA decay. Prediction of the fold structure indicated a considerable difference in structure around the two alleles (Madsen, O., unpublished observation), making RNA decay a possible contributor to the observed allelic-imbalanced expression.
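To illustrate what a C:T expression ratio means for a heterozygous animal, the sketch below computes the ratio from allele-specific read counts and an exact two-sided binomial p-value against balanced (1:1) expression. This is not the authors' assay or analysis: the read counts are hypothetical values chosen only to reproduce a 1.5:1 ratio, and the function names are invented for illustration.

```python
from math import comb


def allelic_ratio(c_reads: int, t_reads: int) -> float:
    """Ratio of reads carrying allele C to reads carrying allele T."""
    return c_reads / t_reads


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k under Binomial(n, p).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so that ties are included despite rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


# Hypothetical read counts for one C/T heterozygous animal (not from the study).
c_reads, t_reads = 60, 40

ratio = allelic_ratio(c_reads, t_reads)
p_value = binomial_two_sided_p(c_reads, c_reads + t_reads)

print(f"C:T ratio = {ratio:.1f}:1")
print(f"two-sided p-value against 1:1 expression = {p_value:.3f}")
```

With these illustrative counts a 1.5:1 ratio from 100 reads is only borderline, which shows why a ratio estimated from a small number of pooled samples, as in the RNA-seq experiment, can carry substantial random error.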
Origin of the haplotypes
Since the entire region between 48.3 Mbp and 50.2 Mbp on SSC6 has a very low recombination rate, the integrity of the haplotypes found in this study has been retained across different populations. Because of this retained integrity, a phylogenetic analysis could be applied to construct a phylogenetic tree of this region from sequencing data from the 55 sequenced animals (Figure 5).
This tree revealed that haplotype 1 of the 1.94 Mbp region, associated with low androstenone level, originated from Asia. It is likely, therefore, that haplotype 1 was introgressed into European breeds during the 18th and 19th centuries, generating hybrid European breeds. Introgression of favorable Asian haplotypes has been observed for other traits as well. A well-known example is an IGF2 haplotype conferring increased muscle mass and leaner pigs. This haplotype is currently at high frequency in several commercial pig populations, but originated from Asian pigs. Only a handful of gene variants described in European pigs are currently known to originate from the late 18th to early 19th century introgression of Asian breeding stock (e.g. ).
The relatively recent introgression (likely around 200 years ago or less) of the Asian haplotypes into European pigs, combined with the very low recombination rate in the genomic region, further explains the paucity of recombinant haplotypes and the difficulty of fine-mapping even across breeds.
Asian-origin haplotypes were associated with low androstenone level, whereas European-origin haplotypes were associated with high androstenone level. This is consistent with Lee et al., who found that Large White alleles have an additive effect on androstenone level for a QTL found on SSC6 at 91 cM, between SW782 (49,996,734 bp-49,996,825 bp) and SW1823 (79,653,393 bp-79,653,597 bp), in an F2 Large White x Meishan population.
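The positional consistency with the earlier QTL can be checked by comparing the marker interval with the refined 1.94 Mbp region. The sketch below uses only coordinates quoted in this section; the helper `overlaps` is hypothetical, and it assumes that all coordinates refer to the same genome build.

```python
def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if closed intervals [a_start, a_end] and [b_start, b_end] share a position."""
    return a_start <= b_end and b_start <= a_end


# Refined androstenone region on SSC6 (bp), as quoted in this section.
REFINED_REGION = (48_317_509, 50_259_057)
# Flanking microsatellite markers of the QTL interval (bp), as quoted above.
SW782 = (49_996_734, 49_996_825)
SW1823 = (79_653_393, 79_653_597)

# The QTL interval runs from the start of SW782 to the end of SW1823.
qtl_interval = (SW782[0], SW1823[1])

shared = overlaps(*REFINED_REGION, *qtl_interval)
shared_bp = min(REFINED_REGION[1], qtl_interval[1]) - max(REFINED_REGION[0], qtl_interval[0])

print(f"QTL interval overlaps refined region: {shared}")
print(f"shared segment: {shared_bp:,} bp")
print(f"SW782 inside refined region: {overlaps(*REFINED_REGION, *SW782)}")
print(f"SW1823 inside refined region: {overlaps(*REFINED_REGION, *SW1823)}")
```

Under these assumptions marker SW782 falls inside the refined region while SW1823 lies far outside it, so the two intervals share only the segment between SW782 and the distal boundary of the refined region.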
Taking into account that haplotypes of European breeds originated from Asian breeds and that Asian breeds have high genetic diversity, further studies are needed either to identify additional haplotypes that are recombinant between European and Asian animals or to fine-map the region further in Asian pigs, in which LD is expected to be much lower than in European pigs.