Serum bilirubin concentration is modified by UGT1A1 Haplotypes and influences risk of Type-2 diabetes in the Norfolk Island genetic isolate

Background Located in the Pacific Ocean between Australia and New Zealand, the unique population isolate of Norfolk Island has been shown to exhibit increased prevalence of metabolic disorders (type-2 diabetes, cardiovascular disease) compared to mainland Australia. We investigated this well-established genetic isolate, utilising its unique genomic structure to increase the ability to detect related genetic markers. A pedigree-based genome-wide association study of 16 routinely collected blood-based clinical traits in 382 Norfolk Island individuals was performed. Results A striking association peak was located at chromosome 2q37.1 for both total bilirubin and direct bilirubin, with 29 SNPs reaching statistical significance (P < 1.84 × 10⁻⁷). Strong linkage disequilibrium was observed across a 200 kb region spanning the UDP-glucuronosyltransferase family, including UGT1A1, an enzyme known to metabolise bilirubin. Given the epidemiological literature suggesting a negative association between CVD risk and serum bilirubin, we further explored potential associations using stepwise multivariate regression, revealing a significant association between direct bilirubin concentration and type-2 diabetes risk. In the Norfolk Island cohort increased direct bilirubin was associated with a 28% reduction in type-2 diabetes risk (OR: 0.72, 95% CI: 0.57-0.91, P = 0.005). When adjusted for genotypic effects the overall model was validated, with the adjusted model predicting a 30% reduction in type-2 diabetes risk with increasing direct bilirubin concentrations (OR: 0.70, 95% CI: 0.53-0.89, P = 0.0001). Conclusions In summary, a pedigree-based GWAS of blood-based clinical traits in the Norfolk Island population has identified variants within the UDPGT family directly associated with serum bilirubin levels, which are in turn implicated in a reduced risk of developing type-2 diabetes within this population.
Electronic supplementary material The online version of this article (doi:10.1186/s12863-015-0291-z) contains supplementary material, which is available to authorized users.


Background
This study examined a large multi-generational pedigree from the isolated population of Norfolk Island to identify genomic variants (single nucleotide polymorphisms, SNPs) associated with routinely collected blood-based clinical traits. The Norfolk Island population is a genetic isolate with strong family groups and a well-documented family genealogy [1]. Norfolk Island is a small volcanic island located in the Pacific Ocean between Australia (about 1600 km north-east of Sydney) and New Zealand (1077 km north-west of Auckland). Alongside geographic isolation, a unique history has shaped the genomic architecture of the current pedigree members, resulting in an admixed population with both European and Polynesian ancestry [2]. Recent estimation of the admixture in the Norfolk Island cohort reported 88% European ancestry and 12% Polynesian ancestry [2].
To date the Norfolk Island Health Study (NIHS) has collected data and samples for 1199 Norfolk Islanders, 52% (N=624) of whom were found to have direct links to the original founders. Using this in-depth genealogical information a large multi-generational Norfolk pedigree was reconstructed [1]. Several studies have established admixture scores and presence of founder effects within the Norfolk Island pedigree [1][2][3] and the pedigree has been shown to have sufficient power to detect genetic loci influencing complex traits via linkage and association [4][5][6][7].
The Norfolk Island population has high rates of metabolic syndrome [7] and cardiovascular related risk factor traits, especially obesity, compared to mainland Australia. Research on the Norfolk pedigree has shown that traits for obesity, dyslipidaemia, blood glucose and hypertension exhibit a substantial genetic component, with heritability estimates ranging from 30% for systolic blood pressure (SBP) to 63% for low density lipoproteins (LDL) cholesterol [1,4,5]. In addition, factor analysis identified "composite" phenotypes with high heritability [5], suggesting that common gene(s) underlie cardiovascular disease-related phenotypes. Furthermore, genetic linkage analysis in the Norfolk Island pedigree has successfully identified previously documented regions associated with cardiovascular disease risk traits, the most significant being for SBP on chromosome 1 (1p36) [4].
Reported rates of type-2 diabetes within the Norfolk Island population are similar to mainland Australia (4-8%). However, a significantly higher proportion of individuals had fasting blood glucose in excess of normal ranges (>5 mmol/L), suggesting a high prevalence of pre-diabetes and possible under-diagnosis of type-2 diabetes [4,8]. Additionally, clinical diagnosis of type-2 diabetes using AUSDRISK [9] identified that 42% of the Norfolk Island population were at high-risk of developing the disease [7].
Bilirubin is a breakdown product of haemoglobin, formed during the catabolism of haem and processed in the liver. Total serum bilirubin measures both water-soluble (direct) and fat-soluble (indirect) bilirubin. Bilirubin is also a potent antioxidant and as such has a vital role in the protection of the body against reactive oxygen species [10][11][12]. Numerous epidemiological analyses have reported strong negative associations between CVD risk and serum bilirubin levels. Very few studies investigating the link between type-2 diabetes and serum bilirubin concentration have been conducted [13], although recently an association with mortality in a type-2 diabetic cohort was observed [14]. Serum bilirubin concentration has been shown to be tightly regulated by the UDP-glucuronosyltransferase (UDPGT) enzyme family, with several large GWAS and linkage studies identifying variants within UGT1A in particular [15][16][17][18]. This is suggestive of a potentially heritable metabolic disease factor, for which a recent study provides further supportive evidence: a Mendelian randomization analysis of total bilirubin levels in a prospective cohort found further evidence for a protective role in type-2 diabetes [19].
The aim of this study was to update the previously calculated heritabilities for a range of blood-based traits relating to CVD risk in the Norfolk Island cohort and to perform genome-wide association studies (GWASs) of the heritable traits using a pedigree-based approach.

Heritability of Individual Metabolic Traits
A description of the blood-based clinical traits investigated in this study, including summary statistics, is shown in Additional File 1. The latest pedigree relationship information and GenABEL were used to calculate heritability (h²) statistics for all traits profiled in the Norfolk Island cohort. In total, 16 traits (out of 19) yielded statistically significant h² values ranging from 0.225 to 0.563 (nominal P<0.05). The average heritability was 0.39 and 8 traits exhibited a higher than average heritability (total protein, globin, total bilirubin, LDL-C, cholesterol, alkaline phosphatase, and urea), the most heritable trait being total protein (h²=0.563, P=2.26×10⁻⁴). A summary of all significantly heritable major blood-based clinical traits is shown in Table 1.

GWAS of Metabolic Traits
All 16 heritable blood-based clinical traits were screened for association separately; individual trait GWAS Manhattan plots can be viewed in Additional File 2. Two traits showed robustly associated clusters (i.e. SNPs in close proximity to each other): total bilirubin and direct bilirubin. It should be noted that a number of SNPs passed the adjusted significance threshold for liver function traits (i.e. GGT, AST, ADH). These traits exhibited numerous SNPs passing Meff adjustment; however, robust 'peaks'/clusters of SNPs were not observed.

Exploration of the Bilirubin Association on Chromosome 2q37.1
The strongest observed association was seen between a cluster of 29 SNPs on chromosome 2q37.1 passing a Meff adjusted threshold and total serum bilirubin (Fig 1A, Table 2). The most robustly associated SNP was rs6744284 (P=1.87×10⁻¹⁶). A weaker association was observed for the same cluster of SNPs on chromosome 2q37.1 with direct serum bilirubin levels (Fig 1B). These 29 SNPs span a region of 189.8 kb, and lie directly on top of a complex locus that codes numerous isoforms of the UDP-glucuronosyltransferase (UGT) family (Fig 2).

LD block identification
Evidence of strong linkage disequilibrium (LD) across the 29 SNPs was observed in the Norfolk Island population (Fig 3). Summarised LD statistics for the 29 SNPs were: r² (min = 0.026, 1st quartile = 0.33, median = 0.49, mean = 0.51, 3rd quartile = 0.72, max = 1.00) and D' (min = 0.24, 1st quartile = 0.82, median = 0.90, mean = 0.89, 3rd quartile = 1.00, max = 1.00). Haploview analysis identified 2 LD blocks across the region; the first block contained 9 SNPs and spanned 88 kb, the second block consisted of 19 SNPs and spanned a region of 74 kb. Further analysis of LD across 3 separate HapMap populations was conducted to compare with that obtained in the Norfolk Island cohort: CEU (European), CHD (Chinese) and JPT (Japanese). Due to the use of different SNP arrays, 25 of the 29 SNPs were available across the 4 populations, thus the LD mapping was restricted to these 25 SNPs.
The LD pattern for the Norfolk Island cohort was most similar to the CEU population, and extensively different from both of the Asian HapMap groups used (Additional File 3). LD appeared slightly stronger in the Norfolk Island SNPs than for CEU. Allele frequencies for the 25 SNPs in these 4 populations are detailed in Additional File 4.
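To make the two LD measures summarised above concrete, the following is a minimal sketch of how pairwise D' and r² can be computed from haplotype and allele frequencies. The function name and the example frequencies are hypothetical and are not taken from the Norfolk Island data; Haploview's own estimation from genotype data is more involved.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical example: every B-carrying haplotype also carries A,
# but the allele frequencies differ (0.3 vs 0.2).
d, d_prime, r2 = ld_stats(p_ab=0.2, p_a=0.3, p_b=0.2)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r2 = {r2:.2f}")
```

In this hypothetical case D' is at its maximum while r² is well below 1, which is the same qualitative pattern as the summary statistics above (high D' alongside moderate r²): D' reflects the absence of historical recombination, whereas r² also depends on how closely the allele frequencies match.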

Haplotype mapping and association with bilirubin levels
Haploview association analysis was performed on the individual 29 SNP 'markers'; minor allele frequencies (MAF) and association statistics are documented in Table 3 (for additional information see Additional File 5). All 29 SNPs exhibited significantly (P<1.0×10⁻⁴) increased MAF in the high serum bilirubin group. The most significantly associated marker was rs17863787; the frequency of the 'G' allele was observed to be 62.3% in those with high serum bilirubin and 24.9% in those with normal serum bilirubin (P=5.51×10⁻¹⁷).
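The marker-level comparison above can be illustrated with a simple allelic chi-squared test on a 2x2 table of allele counts. This is a sketch under stated assumptions rather than a reproduction of the Haploview analysis: the allele counts below are reconstructed from the reported 'G' allele frequencies (62.3% and 24.9%) and the group sizes given in the Methods (65 high, 317 normal), assuming complete genotyping and two alleles per individual. The function name is hypothetical.

```python
from math import erfc, sqrt


def allelic_chi_square(case_minor, case_total, control_minor, control_total):
    """Pearson chi-squared test (1 df, no continuity correction) on allele counts."""
    a, b = case_minor, case_total - case_minor
    c, d = control_minor, control_total - control_minor
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Assumed counts: 65 x 2 = 130 alleles in the high group (about 81 'G'),
# 317 x 2 = 634 alleles in the normal group (about 158 'G').
chi2, p_value = allelic_chi_square(81, 130, 158, 634)
print(f"chi2 = {chi2:.2f}, P = {p_value:.2e}")
```

Under these assumed counts the test gives a P value of the same order of magnitude as the one reported for rs17863787; exact agreement should not be expected, because the true counts depend on missing genotypes and on the phased data used.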
To further investigate the association of genomic structure across the chr2q37.1 region with serum bilirubin, a haplotype association analysis was conducted in Haploview. There were a total of 6 haplotypes inferred for LD block 1 and 7 haplotypes for LD block 2 (Additional file 6); haplotypes present in >1% of the total population are shown. The block 1 haplotype most significantly associated with the high bilirubin group was 'TAAGTGGGA', which is estimated to exist at 20.3% in the total population. This haplotype was observed in 40.3% of the high serum bilirubin group, and 17.2% of the normal group (P=4.59×10⁻⁹). The most abundant block 1 haplotype ('CGGTCCACT', 33.6% of total population) was observed to be significantly associated with the normal serum bilirubin group; 36.9% normal vs 19% high (P=9.31×10⁻⁵). The LD block 2 haplotype most significantly associated with high serum bilirubin was 'GGGCGTTGTGAGCTTGTTC', which is estimated to be present in 18.8% of the total population. This haplotype was observed in 43.5% of the high serum bilirubin group, and 14.3% of the normal group (P=1.73×10⁻¹⁴). The most abundant block 2 haplotype ('CAAATCCACTGTACGTCCT', 49.2% of total population) was observed to be significantly associated with the normal serum bilirubin group; 54.6% normal vs 26.1% high (P=3.51×10⁻⁹). Frequency and combination of the block specific haplotypes is illustrated in Fig 4. Nine tagging SNPs were identified that capture the allelic variance of the 29 SNPs (Table 4); the tagging analysis captured 100% of the 29 alleles at r² ≥ 0.8, with a mean r² of 0.963. These SNPs could be used in future replication analyses to tag variation across the region in other populations.

Bilirubin Correlations with clinical metabolic syndrome and cardiovascular disease
It is well established that serum bilirubin levels are inversely correlated with risk of developing cardiovascular disease [20][21][22]. Therefore this was investigated using the cardiovascular disease risk score previously calculated for the Norfolk Island population [7], along with potential relationships between other metabolic risk scores, including metabolic syndrome and type-2 diabetes (scores previously estimated [7]). Numerous studies have also reported that smoking behaviour is associated with serum bilirubin levels [23][24][25]. This was tested in the Norfolk Island population using Student's independent t-test, and revealed a significant difference in mean serum bilirubin levels between smokers (6.46 µmol/L) and non-smokers (8.12 µmol/L); t=3.99 with P=4.06×10⁻⁵.
To further examine potential relationships, a series of t-tests between a variety of quantitative metabolic syndrome/cardiovascular disease traits and categorised serum bilirubin group were performed. There were a total of 9 significant (P<0.05) trait correlations with categorised bilirubin level: body mass index (BMI), body fat, cholesterol/HDL-C ratio, total cholesterol, hip circumference, LDL-C, type-2 diabetes risk score, total protein and triglycerides (Table 5). These findings highlight traits that are consistent with previous literature [26,27].
Body fat was observed to have the strongest correlation with serum bilirubin, with significantly reduced body fat composition in individuals who had high serum bilirubin levels. Unlike previous observations [20,27,28], cardiovascular disease risk score was not significantly reduced in those individuals with higher serum bilirubin, whereas type-2 diabetes risk did show a significant reduction in the higher bilirubin group, consistent with previous literature [26,29].

Genotype effects on metabolic syndrome, type-2 diabetes and cardiovascular disease traits
To further explore the above approach, associations between the 29 significantly associated SNPs and metabolic traits other than serum bilirubin were explored. Traits which showed a significant (P<0.05) correlation with total serum bilirubin (Table 5) were selected. Only one trait showed a significant association with any of the 29 markers; this was type-2 diabetes risk when categorised as "low", "intermediate" and "high" [9]. Using a chi-squared test, rs2741012 and rs2741027 were significantly associated with type-2 diabetes risk (χ²=9.63, P=0.0069). Again this was followed with a Fisher's Exact test which confirmed significance (P=0.0081). The same pattern, with the minor allele showing suggestive protection, was observed.
Therefore, inclusion of SNP genotypes when assessing the relationship between direct bilirubin and type-2 diabetes risk increases the accuracy of the 'risk' estimate within the Norfolk Island cohort.

Discussion
We have identified a significant genomic association at 2q37.1 in the region of the UDP-glucuronosyltransferase (UDPGT) enzyme family members, with direct and total serum bilirubin levels. Correlation analyses between metabolic syndrome related traits and serum bilirubin levels identified significant inverse relationships for numerous traits. Haplotype association testing revealed the presence of potentially protective haplotypes within the Norfolk Island population.
Thus this study has identified a complex region which shows interplay between genomic and environmental conditions and has a large effect on overall serum bilirubin levels.
Previous literature has suggested a linkage between bilirubin and metabolic risk with clinical associations observed between cardiovascular disease risk, obesity and bilirubin concentrations [20][21][22]27] and more recently metabolic syndrome [30][31][32][33][34]. Therefore, we investigated potential relationships between bilirubin and metabolic traits in the Norfolk Island cohort. An inverse correlation between serum bilirubin and several important metabolic traits was observed, with the most notable being metabolic syndrome and type-2 diabetes risk. Given that metabolic syndrome and type-2 diabetes increase cardiovascular disease risk it is consistent with the current body of literature which documents inverse association between high serum bilirubin and cardiovascular disease risk (review [26]).
Our analysis refined an association with serum bilirubin concentration to a 189.8 kb region on chromosome 2q37.1, with genotypic analyses revealing that the level of serum bilirubin was greatly increased in individuals with the rare allele. This region encodes one of the major drug metabolising families (UDP-glucuronosyltransferase, UDPGT) [35][36][37]; there are 9 documented UDPGT isoforms: UGT1A1, UGT1A3, UGT1A4, UGT1A5, UGT1A6, UGT1A7, UGT1A8, UGT1A9 and UGT1A10 (Fig 2). UGT1A1 is well known to preferentially metabolise bilirubin and has been previously mapped in linkage and GWAS studies [16][17][18][38][39][40][41][42][43]. UGT1A3 and UGT1A4 also have been shown to have potential action with bilirubin [37]. However all family members, including UGT1A1, exhibit activity for numerous substrates and it is therefore possible that the gene effects are not mediated (entirely) by total bilirubin. Such pleiotropic effects at this locus are likely, as evidenced by the fact that adjustment for serum bilirubin in our modelling did not completely nullify the association between genotype and outcome. Future work is required to explore the effects and associations of other substrates with this genomic region.
Mutations in UGT1A1 have also been associated with Crigler-Najjar syndrome types I and II and with Gilbert syndrome [44][45][46]. Gilbert syndrome (GS) is a well-documented benign increase in serum bilirubin, and is caused by the reduced activity of UDPGT [47][48][49][50][51]. In line with the observations that serum bilirubin is inversely correlated with metabolic risk, diabetic patients with GS are less likely to develop vascular dysfunctions [52]. Furthermore, the incidence of diabetes and cardiovascular disease risk mortality is lower in GS individuals, with one study exploring the efficacy of increasing serum bilirubin in type-2 diabetic patients [53]. Further evidence confirming the protective role of circulating bilirubin for type-2 diabetes has been reported in a prospective study [19].
Significant differences in functional polymorphisms within the UGT1A family have been identified between Caucasian and other populations [54]. Polymorphisms in the promoter region for UGT1A1 (2 bp TA insertion in the TATA box) increased activity in Caucasian GS patients; this was not observed in Asian and African GS patients or Pacific populations [54]. The authors suggest that due to the complex nature of environmental and genetic factors, unstable polymorphisms within UGT1A1 may act to "fine-tune" plasma bilirubin levels on a population by population basis, meaning that the promoter variation explains the presence of GS in some populations, but in other populations it is more likely a combination of variants in the encoding region along with environmental factors [54]; our data support this hypothesis. Additionally, meta-analysis has demonstrated strong replication for a genetic influence on serum bilirubin levels of the UGT1A1 locus (P<5×10⁻³²⁴), specifically at the proximal promoter region of UGT1A1 tagged by rs6742078 [40].
While we did not have genotype information for this SNP, we were able to impute against the 1000 Genomes panel to extrapolate associations between the two studies. Using imputed information we were able to illustrate that there is tight LD between rs6742078 and the top associated SNP from our study, rs6744284 (r²=0.85), suggesting that the Norfolk Island cohort exhibits a similar genetic pattern of association.
We identified strong LD across the region of 2q37.1, potentially suggesting that the Norfolk Island population's unique genomic structure is influencing serum bilirubin concentration. LD across the same region in data available through the HapMap project [55] showed that the Norfolk Island cohort exhibited an LD pattern similar to that observed in the European population (CEU), while both the Asian populations (Chinese and Japanese) exhibited very different genetic structure across this region. This is not unexpected because of the large amount of recent European admixture in the Norfolk population. Additionally, it was noted that haplotypes containing the minor allele(s) in the Norfolk Island population potentially conferred protection against metabolic disorders as measured by clinical metabolic syndrome and type-2 diabetes risk. It is possible that selection is driving the presence of high serum bilirubin within populations, although this may be achieved by different variation across the region. It appears that in Europeans this variation is often in the promoter region, whereas in Asian and African populations this is not the case, and it is polymorphisms in the gene body that seem to account for the associations with increased bilirubin. This suggests that it may be beneficial for a population to have a certain frequency of individuals with naturally high serum bilirubin, and potentially points to a complex interaction between environmental and genomic factors maintaining this.
One significant association between 2 SNPs (rs2741012 and rs2741027) and categorised type-2 diabetes risk was observed. These two SNPs are just upstream of the promoter and 5'UTR region of the UDPGT family. It is likely that these SNPs are in LD with untyped polymorphisms (SNPs not on the 610quad chip) that reside in these regions and potentially form an LD block/haplotype in the Norfolk Island population which confers protection against type-2 diabetes as well as metabolic syndrome. Interestingly, and in support of our approach, this reduction in risk correlates well with previous work conducted in a large US cohort [13]; these variants (or variants tagged by them) may be functional, i.e. they might directly affect transcription and/or translation of the isoforms encoded by the UDPGT family. It is also possible that there are additional rare variants within the region that further influence serum bilirubin, as recently evidenced by an exome sequencing study performed in elderly individuals [56].
Given that bilirubin is a cheap and commonly measured laboratory test, routine screening of serum bilirubin levels could be beneficial in the stratification and treatment of metabolic disorders such as cardiovascular disease and type-2 diabetes. Identification of genes/variants that exhibit pleiotropic effects (effects of the same variant on multiple characteristics or disease risks) is an ultimate goal.
The significant interaction observed here provides evidence that bilirubin may be affected by genetic and environmental factors and their interactions.

Conclusions
In summary, this study identified strong associations of variants within the UGT1A family with regulation of serum bilirubin levels in the Norfolk Island population, which replicated previous GWAS and epidemiological findings. This successful implementation of pedigree-based analysis using the unique properties of the Norfolk Island cohort highlights a functional region that offers protective benefit from metabolic disease and further alludes to a potentially heritable component within the Norfolk Island population. Specific haplotype structure was significantly associated with increased serum bilirubin, and as such this study has identified a potential set of 'protective' haplotypes that exist within the Norfolk Island population. Further studies are warranted to validate these findings, with the next step being to explore these associations in larger outbred populations.

Sample/cohort collection, Pedigree Information and Ethics
The Norfolk Island Health Study (NIHS) is well established with regards to data collection and initial disease prevalence studies [4,5,8]. The Norfolk Island pedigree structure has been previously outlined [57], and subsequently updated [1]. The most recent update led to the reconstruction of a core-pedigree consisting of 1388 members coalescing over 11 generations (or 200 years) back to the original founders [3,7]. This study focuses on a reduced core-pedigree, meaning that individuals: a) are genetically related to the original founders, and b) have phenotype and genotype information available. The total number of individuals fitting these criteria was 382. All individuals gave written informed consent. Ethical approval was granted prior to the commencement of the study by the Griffith University Human Research Ethics Committee (ethical approval no: 1300000485) and the project was carried out in accordance with the relevant guidelines, which complied with the Helsinki Declaration for human research.
After this initial quality control, 590,603 SNPs were exported from PLINK and imported into the CRAN package GenABEL [58]. Further filtering (including Mendelian inheritance violations and sex checking based on available X and Y markers) in GenABEL led to the reduction of the SNP set to a total of ~480,000; this included removal of both X and Y chromosome SNPs after gender checking, as well as the removal of mitochondrial and XY SNPs.

Genome-Wide Association Analysis
A pedigree-based GWAS analysis of all heritable traits was batched using custom R scripts and the package GenABEL [58]. GenABEL uses an additive approach and the loci are coded as 0, 1, 2 (corresponding to genotypes AA, AB, and BB, respectively). A detailed explanation of the association model and specific GWAS overview as implemented in the Norfolk Island cohort was previously described [7]. Briefly, a correction was made for the relatedness inherent in the Norfolk Island population using the polygenic model with age and sex interactions, as well as genetic structure (the top 2 genomic principal components of the complete SNP set as calculated by KING [61]). The top two components were chosen as covariates because we found that these explained the majority of the variance in the outcomes being tested and because inclusion of additional, less informative components only served to reduce the parsimony of the models. For association analysis the mmscore function implemented in GenABEL was used. This function represents a mixed model approximation analysis for association between a trait and genetic polymorphism(s), and is specifically designed for association testing in samples of related individuals. This allows for per SNP association testing using a mixed model polygenic approach. After correcting for multiple testing, the study-wide significance was set based on Meff adjustment (P = 1.84×10⁻⁷). It should be noted that this Meff threshold is tailored to trait-wise associations, not multi-trait analyses; therefore p-values are adjusted on a per trait basis. Association statistics for every SNP for each trait were generated and output to compressed files (.gz.tar) for storage and future reference. GWAS Manhattan plots were generated for each trait association using a custom modified version of the GenABEL plot.scan.gwaa function (for all Manhattan plots see Additional File 2).
Annotation of the robustly associated bilirubin SNPs identified as being functional was performed using: http://brainarray.mbni.med.umich.edu/Brainarray/Database/SearchSNP/snpfunc.aspx.
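As an illustration of two details in the association analysis described above, the sketch below shows the additive 0/1/2 genotype coding and the arithmetic behind an effective-number-of-tests (Meff) threshold. It assumes, as is conventional but not stated explicitly in the text, that the threshold is a Bonferroni-style 0.05 / Meff; the function names are hypothetical and this is not GenABEL code.

```python
def additive_code(genotype, effect_allele):
    """Count copies of the effect allele: 'AA' -> 0, 'AB' -> 1, 'BB' -> 2 for effect allele 'B'."""
    return sum(1 for allele in genotype if allele == effect_allele)


def meff_threshold(alpha, m_eff):
    """Bonferroni-style significance threshold for m_eff independent tests (assumed form)."""
    return alpha / m_eff


def implied_meff(alpha, threshold):
    """Number of effective tests implied by a given threshold, under the same assumption."""
    return alpha / threshold


codes = [additive_code(g, "B") for g in ("AA", "AB", "BB")]
print(codes)

# Under the assumed alpha of 0.05, the study-wide threshold of 1.84e-7
# corresponds to roughly 270,000 effectively independent SNPs.
print(round(implied_meff(0.05, 1.84e-7)))
```

The implied number of independent tests is well below the ~480,000 SNPs retained after filtering, which is what one would expect when correlated (LD-linked) markers are collapsed into an effective count.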

LD testing and Haplotype Association
Genotype data for the chr2q37.1 region was phased using SHAPEIT2 [62], which has functionality to deal with complex pedigree structures, implemented through the duoHMM algorithm. From this process we observed no Mendelian errors before moving the phased data over to Haploview analyses. Haplotype/LD testing, SNP tagging and association analyses were all conducted in Haploview 4.2 [63]. LD blocks were determined using the default Haploview settings which infer LD based on a pairwise comparison of correlation (r²) values between SNPs. Haplotypes were inferred from the genotypes of SNPs which made up the identified LD blocks, and were only recorded if they existed in more than 1% of the population. Tagging SNPs were determined using the 'tagger' option of Haploview, using a pair-wise tagging method with a minimum observed r² between pairs of 0.8.
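The idea behind pairwise tagging at a minimum r² of 0.8 can be sketched with a simple greedy heuristic: repeatedly choose the SNP that captures the most not-yet-captured SNPs at r² ≥ 0.8. This is an illustrative simplification, not Haploview's Tagger implementation, and the r² matrix below is hypothetical.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Pick tag SNPs so every SNP has r^2 >= threshold with at least one tag.

    r2 is a symmetric matrix (list of lists) of pairwise r^2 values.
    Returns the indices of the chosen tag SNPs in the order selected.
    """
    n = len(r2)
    untagged = set(range(n))
    tags = []
    while untagged:
        def coverage(i):
            # Number of still-untagged SNPs that SNP i would capture.
            return sum(1 for j in untagged if i == j or r2[i][j] >= threshold)

        # Prefer the SNP with the largest coverage; break ties by lowest index.
        best = max(range(n), key=lambda i: (coverage(i), -i))
        tags.append(best)
        untagged -= {j for j in range(n) if j == best or r2[best][j] >= threshold}
    return tags


# Hypothetical 4-SNP example: SNP 0 captures SNPs 1 and 2; SNP 3 stands alone.
example_r2 = [
    [1.00, 0.90, 0.85, 0.10],
    [0.90, 1.00, 0.50, 0.10],
    [0.85, 0.50, 1.00, 0.10],
    [0.10, 0.10, 0.10, 1.00],
]
print(greedy_tag_snps(example_r2))
```

A greedy choice does not guarantee the smallest possible tag set, but it conveys why a region in strong LD, such as the 29-SNP cluster here, can be represented by far fewer markers.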
Association analyses were carried out on both markers (SNPs) and haplotypes using the inbuilt Haploview association function. A phenotype column was added to the dataset to allow a 'case'/'control' experimental set-up; where case represented the high bilirubin group and control the normal bilirubin group. There were a total of 65 cases and 317 controls with 124 genotyped individuals missing phenotype information. Permutation testing was run to confirm the above association analyses for both marker and haplotype associations. To ensure the robustness of final P values the number of permutations was set at 1,000,000 (this should lead to a reduction of the

Correlations with metabolic traits
Initial exploratory correlations between risk scores for cardiovascular disease and type-2 diabetes, clinically defined Metabolic Syndrome (categorical: 0 (no MetS), 1 (MetS)), and various related traits were conducted in R 2.15.2 [64]. For all analyses total serum bilirubin levels were categorised into 'normal' and 'high' groupings, with 'high' being defined as >14 µmol/L; this approximates a clinical cut-off and facilitates interpretation in line with existing clinical guidelines. For all other traits tested a standard Student's t-test (as implemented in R) was used to test for a significant difference of means between the given trait and bilirubin level. There were two categorical traits tested for correlation with serum bilirubin levels: smoking and presence of metabolic syndrome. Smoking has been previously well documented to be associated with serum bilirubin levels [23], and was categorised in the Norfolk Island cohort as either 'yes' (smokers N=133) or 'no' (non-smokers N=458). Correlation testing between smoking and bilirubin was carried out using a 2x2 chi-squared contingency test, followed by a Fisher's Exact test. For correlation analysis between total serum bilirubin and metabolic syndrome there were a total of 598 individuals with available matched phenotype data: 'metabolic syndrome' (N=156) and 'no metabolic syndrome' (N=442). The clinical diagnosis of metabolic syndrome previously calculated for the Norfolk Island cohort was used [7]. A 2x2 chi-squared contingency test was used to evaluate the significance, followed by a Fisher's Exact test as one of the table's cells contained a value less than 5%. Due to the initial exploratory nature of these analyses all tests are unadjusted, so nominal p-values are reported. Additionally, relatedness within the population is accounted for in later formal modelling using GLM regression.
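The grouping and comparison of means described above can be sketched as follows. The example uses the >14 µmol/L cut-off from the text, but the records are invented for illustration and the pooled-variance form of the t statistic is an assumption (R's t.test defaults to the Welch variant unless var.equal = TRUE is set). Only the statistic and degrees of freedom are computed; a P value would additionally need the t distribution.

```python
from math import sqrt
from statistics import mean, variance

HIGH_CUTOFF_UMOL_L = 14.0  # 'high' is defined as total bilirubin > 14 umol/L


def bilirubin_group(total_bilirubin):
    """Categorise a total serum bilirubin value (umol/L) as 'high' or 'normal'."""
    return "high" if total_bilirubin > HIGH_CUTOFF_UMOL_L else "normal"


def student_t(sample_a, sample_b):
    """Pooled-variance Student's t statistic and its degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    pooled = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / (na + nb - 2)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical records: (total bilirubin in umol/L, body fat in %).
records = [(8.0, 30.0), (10.0, 32.0), (12.0, 34.0),
           (16.0, 24.0), (18.0, 26.0), (20.0, 28.0)]

high = [fat for bili, fat in records if bilirubin_group(bili) == "high"]
normal = [fat for bili, fat in records if bilirubin_group(bili) == "normal"]
t_stat, df = student_t(high, normal)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

A negative t statistic in this orientation corresponds to a lower trait mean in the high bilirubin group, which is the direction reported for body fat.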

Regression Modelling testing association between outcome, trait and genotype
To further explore associations between bilirubin, type-2 diabetes and the genotypic architecture across the UGT1A1 region, regression modelling was conducted in R. To establish an initial association, separate logistic regression was conducted between categorised type-2 diabetes risk and total bilirubin and then direct bilirubin. Additionally a bi-directional stepwise logistic regression model was used to test the significance of each of the 9 tagging SNPs identified in the LD block analysis. The model was not corrected for common covariates (age, sex, smoking, BMI) as these are all accounted for in the calculation of the AUSDRISK type-2 diabetes risk score (as previously calculated in the Norfolk Island cohort [7]). To address the issue of relatedness we included the average pedigree kinship as a covariate in the stepwise regression model. This was excluded from the final model, indicating that in this instance relatedness is not a significant issue. Reported r² values use the Nagelkerke Index pseudo r² as calculated in R. Model p values were generated from an ANOVA using the F distribution, which tests the null hypothesis that the coefficients represented in the overall regression model (represented by R²) are equal to 0.
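For readers less familiar with logistic regression output, the sketch below shows how a fitted coefficient and its standard error translate into an odds ratio, a 95% confidence interval and a percentage change in risk of the kind reported in the abstract. The coefficient and standard error are not given in the text; here they are back-calculated from the reported OR of 0.72 (95% CI 0.57-0.91), so the example is a round trip for illustration only. The function name is hypothetical, and the Wald P value it returns need not match the model-based P value reported by the authors.

```python
from math import erfc, exp, log, sqrt

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def odds_ratio_summary(beta, se):
    """Convert a logistic regression coefficient and standard error into OR statistics."""
    z = beta / se
    return {
        "or": exp(beta),
        "ci_low": exp(beta - Z_95 * se),
        "ci_high": exp(beta + Z_95 * se),
        "percent_reduction": (1.0 - exp(beta)) * 100.0,
        "wald_p": erfc(abs(z) / sqrt(2.0)),  # two-sided normal tail probability
    }


# Back-calculated (assumed) inputs from the reported OR and CI.
beta = log(0.72)
se = (log(0.91) - log(0.57)) / (2 * Z_95)

summary = odds_ratio_summary(beta, se)
print(f"OR = {summary['or']:.2f} "
      f"(95% CI {summary['ci_low']:.2f}-{summary['ci_high']:.2f}), "
      f"risk reduction = {summary['percent_reduction']:.0f}%")
```

The "28% reduction" is therefore simply (1 - OR) x 100, interpreted per unit increase in the predictor on the scale used in the model.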

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
CB, MC, and JC carried out the genotype assays. MH curated phenotype and pedigree data. MB, RL, and DE participated in the design of the study and performed the statistical analysis. JB provided detailed statistical expertise and critical evaluation of methodology. MB, RL and LG conceived of the study, and participated in its design. DM and GC helped to draft the manuscript. All authors read and approved the final manuscript.