Discovery, linkage disequilibrium and association analyses of polymorphisms of the immune complement inhibitor, decay-accelerating factor gene (DAF/CD55) in type 1 diabetes

Background: Type 1 diabetes (T1D) is a common autoimmune disease resulting from T-cell mediated destruction of pancreatic beta cells. Decay accelerating factor (DAF, CD55), a glycosylphosphatidylinositol-anchored membrane protein, is a candidate for autoimmune disease susceptibility based on its role in restricting complement activation and on evidence that DAF expression modulates the phenotype of mouse models of autoimmune disease. In this study, we adopt a linkage disequilibrium (LD) mapping approach to test for an association between the DAF gene and T1D.
Results: Initially, we used HapMap II genotype data to examine LD across the DAF region. Additional resequencing was required, identifying 16 novel polymorphisms. Combining both datasets, we adopted an LD mapping approach to test for association with T1D. Seven tag SNPs were selected and genotyped in case-control (3,523 cases and 3,817 controls) and family (725 families) collections.
Conclusion: We obtained no evidence of association between T1D and the DAF region in two independent collections. In addition, we assessed the impact of using only HapMap II genotypes for the selection of tag SNPs and, based on this study, found that HapMap II genotypes may require additional SNP discovery for comprehensive LD mapping of some genes in common disease.


Background
T1D is characterised as a common autoimmune disease, mainly resulting from a T-cell mediated destruction of pancreatic beta cells that leaves patients completely dependent on exogenous insulin to regulate their blood glucose level. T1D is strongly clustered in families with an overall genetic risk ratio, an estimate of the familial clustering of the disease, of approximately 15 [1]. However, of the hundreds of association studies reported to date, only four loci have been identified and successfully replicated: the HLA class II genes on chromosome 6p21 [2]; the insulin gene (INS) on chromosome 11p15 [3,4]; CTLA4 on chromosome 2q33 [5,6]; and PTPN22 on chromosome 1p13 [7,8]. CD25 on chromosome 10p15 has been implicated, but this finding awaits independent replication [9]. Given that these genes alone cannot explain the familial clustering of T1D, many other genes remain to be identified.
Figure 1. LD map for the human DAF region on chromosome 1q32. Colour key: white, D' < 1 and LOD < 2; blue, D' = 1 and LOD < 2; shades of pink/red, D' < 1 and LOD ≥ 2; bright red, D' = 1 and LOD ≥ 2.
Recently, there have been several reports focusing on the relationship between autoimmune disease and the complement system, which is composed of more than 30 soluble and membrane-bound proteins [10,11] and plays an important role in innate host defence. As inappropriate regulation of the complement system can lead to significant damage of host tissues [12], a number of membrane-bound complement regulatory proteins are active, such as DAF, a glycosylphosphatidylinositol-anchored membrane protein that restricts complement activation by inhibiting the formation of C3 convertases in both the classical and alternative pathways [13,14].
Dysfunction of human DAF on erythrocytes contributes to paroxysmal nocturnal hemoglobinuria (PNH) by increasing their sensitivity to complement lysis [13,15,16]. In addition, a proportion of DAF-deficient (Cromer INAB) patients develop inflammatory bowel disease. However, little is known about DAF's role in autoimmune disease in vivo [17].
Recently, it has been reported that DAF modulates T cell immunity by controlling T cell- and antigen-presenting cell-induced alternative pathway C3 activation during cognate interactions [18-20]. According to gene targeting studies, mice deficient in the DAF1 gene, the murine homologue of human DAF, were more susceptible to complement-mediated inflammatory injury. In particular, DAF1-deficient female mice on an MRL/lpr background, a model for human systemic lupus erythematosus, showed aggravated lymphadenopathy and splenomegaly, higher serum anti-chromatin autoantibody levels, and dermatitis [21].
This prior evidence suggests that DAF may function as a negative regulator of the autoimmune response by modulating T cell activity and directly protecting host tissues in vivo, and that recombinant DAF may be an ideal therapeutic agent for autoimmunity [22]. On the other hand, DAF does not lie under any of the reported T1D linkage peaks [23,24], nor have there been any reports of genetic association studies between DAF and autoimmune disease. Recently, however, differential expression of DAF was observed when comparing T cells from nonobese diabetic (NOD) mice and diabetes-resistant NOD mice carrying a congenic interval containing the DAF gene, making it a candidate gene for the Idd5.4 region (William Ridgway and Linda Wicker, unpublished observations).
In this study, to test whether DAF contributes to T1D susceptibility, we performed an association study using an LD mapping approach, together with the direct analysis of three non-synonymous SNPs (nsSNPs), in large case-control and family collections.

Linkage disequilibrium analysis
Initially, we used phase II genotyping data from the HapMap project [25,26], a catalogue of common human genetic variants that provides allele frequencies and inter-marker LD patterns within and among populations of African, Asian, and European ancestry. In the DAF region, about 40 kb on chromosome 1q32, 21 common SNPs (minor allele frequency (MAF) ≥ 0.05) have been genotyped in 60 U.S.A. residents with northern and western European ancestry, collected in 1980 by the Centre d'Etude du Polymorphisme Humain (CEPH, CEU). We note that all of these SNPs were located in non-coding regions and that the average inter-SNP distance was 2 kb. An LD map of the region, using pairwise D', shows little evidence of recombination within the region (Figure 1a).
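The pairwise LD measures used for such maps can be illustrated with a minimal Python sketch. It computes D, D' and r² for two biallelic SNPs from phased haplotype counts. The function name `pairwise_ld` and the example counts are hypothetical and are not taken from the HapMap or resequencing data.

```python
def pairwise_ld(n11, n12, n21, n22):
    """Return (D, D', r^2) for two biallelic loci from haplotype counts.

    n11: haplotypes carrying allele 1 at both loci
    n12: allele 1 at locus A, allele 2 at locus B
    n21: allele 2 at locus A, allele 1 at locus B
    n22: allele 2 at both loci
    """
    n = n11 + n12 + n21 + n22
    p_a = (n11 + n12) / n          # frequency of allele 1 at locus A
    p_b = (n11 + n21) / n          # frequency of allele 1 at locus B
    d = n11 / n - p_a * p_b        # raw disequilibrium coefficient
    # D' rescales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical counts in which only three of the four haplotypes are observed.
d, d_prime, r2 = pairwise_ld(40, 0, 20, 60)
print(d_prime, r2)
```

In this toy example D' is 1 (no evidence of historical recombination between the two sites) while r² is only 0.5, which is why a region can look like a single block on a D' map and still need several tag SNPs.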

DAF resequencing
As we were concerned about adopting an LD mapping approach given the HapMap SNP density [27], we resequenced DAF in 32 CEPH individuals, selected from the 60 CEPH individuals used by the HapMap project, to increase the SNP density across the region.
Analysis of the resequencing data identified 32 polymorphisms, 26 of which were SNPs and six of which were deletion/insertion polymorphisms (DIPs); 12 SNPs and four DIPs were novel when compared to dbSNP build 125 (Table 1). Twenty-two polymorphisms were common (MAF ≥ 0.05), five of which were also found in the HapMap II data. The relatively small number of common polymorphisms found in both datasets is not unexpected, as HapMap II SNPs were selected to provide an even coverage in terms of distance across the genome, whereas the resequencing is focused on regions of interest and extracts all common polymorphisms present in these individuals. An LD map of the region, based on these 22 polymorphisms (Figure 1b), revealed additional evidence of recombination within the region over and above that apparent in HapMap II data alone (Figure 1a). There was

Tag SNP analysis of DAF
To test for an association between T1D and the DAF region, we adopted an LD mapping approach, which exploits the non-random relationships between SNPs (known as LD) in a region of interest to reduce the amount of genotyping required. As the causal SNP is unknown, we assume that predicting the causal SNP is likely to be no more difficult than predicting any other SNP. The predictive performance of the tag SNPs was assessed using an R² measure, which quantifies the ability to predict each known SNP by multiple regression on the set of tag SNPs. The tag SNPs were analysed using a multilocus test, as described by Chapman et al. [28], which tests for an association between the tag SNPs and T1D due to LD with one or more causal variants [28,29].
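The R² criterion can be sketched in a few lines of Python. This is a minimal illustration, assuming unphased genotypes coded 0, 1 or 2 and ordinary least squares with an intercept; it is not the Stata implementation used in the study, and the helper names (`solve`, `tag_r2`) and toy genotypes are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: tag SNPs are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def tag_r2(target, tags):
    """R^2 from regressing one SNP's genotypes on the tag SNP genotypes.

    target: genotype (0/1/2) of the SNP to be predicted, one per individual
    tags:   per-individual lists of tag SNP genotypes (0/1/2)
    """
    rows = [[1.0] + [float(g) for g in t] for t in tags]  # intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(target) / len(target)
    ss_res = sum((y - f) ** 2 for y, f in zip(target, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    return 1.0 - ss_res / ss_tot


# Toy panel: eight individuals, two tag SNPs.
tags = [[0, 0], [1, 0], [2, 1], [0, 1], [1, 2], [2, 2], [1, 1], [0, 2]]
well_tagged = [t[0] for t in tags]           # identical to the first tag SNP
poorly_tagged = [0, 1, 0, 2, 1, 0, 2, 1]     # weakly related to either tag
print(tag_r2(well_tagged, tags), tag_r2(poorly_tagged, tags))
```

A tag set is accepted under this scheme only if the smallest R² over all known common polymorphisms reaches the chosen threshold (0.80 in this study); the second toy SNP would fail that criterion.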
We first combined our resequencing data with the HapMap data, providing a panel of 38 common polymorphisms genotyped in 32 individuals (Table 1), and generated a combined LD map of the region (Figure 1c). Subsequently, seven tag SNPs, required to capture the variation within the DAF region with a minimum R² of 0.80 [28], were selected [9] from the 38 common polymorphisms (Table 1). The tag SNPs were genotyped in 3,523 cases and 3,817 controls, and in 725 Caucasian multiplex T1D families (Table 2). The case-control and family multilocus P-values were 0.12 (3,523 case and 3,817 control genotypes; F(7, 7321) = 1.63) and 0.69 (parent-child trio genotypes = 1,390; χ² = 4.72, 7 df), respectively, providing no evidence for an association between T1D and the DAF region. In the case-control collection, the multilocus test was stratified by broad geographical region within the UK in order to minimize any confounding due to variation in allele frequencies across Great Britain [9,30].
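As a quick consistency check on the reported P-values, the following Python sketch evaluates the upper tail of a χ² distribution with odd degrees of freedom in closed form, using only the standard library. The family statistic is checked directly. For the case-control F statistic, the sketch assumes the large-sample approximation that 7 × F is roughly χ² with 7 df when the denominator df is very large; this is an approximation introduced here, not the calculation reported in the study. The function name is hypothetical.

```python
import math


def chi2_sf_odd_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with odd df."""
    if df < 1 or df % 2 != 1:
        raise ValueError("this closed form only covers odd degrees of freedom")
    m = (df - 1) // 2
    series = 0.0
    denominator = 1.0
    for j in range(1, m + 1):
        denominator *= 2 * j - 1              # 1, 1*3, 1*3*5, ...
        series += x ** (j - 0.5) / denominator
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 / math.pi) * math.exp(-x / 2) * series)


# Family collection: chi-square of 4.72 on 7 df (values quoted in the text).
p_family = chi2_sf_odd_df(4.72, 7)

# Case-control collection: F(7, 7321) = 1.63, approximated here by
# treating 7 * F as chi-square on 7 df (large denominator df).
p_case_control_approx = chi2_sf_odd_df(7 * 1.63, 7)

print(round(p_family, 2), round(p_case_control_approx, 2))
```

Both values round to the P-values quoted above (0.69 and 0.12), so the reported statistics and P-values are mutually consistent.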

Analysis of DAF non-synonymous SNPs
Recently, it has been proposed that complex diseases such as T1D may result from the effects of a large number of rare variants, with substantial allelic heterogeneity at causal loci [31,32]. In DAF, several rare non-synonymous SNPs (nsSNPs) have been reported in the exons encoding the short consensus repeat (SCR) domains of the DAF protein, and have subsequently been shown to be related to antigens of the Cromer blood group system [13,14,33,34]. On this basis, we genotyped three rare nsSNPs in 3,490 cases and 3,814 controls (Table 3), reasoning that a rare functional variant in DAF might have a strong effect in T1D. The following three nsSNPs were assessed: DAF-WES a/b (G > T), located in exon 2, with a MAF of 0.0055-0.0060 in a Finnish population [35,36]; rs28371588 (C > A), also located in exon 2 [34]; and rs12135160 (G > A), identified by the SsahaSNP detection tool (NIH and Sanger Institute, UK) in exon 8 and not previously genotyped. All result in amino-acid substitutions, but their phenotypic influences have not been characterized. In the present study, the MAF of rs12135160 (G > A) was 0.00042 in 3,768 controls and, consequently, we have no statistical power to detect an association. Both rs28371588 (C > A) and DAF-WES a/b (G > T) were monomorphic in the case-control collection.

Discussion
In this study, we did not find any evidence for an association between T1D and the DAF region in large case-control and family collections using an LD mapping approach. We combined the HapMap II genotyping data and resequencing data for the selection of tag SNPs. Had we chosen the tag SNPs using only the HapMap II genotyping data, only two tag SNPs (rs2564978 and rs1507765) would have been required to capture the detected variation within the ~40 kb DAF region with a minimum R² of 0.8. However, when these two tag SNPs were assessed against the combined dataset, they no longer captured the variation within the region to the required level, since seven of the thirty-six common polymorphisms had an R² below 0.8. The inability of the tag SNPs selected from HapMap II data to tag the combined dataset (minimum R² = 0.35) suggests that for the analysis of localized regions containing candidate genes, as opposed to whole-genome association studies, HapMap II data alone may not provide sufficient information to facilitate a comprehensive LD mapping approach. In the tag SNP approach, as the causal variant is unknown, we assume that the problem of predicting the causal polymorphism is likely to be no more difficult than that of predicting any other polymorphism [28]. Consequently, the power of the tag SNP approach to detect a causal polymorphism is based upon the minimum R² [28], assuming that the majority of common polymorphisms in a region are known. In this instance, incomplete knowledge of the common polymorphisms in the region inflated the minimum R², providing false confidence in the ability of the tag SNPs to capture the variation within the region, and in the power to detect a causal variant. Our results indicate that for some genes or regions HapMap II data may need to be supplemented by additional resequencing data to allow comprehensive association mapping of common variants.

Conclusion
We conclude that variation in DAF itself is unlikely to have a major effect in T1D in these populations. Analysis of an extended region, surrounding the DAF region analysed in this study, showed a cluster of several other genes involved in the complement system, including C4b binding protein (C4bp) and membrane cofactor protein (MCP), both known regulators of complement activation (RCA) genes [11,37,38]. Like DAF, C4bp and MCP restrict complement activation by inhibiting the formation of C3 convertases in the classical pathway, suggesting that they modulate each other in direct and indirect ways. To clarify the relationship between autoimmune disease and the complement system, including DAF, further genetic association studies and functional studies on RCA genes are needed. The set of tag SNPs and the LD map for the DAF region will be useful for such further studies.

Subjects
The resequencing panel consisted of 32 CEPH individuals; Utah residents with ancestry from Northern and West-  [39]. All cases and controls were of white ethnicity.
All families were Caucasian and of European descent, with two parents and at least one affected child. The family collection consisted of 457 multiplex families from the U.K. British Diabetic Association Warren 1 repository [40] and 268 multiplex families from the U.S.A. Human Biological Data Interchange [41].
The Cambridge Local Research Ethics Committee gave full ethical approval, and informed consent was obtained for the collection and use of these DNA samples from all subjects.

DAF resequencing
We first annotated the DAF gene locally [42,43] and displayed the annotation through gbrowse [44] within T1DBase [45]. Using these annotations, we resequenced all 11 exons, exon/intron boundaries and up to 3 kb of 3' and 5' flanking sequence of the DAF gene in 32 CEPH individuals, to increase the SNP density across the region. The sequencing reactions were carried out on nested PCR products using Applied Biosystems (ABI) BigDye terminator v3.1 chemistry and the sequences were resolved on an ABI3700 DNA Analyser. Polymorphisms were identified using the Staden Package [46] and double-scored by a second operator.

Statistical analysis
The multilocus test has been described in detail elsewhere [9,28,29,47]. Briefly, for the case-control data, the multilocus test is essentially Hotelling's T² [48,49], in which we score each diallelic locus as 0, 1 or 2 and compare the mean score vectors between cases and controls. In the case of the family data, the multilocus test takes the form of a multilocus TDT [28], in which, for each parent, we calculate a vector whose elements describe transmissions of each of the tag SNPs. If the parent is homozygous at a locus, the corresponding element is scored as zero; otherwise it is scored as either +1 or -1 depending on which allele was transmitted. The multilocus test tests the mean of this vector against zero; it is asymptotically distributed as χ² with degrees of freedom (df) equal to the number of tag SNPs [28,47].
The programs for the selection of tag SNPs [28] and for the association analysis used here are implemented in the Stata statistical system and may be downloaded from our website [50].
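The scoring scheme for the family data can be sketched as follows. This is a minimal Python illustration, not the Stata code used in the study: it assumes alleles coded 1 and 2, and it assumes the variance of the summed score vector is estimated by the sum of outer products of the per-parent vectors, which is one natural choice under the null hypothesis. All function names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular variance matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def parent_vector(genotypes, transmitted):
    """Score one parent across the tag SNPs.

    genotypes:   list of (allele, allele) pairs, alleles coded 1 or 2
    transmitted: allele (1 or 2) passed to the affected child at each SNP
    Homozygous loci score 0; heterozygous loci score +1 if allele 2 was
    transmitted and -1 if allele 1 was transmitted.
    """
    scores = []
    for (a1, a2), t in zip(genotypes, transmitted):
        scores.append(0 if a1 == a2 else (1 if t == 2 else -1))
    return scores


def multilocus_tdt(vectors):
    """Return U' V^-1 U, referred to chi-square with df = number of tag SNPs."""
    k = len(vectors[0])
    u = [sum(z[i] for z in vectors) for i in range(k)]
    v = [[sum(z[i] * z[j] for z in vectors) for j in range(k)] for i in range(k)]
    w = solve(v, u)
    return sum(ui * wi for ui, wi in zip(u, w))


# One parent, two tag SNPs: heterozygous at the first, homozygous at the second.
print(parent_vector([(1, 2), (2, 2)], [2, 2]))

# Toy transmission vectors for eight parents and two tag SNPs.
vectors = [[1, 0], [1, 1], [-1, 1], [0, -1], [1, -1], [-1, 0], [0, 1], [1, 1]]
print(multilocus_tdt(vectors))
```

With a single tag SNP this statistic reduces to the familiar (b - c)² / (b + c) form of the TDT, where b and c are the two transmission counts from heterozygous parents.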

Genotyping
Genotyping was performed using TaqMan MGB (Applied Biosystems Inc, Foster City, CA) [51]. All genotyping data were double-scored to minimize error. Genotype frequencies for all assays were in Hardy-Weinberg equilibrium (P > 0.05).
Genotyping failure rates for all assays in both the family and case-control collection were ≤ 6%.
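A Hardy-Weinberg check of the kind mentioned above can be sketched with a one-degree-of-freedom χ² goodness-of-fit test. The Python below is a minimal illustration using only the standard library; the exact test used in the study is not specified, and the function name and genotype counts are hypothetical.

```python
import math


def hwe_chi2(n_11, n_12, n_22):
    """Chi-square test (1 df) of Hardy-Weinberg proportions for one SNP.

    n_11, n_12, n_22: observed counts of the three genotypes.
    Returns (statistic, p_value).
    """
    n = n_11 + n_12 + n_22
    p = (2 * n_11 + n_12) / (2 * n)     # frequency of allele 1
    q = 1 - p
    if p <= 0 or q <= 0:
        raise ValueError("monomorphic marker: the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_11, n_12, n_22)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df, via the complementary error function.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts that match Hardy-Weinberg proportions exactly.
print(hwe_chi2(360, 480, 160))
# Hypothetical counts with a deficit of heterozygotes.
print(hwe_chi2(400, 400, 200))
```

A heterozygote deficit such as the second example is a common signature of genotype miscalling, which is why the check is used as a quality-control step.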