Association between the ABO locus and hematological traits in Korean

Background: Recently, genome-wide association studies (GWASs) identified a pleiotropic gene locus, ABO, as being significantly associated with hematological traits. To confirm the effects of ABO on hematological traits, we examined the link between the ABO locus and hematological traits in Korean population-based cohorts. Results: Six tagging SNPs for ABO were analyzed with regard to their effects on hematological traits [white blood cell count (WBC), red blood cell count (RBC), platelet count (Plat), mean corpuscular volume (MCV), and mean corpuscular hemoglobin concentration (MCHC)]. Linear regression analyses were performed, controlling for recruitment center, sex, and age as covariates. Of the 6 tagging SNPs, 3 (rs2073823, rs8176720, and rs495828) were significantly associated with RBC and 3 (rs2073823, rs8176717, and rs687289) with MCV (Bonferroni-corrected significance threshold: p < 0.05/6 = 0.008). rs2073823 and a previously reported SNP (rs8176746), as well as rs495828 and a previously reported SNP (rs651007), were in perfect linkage disequilibrium (r² = 0.99). Of the remaining 3 SNPs (rs8176720, rs8176717, and rs687289), rs8176717 generated an independent signal with a moderate p-value (0.045) after adjustment for rs2073823 (the most significant SNP). We also identified a copy number variation (CNV) that was tagged by the SNP rs8176717, the minor allele of which correlated with the deletion allele of the CNV. Our haplotype analysis indicated that the haplotype that contained the CNV deletion was significantly associated with MCV (β ± se = 0.363 ± 0.118, p = 2.09 × 10⁻³). Conclusions: Our findings confirm that ABO is one of the genetic factors that are associated with hematological traits in the Korean population. The CNV association is notable, because GWASs typically do not evaluate the link between a CNV and phenotypic traits.


Background
The ABO gene encodes isoforms of terminal glycosyltransferases, which transfer N-acetylgalactosamine and galactose to a common precursor (H substance). The gene lies on chromosome 9q34.2 and contains 7 exons [1]. Exon 7 contains a domain that distinguishes between the A and B activities of the glycosyltransferase [2]. Several genome-wide association studies (GWASs) have identified ABO as a candidate marker of the risk for coronary artery disease (CAD) [3], in addition to established CAD markers (sE-selectin, sP-selectin, and s-ICAM1) [4][5][6].
Hematological traits, such as red blood cell count (RBC), white blood cell count (WBC), platelet number (Plat), hemoglobin level (Hb), and hematocrit (Hct), are measured routinely to diagnose and monitor hematologic diseases and ascertain overall patient health. Recent GWASs on hematological traits have been reported for Caucasian [7], Japanese [8], and African-American [9] cohorts. These studies have identified more than 30 loci that carry common DNA polymorphisms that are linked to hematological traits.
The pleiotropic gene ABO correlated significantly with hematological traits in Japanese [8] and African-American [9] studies, and 3 of its SNPs (rs8176746, rs651007, and rs495828) were reported in previous GWASs. rs8176746 is a nonsynonymous SNP and a variant that determines the B-type blood group [10]. rs651007 and rs495828 lie in the promoter region and are associated with CAD [4]. To confirm the effects of ABO on hematological traits, we examined the link between the ABO locus and hematological traits in Korean population-based cohorts.

Hematological traits
The population characteristics and mean hematological traits are described in Table 1.

SNP selection
Genotypes from the Affymetrix 5.0 SNP array and imputed SNP data were obtained from the Korean Genome Epidemiology Study (KoGES) of the National Institute of Health, Korea; the genotype data were those of the Korea Association Resource (KARE) consortium. The genome-wide SNPs have been examined in genome-wide association studies for anthropometric [11] and biochemical traits [12]. In this study, we focused on the ABO region that was reported by a Japanese study.
Population stratification of the genotyped samples was tested in an earlier report [11]; multidimensional scaling (MDS) analysis and principal component analysis (PCA) showed no evidence of population stratification (Additional file 1: Figure S1). Genomic inflation factors were low, ranging from 1.01 (WBC) to 1.03 (Hct), suggesting that population stratification was well controlled (Additional file 2: Table S1). We initially used 76 SNPs around ABO on chromosome 9 from 135,070 kbp to 135,152 kbp. The ABO gene boundaries were established by linkage disequilibrium (LD) analysis (Additional file 3: Figure S2). Three LD blocks encompassed ABO and its promoter region. The 3 LD blocks included 58 SNPs, 10 of which were genotyped by the Affymetrix 5.0 SNP array; the remaining 48 SNPs were imputed by IMPUTE, based on the HapMap database. The characteristics of the 58 SNPs are described in Additional file 2: Table S1. The SNPs were classified as 8 nonsynonymous SNPs, 1 synonymous SNP, 8 upstream SNPs, and 41 intron SNPs.

ABO gene SNP association study
For the association analysis, we isolated 6 tagging SNPs for ABO. In Additional file 4: Table S2, we describe the 6 SNP groups with high LD (r² > 0.9) and underline the tagging SNPs. The association results are described in Table 2. In this study, we used a Bonferroni-corrected p-value threshold (< 8.3 × 10⁻³) for multiple comparisons, and the significant effect sizes and p-values are underlined in Table 2. Three SNPs (rs2073823, rs8176720, and rs495828) were significantly associated with RBC, and 3 SNPs (rs2073823, rs8176717, and rs687289) were significantly associated with MCV.
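The threshold arithmetic can be checked with a minimal sketch. The function names are illustrative and not part of the study's pipeline; the two p-values tested are the haplotype p-value and the conditional p-value for rs8176717 quoted in this article.

```python
# Bonferroni-adjusted significance threshold for the 6 tagging SNPs.
ALPHA = 0.05   # family-wise error rate stated in the text
N_TESTS = 6    # number of tagging SNPs tested


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test p-value cutoff alpha / n_tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_TESTS)


def is_significant(p_value: float, cutoff: float = threshold) -> bool:
    """True when a p-value falls below the Bonferroni cutoff."""
    return p_value < cutoff


print(f"threshold = {threshold:.4f}")                # 0.05 / 6
print("p = 2.09e-3:", is_significant(2.09e-3))       # haplotype p-value
print("p = 0.045:", is_significant(0.045))           # conditional p-value
```

Under this cutoff, a p-value of 0.045 is nominally significant at α = 0.05 but does not pass the correction, which is why the text describes such a signal as moderate.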
To identify independent association signals, we performed a conditional analysis by adding rs2073823 as a covariate to the linear regression model for each of the other significant SNPs. For RBC, the association signal of rs8176720 disappeared (p-value = 0.803), but that of rs495828 remained significant (p-value = 0.004) after adjusting for rs2073823. For MCV, rs8176717 remained moderately associated (p-value = 0.045), but the association signal of rs687289 disappeared (p-value = 0.492). Thus, we identified 3 independent associations (rs2073823, rs8176717, and rs495828) between ABO and hematological traits.
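The logic of a conditional analysis can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration: the genotypes, effect size, allele frequency, and the small least-squares solver are hypothetical and do not reproduce the study's data or software. A "linked" SNP with no effect of its own shows a marginal association only because it is correlated with the "lead" SNP; the signal vanishes once the lead SNP is added as a covariate.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


rng = random.Random(0)  # fixed seed so the simulation is repeatable


def draw_genotype(maf=0.3):
    """Minor-allele count (0, 1, or 2) under an assumed allele frequency."""
    return sum(rng.random() < maf for _ in range(2))


n = 5000
lead = [draw_genotype() for _ in range(n)]
# The linked SNP copies the lead genotype 80% of the time (simulated LD).
linked = [g if rng.random() < 0.8 else draw_genotype() for g in lead]
age = [rng.uniform(40, 70) for _ in range(n)]
# Only the lead SNP (and age) affects the simulated trait.
trait = [0.5 * g + 0.01 * a + rng.gauss(0, 1) for g, a in zip(lead, age)]

beta_marginal = ols([linked, age], trait)[1]           # linked SNP alone
beta_conditional = ols([linked, lead, age], trait)[1]  # adjusted for lead SNP

print("marginal effect of linked SNP:", round(beta_marginal, 3))
print("effect after conditioning on lead SNP:", round(beta_conditional, 3))
```

In the article's terms, rs8176720 behaves like the linked SNP (its signal disappears after adjustment), whereas rs495828 behaves like a second causal signal whose effect survives adjustment.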

Identification of copy number variation
A copy number variation (CNV) region was detected on chromosome 9, 135,120,477-135,122,527 (Figure 1), which includes the 3′ untranslated region of the ABO gene. Because the array CGH experiment was conducted on only a subset (n = 4694) of the KoGES samples, we searched for a tagging SNP that correlated well with the CNV region genotypes to maximize the sample size. We found that the SNP rs8176717 correlated with the CNV region (r² = 0.96); its minor allele (T allele) tagged the minor (deletion) allele of the CNV.

Discussion
In this study, we confirmed the association between ABO and hematological traits in a large Korean population. We also found a copy number variation that influenced hematological traits. Of the 6 tagging SNPs in the ABO gene, rs2073823 was the most significant and was in perfect LD (r² = 0.995) with rs8176746, a SNP from the Japanese GWAS on hematological traits [8]. The minor allele of rs8176746 is the variant that encodes the B-type blood group [10]. However, this SNP was not reported in a GWAS of hematological traits in Caucasians [7,9], possibly due to ethnic differences in the minor allele frequency in Caucasian (0.08), Chinese (0.23), and Japanese (0.17) individuals. The allele frequencies correspond well to the frequency of blood type B in Caucasian (~8%) and East Asian (~22%) individuals, as inferred from the BLOODBOOK website (http://www.bloodbook.com/world-abo.html). Using the minor allele frequency (0.008) and the mean RBC (± sd) = 4.82 ± 0.50 of Caucasians, we estimated the number of individuals required for 80% power at alpha = 5 × 10⁻⁸ (the genome-wide significance level) [7,9]. To replicate the association of rs2073823 (in LD with rs8176746), 51,876 individuals would be necessary. However, the previous European study [7] used 33,623 individuals, which is fewer than the estimated number required at the genome-wide significance level. In our study, individuals with a minor allele of rs2073823 had elevated RBC counts but decreased MCV. Thus, individuals with blood type B might have higher RBC counts and lower MCV than those with other blood types, at least among Asians. The second highest signal was generated from an upstream SNP, rs495828, which was also reported in the Japanese GWAS [8]; this SNP was in perfect LD with rs651007, which was reported in an African-American GWAS [9]. Notably, the 3 proximal SNPs (rs651007, rs579459, and rs649129) were in complete LD (r² = 0.99) with rs495828.
Because carriers of the minor allele of these 3 SNPs have significantly lower levels of sP-selectin [5], sE-selectin [6], and risk of CAD [4], the relationship between hematological traits and coronary artery disease phenotypes should be examined.
The Japanese GWAS reported complete LD between rs8176746 and rs495828. To confirm the LD, we estimated the LD in Europeans (r² = 0.010 and D' = 0.150), Africans (r² = 0.035 and D' = 1.000), Chinese, Japanese (r² = 0.050 and D' = 1.000), and Koreans in this study (r² = 0.087 and D' = 1.000). Although the Japanese study reported that rs8176746 and rs495828 are in complete LD, data from publicly available databases are inconsistent with this report, showing high D' but low r². This suggests that rs495828 may represent an independent association signal for RBC. A limitation of our study is that the 2 most significant SNPs (rs8176746 and rs495828) were not genotyped directly, although the minor allele frequencies of these SNPs are similar to those reported in the Japanese GWAS [2].
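How D' can equal 1 while r² stays low is easiest to see from the standard definitions of the two measures. The sketch below uses hypothetical haplotype and allele frequencies (they are not the frequencies observed in any population in this study) for two minor alleles that never occur on the same haplotype.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical example: allele A (frequency 0.2) and allele B (frequency 0.3)
# are never found together, so the A-B haplotype frequency is 0.
d, d_prime, r_squared = ld_measures(p_ab=0.0, p_a=0.2, p_b=0.3)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r_squared:.3f}")
```

Here only three of the four possible haplotypes exist, so D' reaches its maximum, but neither SNP predicts the other well because their allele frequencies differ, so r² is only about 0.11. A SNP pair in this state can therefore carry separate association signals.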
The CNV region that we identified has been reported by 7 other studies [13][14][15][16][17][18]. The minor allele of the CNV was a deletion in the 3′ untranslated region of ABO; thus, the CNV might influence its expression. In our results, the haplotype that included the minor allele of the CNV-tagging SNP (rs8176717) was significantly associated with MCV. This result is notable, because most GWASs do not evaluate the link between a CNV and phenotype traits. Thus, our study is a model that can be used to correlate SNPs and CNV.

Conclusions
ABO is one of the genetic factors that are associated with hematological traits in East Asian populations. We also identified a novel association between MCV and a SNP that tags a common CNV. This result is notable, because most GWASs do not evaluate the link between a CNV and phenotype traits.

Study participants
This study was conducted as part of an ongoing population-based cohort of the Korean Genome and Epidemiology Study (KoGES). All participants were recruited from the cities of Ansung and Ansan in Gyeonggi-do Province, Korea. This study was approved by the Institutional Review Board of the Korea National Institute of Health, and all participants provided written informed consent for study participation.

Hematological trait measures
A total of 6675 samples were available for hematological trait analysis, as described in Table 1. Venous blood samples were drawn from all participants into 4.5-ml tubes that contained K3-EDTA as an anticoagulant and were analyzed within 30 min to 4 h of collection. Hematological traits were measured by Seoul Clinical Laboratories Company Ltd. The ADVIA 120 hematology system (Bayer Diagnostics, USA) was calibrated per the manufacturer's guidelines. WBC count, RBC count, platelet count, Hb level, Hct, mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH) level, and mean corpuscular hemoglobin concentration (MCHC) were determined automatically for all samples.

SNP determination
The ABO gene is located on chromosome 9 from 135,120,384 to 135,140,451 bp. SNP genotypes were determined using the Affymetrix 5.0 SNP array, the experimental procedures of which are detailed elsewhere [11]. Further, to increase the number of genotype markers, we imputed additional SNPs using the Affymetrix 5.0 SNP array and the HapMap database (HAPMAP 3, http://www.hapmap.org); the imputation methods have been described [19]. The final SNPs were selected using the following criteria: minor allele frequency > 0.1; missing rate < 10%; and Hardy-Weinberg equilibrium test p-value > 0.05 for experimentally determined SNPs and imputation SNPs. Information on the SNPs was obtained from the dbSNP database (http://www.ncbi.nlm.gov/snp), and the genetic distance between the Korean and other populations was calculated using the F-statistic [20]. LD blocks and pairwise LD (D' and r²) of SNPs were estimated and determined for the tagging SNPs in the ABO gene region using Haploview [21].
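The three selection criteria can be expressed as a simple filter. This is a minimal sketch under stated assumptions: the function names and genotype counts are hypothetical, and the Hardy-Weinberg test shown is a plain 1-degree-of-freedom chi-square test without continuity correction, which may differ in detail from the asymptotic test implemented in PLINK.

```python
import math

MAF_MIN = 0.1        # minor allele frequency must exceed this value
MISSING_MAX = 0.10   # missing rate must be below this value
HWE_P_MIN = 0.05     # Hardy-Weinberg p-value must exceed this value


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Asymptotic Hardy-Weinberg p-value (chi-square, 1 degree of freedom)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_aa, n_ab, n_bb, n_missing):
    """Apply the MAF, missing-rate, and HWE criteria to one SNP's counts."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1 - freq_a)
    missing_rate = n_missing / (called + n_missing)
    return (maf > MAF_MIN
            and missing_rate < MISSING_MAX
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) > HWE_P_MIN)


# Hypothetical genotype counts (AA, AB, BB, missing) for four SNPs.
print(passes_qc(90, 420, 490, 10))    # common SNP in HWE, few missing calls
print(passes_qc(1, 38, 961, 0))       # minor allele too rare
print(passes_qc(300, 100, 600, 0))    # strong heterozygote deficit
print(passes_qc(90, 420, 490, 200))   # too many missing calls
```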

CNV determination
To identify regions of CNV, samples from 4694 participants were genotyped using the NimbleGen HD2 2x720K array comparative genomic hybridization (aCGH) assay with DNA from peripheral blood. All samples passed experimental quality control metrics, such as the chromosome X shift and mad.1dr, as determined using NimbleScan version 2.5 per the manufacturer's guidelines. After quality control procedures, the signal intensity ratio between the test and reference sample (NA10851 from the HapMap cell line DNA) of each probe was log2-transformed.
Regions of CNV were identified using the Genome Alteration Detection Analysis algorithm [22], which was applied to samples from 4694 participants with T = 10, alpha = 0.2, and MinSegLen = 10. The threshold for defining regions of CNV was set to an average log2 ratio of ± 0.25 (Additional file 5: Figure S3).
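The final thresholding step can be sketched as follows. This covers only the classification of an already-segmented region by its average log2 ratio; it does not implement the segmentation algorithm itself. The function name, the example probe values, and the choice to treat values exactly at ± 0.25 as CNV are assumptions made for illustration.

```python
LOG2_THRESHOLD = 0.25  # average log2 ratio cutoff stated in the text


def classify_segment(log2_ratios, threshold=LOG2_THRESHOLD):
    """Label a segment as 'gain', 'loss', or 'neutral' from its probe ratios."""
    if not log2_ratios:
        raise ValueError("segment contains no probes")
    mean_ratio = sum(log2_ratios) / len(log2_ratios)
    if mean_ratio >= threshold:
        return "gain"
    if mean_ratio <= -threshold:
        return "loss"
    return "neutral"


# Hypothetical log2(test / reference) intensity ratios for three segments.
print(classify_segment([-0.41, -0.38, -0.45, -0.40]))  # deletion-like segment
print(classify_segment([0.05, -0.02, 0.01, 0.03]))     # no copy number change
print(classify_segment([0.30, 0.35, 0.28, 0.33]))      # duplication-like segment
```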

CNV-tagging SNP
We used CNV-tagging SNPs to maximize the sample size. To find SNPs that tagged the identified CNVs well, we performed a correlation analysis that was similar to that in the Wellcome Trust Case Control Consortium CNV study [23] using calls that were identified in a GWAS with the Affymetrix 5.0 array [11]. For each CNV, we calculated the squared Pearson's r value between CNV regions and SNPs. We considered all SNPs within 1 Mb of the 2 estimated breakpoints (i.e., start and end points) of each CNV region. We selected the SNP with the highest r² value for each CNV region.
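The selection procedure can be sketched in a few lines. The CNV coordinates are those given in this article; the SNP identifiers, positions, genotype vectors, and function names are hypothetical, and coding the CNV as a count of deletion alleles per individual is an assumption.

```python
CNV_START, CNV_END = 135_120_477, 135_122_527  # CNV coordinates from the text
WINDOW = 1_000_000                             # 1 Mb around each breakpoint


def pearson_r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic variable carries no tagging information
    return sxy * sxy / (sxx * syy)


def best_tagging_snp(cnv_dosage, snps, start=CNV_START, end=CNV_END,
                     window=WINDOW):
    """Return (snp_id, r^2) for the best tag among SNPs inside the window.

    snps maps a SNP id to (position, list of minor-allele counts).
    """
    scores = {}
    for snp_id, (position, genotypes) in snps.items():
        if start - window <= position <= end + window:
            scores[snp_id] = pearson_r_squared(cnv_dosage, genotypes)
    if not scores:
        raise ValueError("no SNP lies within the search window")
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical data: deletion-allele counts for 8 individuals and 3 SNPs.
cnv_dosage = [0, 1, 2, 0, 1, 0, 2, 1]
snps = {
    "snp_a": (135_121_000, [0, 1, 2, 0, 1, 0, 2, 1]),  # mirrors the CNV
    "snp_b": (135_500_000, [0, 0, 1, 1, 2, 0, 1, 2]),  # weakly correlated
    "snp_c": (140_000_000, [0, 1, 2, 0, 1, 0, 2, 1]),  # outside the window
}
best_id, best_r2 = best_tagging_snp(cnv_dosage, snps)
print(best_id, round(best_r2, 3))
```

A SNP far from the CNV is skipped even if it correlates perfectly, which mirrors the 1 Mb restriction described above.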

Association tests
Linear regression analysis was used to analyze the association between ABO SNPs or haplotypes of tagging SNPs and hematological traits, controlling for sex, age, and recruitment center as covariates. The asymptotic Hardy-Weinberg equilibrium test was conducted using PLINK (version 1.07) [24], and all reported p-values were two-sided (α = 0.05). Associations between SNPs and hematological traits were considered significant at p ≤ 8.3 × 10⁻³ after Bonferroni correction for multiple testing of 6 SNPs. The sample size required to detect the rs2073823 association in Europeans with 80% statistical power at the genome-wide significance level was estimated using the QUANTO software (version 1.2.4, http://hydra.usc.edu/gxe/).