Detection of gene-environment interactions in the presence of linkage disequilibrium and noise by using genetic risk scores with internal weights from elastic net regression

Background For the analysis of gene-environment (GxE) interactions, genetic susceptibility is commonly characterized by single nucleotide polymorphisms (SNPs), an approach that mostly lacks power and has poor reproducibility. One promising approach to overcome this problem might be the use of weighted genetic risk scores (GRS), which are defined as weighted sums of the risk alleles of gene variants. The gold standard is to use external weights from published meta-analyses.
Methods In this study, we used internal weights from the marginal genetic effects of the SNPs, estimated by a multivariate elastic net regression, and thereby provide a method that can be used if no external weights are available. We conducted a simulation study for the detection of GxE interactions and compared the power and type I error of single-SNP analyses with Bonferroni correction against the corresponding analyses with unweighted GRS and with our weighted GRS approach, in scenarios with six risk SNPs and an increasing number of highly correlated SNPs (up to 210) and noise SNPs (up to 840).
Results Applying weighted GRS markedly increased power compared with the common single-SNP approach (e.g. 94.2% vs. 35.4%, respectively, to detect a weak interaction with an OR ≈ 1.04 for six uncorrelated risk SNPs and n = 700, with a well-controlled type I error). Furthermore, weighted GRS outperformed unweighted GRS, in particular in the presence of SNPs without any effect on the phenotype (e.g. 90.1% vs. 43.9%, respectively, when 20 noise SNPs were added to the six risk SNPs). This advantage of the weighted GRS was confirmed in a real data application on lung inflammation in the SALIA cohort (n = 402). However, in scenarios with a high number of noise SNPs (>200 vs. 6 risk SNPs), larger sample sizes are needed to avoid an increased type I error, whereas a high number of correlated SNPs can be handled even in small samples (e.g. n = 400).
Conclusion Weighted GRS with weights from the marginal genetic effects of the SNPs, estimated by a multivariate elastic net regression, were shown to be a powerful tool to detect gene-environment interactions in scenarios of high linkage disequilibrium and noise.
Electronic supplementary material The online version of this article (doi:10.1186/s12863-017-0519-1) contains supplementary material, which is available to authorized users.
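
The score definition given in the abstract (a weighted sum of risk alleles) can be illustrated with a minimal sketch. It assumes genotypes are coded as risk-allele counts (0, 1 or 2); the function name genetic_risk_score and the example weights are hypothetical. In the study the weights are estimated by elastic net regression, a step this sketch does not perform.

```python
def genetic_risk_score(genotypes, weights=None):
    """Return one genetic risk score (GRS) per individual.

    genotypes: list of rows, one per individual, holding the risk-allele
               count (0, 1 or 2) for each SNP.
    weights:   one weight per SNP; None gives the unweighted GRS,
               i.e. every SNP has weight 1.
    """
    n_snps = len(genotypes[0])
    if weights is None:
        weights = [1.0] * n_snps
    if len(weights) != n_snps:
        raise ValueError("need exactly one weight per SNP")
    # Weighted sum of risk alleles for each individual.
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


# Hypothetical example: two individuals, three SNPs.
example_genotypes = [[0, 1, 2], [2, 2, 0]]
# Hypothetical weights; the zero weight mimics a SNP that a penalized
# regression has shrunk out of the score.
example_weights = [0.5, 0.0, 0.25]

weighted = genetic_risk_score(example_genotypes, example_weights)
unweighted = genetic_risk_score(example_genotypes)
```

A SNP with weight zero contributes nothing to the weighted score but still counts fully in the unweighted one, which is one way to picture why noise SNPs affect the two scores differently.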


Real data application
The following information is based on Hüls et al. (2017) [1].

Study design and population
Our study was based on the 2008 examination of the population of the German SALIA cohort (Study on the influence of Air pollution on Lung function, Inflammation and Aging). A detailed description of the SALIA study population has been published previously (Schikowski et al. 2005; Vossoughi et al. 2014). Briefly, the SALIA cohort study was initiated in the early 1980s to investigate the health effects of air pollution exposure in elderly women. The study population consists of women living in the industrialized Ruhr area in Germany (urban area) and women living in the Southern Muensterland (rural area). Baseline examinations were conducted between 1985 and 1994 and included 4874 women (aged 55 years). This study is based on the first follow-up examination in 2008, in which we examined 402 women [2][3][4].
Approval of the study was obtained from the Ethical Committee of the University of Bochum.
The principles of the Declaration of Helsinki were followed, and all study subjects were informed in detail in writing and gave written consent.

Air pollution assessment
PM2.5, filter absorbance of PM2.5 (soot), PM10 and NO2 exposures were estimated with land-use regression (LUR) models. Air pollution monitoring campaigns were performed over a period of one year in the study area in 2009 within the framework of the ESCAPE (European Study of Cohorts for Air Pollution Effects) project. Three two-week measurements of NO2 were performed within one year at 40 sites in the Ruhr area and Southern Muensterland.
Simultaneous measurements of PM2.5 and PM10 were performed in a subsample of study areas selected for the NO2 measurement campaign. PM measurements were performed at 20 sites within each study area [5,6]. Predictor variables on nearby traffic, population/household density and land use were derived from Geographic Information Systems (GIS) and were evaluated to explain spatial variation of annual average concentrations. Regression models were developed to maximize the adjusted explained variance, using a supervised forward stepwise approach. LUR models were developed for each pollutant using all available measurement sites. LUR models were then used to estimate air pollution concentration on an individual basis at the women's addresses, for which the same GIS predictor variables were collected.
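
The model-building step described above (forward stepwise selection that maximizes the adjusted explained variance) can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the function names, the predictor names and all numbers are hypothetical, ordinary least squares is solved with a plain Gaussian elimination, and any additional supervision rules applied in the original modelling are omitted.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes the system is non-singular (no collinear predictors)."""
    n = len(b)
    m = [row[:] + [bv] for row, bv in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def adjusted_r2(columns, y):
    """Adjusted R^2 of an OLS fit of y on the given columns plus intercept."""
    n, p = len(y), len(columns)
    x = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(row[a] * row[b] for row in x) for b in range(p + 1)]
           for a in range(p + 1)]
    xty = [sum(row[a] * yi for row, yi in zip(x, y)) for a in range(p + 1)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in x]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))


def forward_stepwise(predictors, y):
    """Add one predictor at a time while adjusted R^2 keeps improving."""
    selected, best = [], float("-inf")
    # Keep at least one residual degree of freedom for the next fit.
    while len(selected) + 2 < len(y):
        scores = {name: adjusted_r2([predictors[s] for s in selected + [name]], y)
                  for name in predictors if name not in selected}
        if not scores:
            break
        name, score = max(scores.items(), key=lambda kv: kv[1])
        if score <= best:
            break
        selected.append(name)
        best = score
    return selected, best


# Hypothetical measurement sites: the concentration depends on "traffic" only.
predictors = {
    "traffic": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "green_space": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
}
noise = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2]
concentration = [2.0 * t + 1.0 + e for t, e in zip(predictors["traffic"], noise)]

selected, best = forward_stepwise(predictors, concentration)
```

Once a model is selected, its coefficients would be applied to the same GIS predictors collected at each participant's address to obtain individual exposure estimates.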

Assessment of subclinical inflammation
Our analysis focused on the inflammatory biomarker leukotriene B4 (LTB4) [2]. All examinations were conducted according to standardized protocols [2,8,9]. Participants inhaled vaporized isomolar saline solution for 10 minutes and were then asked to provoke coughing.
Induced sputum (IS) was collected and processed according to Raulf-Heimsoth et al. (2011) and then analyzed for soluble inflammatory mediators and differential cell counts [9]. After centrifugation, the cell-free supernatants were aliquoted and stored at −80°C until further analysis of soluble markers. The cell pellets were re-suspended, and the total number of cells was determined as the sum of eosinophils, macrophages, neutrophils and epithelial cells.
Concentrations of LTB4 were measured with specific competitive enzyme immunoassay (EIA) kits (Assay Designs, Ann Arbor, USA) with a detection limit of 11.7 pg/ml.

Determination of genetic markers
We investigated possible gene-air pollution interactions on subclinical inflammation for nine SNPs of the PERK pathway of the unfolded protein response (UPR), which plays a role in inflammation processes [10].
The selection of genes and related functional SNPs was based on literature research. Besides PERK and ATF4, which were recently shown to be involved in a murine model of neutrophil asthma [11], we analyzed SNPs of two enzymes engaged in the ER-associated degradation (ERAD) of misfolded proteins [12,13]. Mannosidase trims mannose residues from misfolded glycoproteins and targets them to ERAD. Functional studies indicated that the A allele of rs4567 suppresses mannosidase translation under ER stress conditions [14]. Recently, Ito et al. (2015) showed that N-glycosylation plays a role in the pathogenesis of COPD [15]. The second enzyme, protein disulfide isomerase (PDI), transfers oxidative equivalents to proteins. Smoking changes the redox state of PDI [16], and increased levels of hyperoxidized PDI are associated with COPD [17]. The ORMDL3 SNP rs4795405 is also associated with severe asthma and COPD [18].
DNA was extracted from blood samples of each individual using a standard procedure (QIAamp DNA Mini Kit, QIAGEN, Hilden, Germany). DNA amplification and genotyping were performed by LCG/KBioscience (Hoddesdon, UK) using the competitive allele-specific polymerase chain reaction SNP genotyping system (KASPar) with an error rate <0.3%. SNPs that violated the Hardy-Weinberg Equilibrium (HWE) were excluded from analysis.

Tables
Table S1: Interaction with a mean OR for GxE (6 main SNPs) of 1.01. Minor allele frequency (MAF), OR and p-values for the main effects of the SNP (G) and environmental factor (E) and gene-environment interaction (GxE).
Table S3: Interaction with a mean OR for GxE (6 main SNPs) of 1.05. Minor allele frequency (MAF), OR and p-values for the main effects of the SNP (G) and environmental factor (E) and gene-environment interaction (GxE).
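
The Hardy-Weinberg filter mentioned in the genotyping paragraph under "Determination of genetic markers" can be illustrated with a short sketch. The text does not state which test or significance threshold was used, so the chi-square goodness-of-fit test with one degree of freedom shown here, the function name hwe_chi_square and the example counts are all assumptions chosen for illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium for one SNP.

    n_aa, n_ab, n_bb: observed counts of the three genotypes.
    Returns the test statistic and its p-value.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic SNP: HWE test is not defined")
    observed = (n_aa, n_ab, n_bb)
    # Expected genotype counts under HWE: p^2, 2pq, q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: the first SNP matches HWE exactly, the second has
# no heterozygotes at all and departs strongly from it.
chi2_ok, p_ok = hwe_chi_square(25, 50, 25)
chi2_bad, p_bad = hwe_chi_square(50, 0, 50)
```

A SNP would then be excluded when its p-value falls below whatever threshold the analysis plan specifies.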