Searching QTL by gene expression: analysis of diabesity

Background Recent developments in sequence databases provide the opportunity to relate the expression pattern of genes to their genomic position, thus creating a transcriptome map. Quantitative trait loci (QTL) are phenotypically-defined chromosomal regions that contribute to allelically variant biological traits, and by overlaying QTL on the transcriptome, the search for candidate genes becomes extremely focused. Results We used our novel data mining tool, ExQuest, to select genes within known diabesity QTL showing enriched expression in primary diabesity affected tissues. We then quantified transcripts in adipose, pancreas, and liver tissue from Tally Ho mice, a multigenic model for Type II diabetes (T2D), and from diabesity-resistant C57BL/6J controls. Analysis of the resulting quantitative PCR data using the Global Pattern Recognition analytical algorithm identified a number of genes whose expression is altered, and thus are novel candidates for diabesity QTL and/or pathways associated with diabesity. Conclusion Transcription-based data mining of genes in QTL-limited intervals followed by efficient quantitative PCR methods is an effective strategy for identifying genes that may contribute to complex pathophysiological processes.


Background
Understanding the molecular etiology of disease processes is a pressing goal of 21 st century medicine. Completion of the mouse genome holds considerable promise in the discovery of genes responsible for genetically determined complex diseases. Quantitative trait loci (QTL) are allelically variant regions detected by virtue of their contribution to the overall complex disease phenotype and thus are "experiments in nature", which mark chromosomal intervals carrying genes with a proven disease involvement. Since gene expression is a key link between the genome and the plethora of phenotypic traits exhib-ited, tools that permit the analysis of the tissue expression pattern of genes in their chromosomal context provides a bridge between QTL and the genes responsible.
Global microarray gene expression technologies offer a promising, unbiased approach toward this goal in that they reveal gene expression changes, which can be correlated with the disease phenotype. However, such global methods of analysis are not routine analytical tools and can suffer from incomplete gene coverage, as well as lack of sensitivity. Because only a small fraction of the transcriptome is typically involved in any given etiopathological process, bioinformatic data mining tools that allow for the intelligent prioritization of genes would make it possible to employ more routine and sensitive expression technologies, such as quantitative polymerase chain reaction (QPCR). ExQuest [1,2] organizes pre-existing information-rich expression databases in a way that quantitative tissue and/or developmental gene expression patterns can be extracted and displayed within the context of whole chromosomes. By overlaying ExQuest "chromosomal expression maps" with QTL coordinates, one can search within defined genomic intervals for candidate genes with enhanced expression in tissues consistent with disease pathology. Screening candidates by QPCR will determine whether or not they exhibit expression changes in genetically resistant versus susceptible mice.
Type II diabetes (T2D), also referred to as non-insulin dependent diabetes mellitus, is characterized by insulin resistance of target cells combined with insufficient insulin production and/or abnormal secretion by pancreatic β-cells, which eventually results in chronically elevated glucose levels in the circulation. Obesity is a major risk factor for T2D, and the term "diabesity" has been coined to collectively describe these overlapping conditions [3]. This paper describes an ExQuest compilation of novel candidate genes from mouse diabesity QTL intervals mapped to chromosomes 1, 4, 10, 17, 19 and X. This is followed by QPCR analysis of the ExQuest candidates, in addition to known diabetes and metabolism genes selected from the literature, using tissues derived from the newly defined T2D model, Tally Ho (TH) [4]. By selecting genes from QTL intervals derived from multiple diabesity models and using the TH mouse as a susceptible strain to test for differential expression, we have identified a number of genes whose altered expression may be involved in the development of diabesity.

Diabesity-relevant candidate gene selection
To identify candidate genes that contribute to diabesity, we first established the local boundaries of 29 QTL from the LocusLink database that contribute to body weight, adiposity or T2D, and map to mouse chromosomes 1, 4, 10,17, 19 and X (Table 1). Using ExQuest, we were able to cluster publicly available ESTs to genomic fragments, extract and normalize tissue information from EST datasheet records, and display quantitative expression information linearly along the six chromosomes for over 70 tissues. We then narrowed the chromosomal search by overlaying the in silico expression data with QTL intervals. We were particularly interested in genes with known or potential metabolic function whose expression patterns were biased towards high expression in three diabesity relevant-tissues, pancreas, liver and adipose tissue, for which there was good EST library representation. (Skeletal muscle was not included due to the lack of available mouse EST libraries.) An example is illustrated in Figure 1A, in which a region of genes with expression strongly biased toward pancreatic tissue is found centered over the 1-LOD 95% confidence interval of the TH QTL Tafat [4]. By using ExQuest's zoom-in capability, resolution down to the expression of individual exons revealed that the expression bias was explained by a single gene, Elastase 2 (Ela2), with a pattern of expression that includes pancreas, stomach and tongue (Fig. 1B). A second example is illustrated in Figure 1C, in which a cluster of pancreatic expressing genes is localized to two chromosome 19 T2D QTL, T2dm2 [5] and Tanidd1 [4]. The expanded chromosomal view revealed that this cluster contained pancreatic lipase (Pnlip) and two paralogs, Pnliprp1 and Pnliprp2. From all QTL intervals, we chose the 71 "best expression" candidates (Table 1) out of a possible 4,013 genes as predicted by Ensembl's gene annotation. From PubMed searches, we selected an additional 24 genes considered to be highly relevant to diabesity and/or metabolism, as well as the standard normalizer 18S RNA, bringing our total to 96, a convenient number for QPCR profiling (for gene list and oligonucleotide primer sequences, see Additional file 1).

Real-time QPCR profiling
The newly-described TH T2D mouse model develops obesity, hyperinsulinemia, hyperlipidemia, and malelimited hyperglycemia [4]. To test the validity of our candidate gene selection and determine a quantitative molecular signature for this mouse model, we extracted adipose, liver and pancreas RNA from three early diabetic (as verified by glycemic index) 7-week-old male TH mice, as well as three age and sex matched B6 controls. Real-time QPCR was then independently performed for all 96 genes on the three biological replicates, and the results were processed using GPR software (for raw Ct values and GPR Reports, see Additional files 2, 3 and 4). Twenty-one of 96 genes exhibited significant expression changes between TH and control B6 mouse tissues (Table 2). Fifteen were among the 71 candidates chosen by ExQuest, while six were from the 24-gene PubMed pool (Table 3).

Novel genes potentially implicated in the progression or response to diabetes
To assess the novelty of the differentially expressed ExQuest-selected genes in regard to diabesity, we performed an extensive PubMed search for articles associated with both the gene name (or its alternative nomenclature as defined by LocusLink gene name aliases), and diabetes or obesity. The number of articles retrieved were typically one to two orders of magnitude lower for the differentially expressed ExQuest selected genes in comparison to the genes analyzed with known diabesity or metabolism involvement, with many having no literature citations (Table 3). QTL-based ExQuest expression mining thus revealed a number of novel diabesity candidate genes with little to no prior association with this disease.

Discussion
Our results suggest that in silico data mining focused on gene expression in diabesity-affected tissues and limited to intervals containing diabesity-specific QTL (and thus enriched in genes that contribute to diabesity) is an efficient method to identify genes whose expression is altered in T2D-susceptible mice. This analysis not only demonstrated expression alterations in genes known to be associated with the development of diabesity, but also identified a number of novel genes whose expression changes may contribute to the development of diabesity. Expression changes in genes mapping within TH QTL could be considered as candidates for transcriptional polymorphisms contributing to the associated TH QTL. However, genes selected on the basis of other diabesity QTL that showed transcriptional differences in the TH/B6 comparison are more likely minor QTL in the TH model or symptomatic effects of diabesity rather than a major genetic cause of the TH disease.
As other diabesity studies commonly observe, there were highly significant changes in adipose tissue expression of Pparg and Lep. This included a ~5-fold increase in leptin expression in TH adipose with no accompanying increase in expression of the leptin receptor (Lepr). Since leptin is responsible for the regulation of food intake [6], this increase is most likely in response to incipient weight gain. In contrast, the observed decrease in the expression of peroxisome proliferator activated receptor gamma (Pparg), both in TH adipose tissue and liver, is consistent with a decrease in adipocyte differentiation often observed in obese states [7,8]. In addition, Pparg agonists are generally associated with promoting insulin sensitization in the context of obesity [9].
Other genes previously associated with diabesity and showing differential expression in TH vs. B6 tissues included a 28.1-fold decrease of glucose transporter 2 (Glut2) in adipose, a 2.9-fold decrease in 1-acylglycerol-3phosphate O-acyltransferase 2 (Agpat2) in adipose, and a 1.2-fold decrease in lipoprotein lipase (Lpl) in liver. Their reduced expression in the TH model might be in response to disease onset.
Serum albumin levels are decreased in T2D patients (reviewed in [10]). The primary source of serum albumin is liver. Albumin synthesis is decreased in both diabetic humans and in rat T2D models [11,12] and in liver cells Mirrored tissue-specific expression (red) and absolute EST representation (green) for whole ExQuest chromosomes deprived of insulin [13]. We failed to detect any alteration in the normally high levels of Alb1 expression in liver, but TH mice showed a 9.4-fold decrease in Alb1 expression in adipose tissue. Whether this fat-specific decrease in Alb1 expression is an early manifestation of subsequent hypoalbuminemia remains to be established.
Altered expression of a number of ExQuest-selected genes was also found. TH livers showed a 15.6-fold expression increase compared with B6 for the very low-density lipoprotein receptor (Vldlr). As a deficiency in Vldlr has been reported to reduce adipocyte size and obesity in the ob/ob mouse model [14], this expression alteration may contribute to TH diabesity. The expression of hydroxysteroid (17β) dehydrogenase 9 (Hsd17b9), which localizes to the early body weight QTL Bgeq8 QTL, was elevated 9.5-fold in TH livers. While the function of this gene is not described in the literature, its paralog 11beta-hydroxysteroid dehydrogenase type 1 (Hsd11b1) bidirectionally catalyzes the conversion of cortisol to the inactive metabolite cortisone. Over 150 articles describe an association of Hsd11β1 with diabesity, and an intronic Hsd11b1 polymorphism is associated with obesity and insulin resistance in children [15]. However, whether the elevated expression of the Hsd17b9 paralog in TH liver is a candidate for Bgeq8 remains to be established.
Exocrine secretion of pancreatic lipases are known to hydrolyze triglycerides to free fatty acids in the small intestine. Pancreatic lipase (Pnlip) and two paralogs, Pnliprp1 and Pnliprp2, were expressed preferentially in pancreas and mapped very close to the peak of the LOD values for the overlapping diabesity QTL, T2dm2 and Tannidd1 (Fig.  1b). T2dm2 and Tannidd1 are responsible for increased insulin levels [5] and elevated plasma glucose [4], respectively. Long-term high fat feeding, leading to glucose intolerance, occurs with a simultaneous decrease in mRNA expression of Pnlip [16]. While no expression alteration in the normally high levels of expression of Pnlip or its closely related Pnliprp1 paralog were found, Pnliprp2 expression was decreased 21.8-fold in TH pancreas when mice were fed a 4% fat diet. By facilitating fat storage, the consequence of which is hyperglycemic diabetes, the selective decrease in Pnliprp2 in TH mice may explain Tanidd1, and potentially, the genetically overlapping T2dm2 QTL.
Trefoil 3 (Tff3), which maps to the Obq4 obesity QTL, was the most significantly changed of all genes analyzed. It is quite transcriptionally active in control B6 liver but virtually undetectable in TH liver. Trefoils are small, stable secretory proteins expressed in goblet cells in the gastrointestinal mucosa where they stabilize the mucus layer and promote epithelial healing [17]. Mice deficient in Tff3 are highly susceptible to colon damage [18], which is not commonly associated with T2D. While the colon is the primary tissue of expression, Tff3 has also been reported to be expressed in the bile ducts of normal human liver and is upregulated in diseased livers [19]. However, a potential association of the remarkable reduction of Tff3 expression in TH liver with diabesity remains to be established.
While not detectably expressed in the liver or fat tissue in B6 or TH mice, Ela2 was ranked by GPR as the gene most significantly changed in the pancreas, with a 55-fold reduction in TH vs. B6 mRNA (Table 2). Originally cloned from the pancreas [20], Ela2 is located within 75 kb of the peak LOD score for the TH QTL Tafat. ExQuest expression profiling suggests that Ela2 expression is biased strongly towards digestive tissues (pancreas, stomach and tongue) of normal mice (Fig. 1A). Both whole pancreas and Islet of Langerhans EST libraries show high levels of Ela2 expression, suggesting that the endocrine pancreas actively transcribes it. Ela2 (alias polymorphonuclear neutrophil elastase) encodes a serine protease present in neutrophil endosomal granules, which are known to be important in myelopoiesis (reviewed in [21]). The substantially decreased expression in TH pancreas may result in secretory granules deficient in this serine protease.

Conclusion
We have tested the concept that transcription-based ExQuest data mining of genes in QTL-limited intervals is an effective method to identify genes that contribute to the complex genetic disease, diabesity. By limiting the in silico candidate genes to those with expression biased to tissues normally affected by this disease, we have shown that sensitive, high-throughput QPCR methods reveal expression changes in novel genes in the TH model. The search within additional diabesity QTL using this technique may facilitate the identification of a limited number of genes that comprise a 'complete' diabesity molecular phenotype. Moreover, this general approach may be an efficient method to identify genes that contribute to complex pathophysiological processes.

QTL analysis
Obesity and T2D QTL on mouse chromosomes 1,4,10,17,19, and X, were identified by a LocusLink search [22]. Information regarding individual QTL boundaries was gathered through the Mouse Genome Informatics site (MGI) [23]. Simple sequence length polymorphism (SSLP) markers extracted from MGI were queried at the Ensembl website [24] to determine the exact chromosomal location.

In silico expression analysis
ExQuest [1,2] is a gene expression program that uses public EST databases to create a comprehensive transcriptome map overlaid with tissue specific expression. In brief, chromosomal sequence was downloaded from Ensembl and stringently masked using the RepeatMasker program.
ESTs were clustered to the chromosomes using the MegaBlast algorithm on a 32-node, 64 cpu Beowulf cluster. Tissue or library source information was extracted from EST datasheet records for each 10,000 base pair genomic fragment and normalized based upon library density and EST hit frequency. Whole chromosomal plots display total EST hits for each tissue as well as normalized data, which is a measure of a specific tissue expression level compared to all other tissues available in dbEST. In this way, the expressional bias of a genomic region towards a specific tissue is determined. QTL intervals were overlaid onto the chromosomal plots displaying adipose, liver and pancreas expression and these regions were then scanned for areas that exhibited high tissue specificity. ESTs clustering to genomic regions showing high tissue specificity were linked to Unigene [25], to determine if the particular EST aligned to a known gene cluster from which mRNA sequence could be extracted for primer design. Adipose and liver tissues stored in RNAlater were processed using the RNAqueous ® -4PCR Kit, conforming to the manufacturer's recommendations.

Tissue procurement, RNA preparation and cDNA synthesis
All RNA was subsequently digested with DNAse in accordance with the aforementioned protocol and analyzed for purity using the Agilent Bioanalyzer 2100. RNA concentrations were determined using a NanoDrop ® ND-1000 Spectrophotometer. The collected RNA was converted into cDNA via MessageSensor™ RT Kit (Ambion # 1745), conforming to manufacturer's recommendations. The adipose tissue cDNA reaction synthesis contained 350 ng of RNA while pancreas and liver samples utilized 1000 ng of RNA. All RNA was stored at -80°C when it was not being processed.

Primer design
Sequences for the candidate genes were extracted from GenBank and imported into Primer Express software v2.0 from Applied Biosystems, Inc (ABI). All primers were prepared in accordance with universal thermocycling parameters as described for real-time PCR on the ABI 7900HT. The primer sequences were then blasted [26] to ensure specificity of the primers. Forward and reverse primers (MWG Biotech) were combined in a 96-well master plate at a final concentration of 50 µM. Short amplicons of approximately 75 bp were generated to ensure a high level of sensitivity. Dissociation curves confirmed that only single amplicons were generated. Amplicons were TA cloned (Invitrogen # K204040) and bidirectionally sequenced to confirm sequence identity for instances in which genes exhibited significant expression. For gene names and primer sequences, see Additional file 1. Pancreas and liver cDNA was diluted 1:10 before addition to the master-mix, while adipose cDNA was diluted 1:3. A 384-well plate format was utilized such that 4 samples × 96 genes were amplified per plate. The plate was sealed with Optical Adhesive Covers (ABI # 4311971) and centrifuged. The samples were assayed on the ABI Prism 7900HT Signal Detection System v2.0 using default conditions, and baseline range values were set from 3 to 10 cycles.

Real-Time QPCR
The data were then analyzed using Global Pattern Recognition (GPR) analytical software [27,28]. In typical QPCR experiments, the comparative expression of all genes is based on single gene normalizer, whose expression is assumed to be invariant. In contrast, the Global Pattern Recognition (GPR) algorithm employs a global normalization feature in which the expression data from each gene are normalized against that of every other gene, thus eliminating the reliance on single gene normalization. GPR's ranking is based on biological replicate consistency, and is thus not skewed by fold change in the magnitude of expression. For raw CT values and GPR Reports, see Additional files 2, 3 and 4.