iCartiGD: the Integrated Cartilage Gene Database
- Ming-Yiu Yeung†1,
- David K Smith†1,
- Matthew SY Chan1,
- Cheuk M Li1,
- Brian C Wong1,
- Kenneth MC Cheung2,
- Keith DK Luk2,
- Kathryn SE Cheah1,
- Pak Sham3, 4,
- Danny Chan1 and
- You-Qiang Song1, 2, 4
© Yeung et al; licensee BioMed Central Ltd. 2007
Received: 20 September 2006
Accepted: 23 February 2007
Published: 23 February 2007
Diseases of cartilage, such as arthritis and degenerative disc disease, affect the majority of the general population, particularly with ageing. Discovery and understanding of the genes and pathways involved in cartilage biology will greatly assist research on the development, degeneration and disorders of cartilage.
We have established the Integrated Cartilage Gene Database (iCartiGD) of genes that are known, based on results from high throughput experiments, to be expressed in cartilage. Information about these genes is extracted automatically from public databases and presented as a single page report via a web-browser. A variety of flexible search options are provided and the chromosomal distribution of cartilage associated genes can be presented.
iCartiGD provides a comprehensive source of information on genes known to be expressed in cartilage. It will remain current due to its automatic update capability and provide researchers with an easily accessible resource for studies involving cartilage. Genetic studies of the development and disorders of cartilage will benefit from this database.
Diseases of cartilage, such as arthritis and degenerative disc disease, affect the majority of the general population, particularly with ageing. In recognition of the impact of musculo-skeletal disorders on society, 2000–2010 has been declared the bone and joint decade by the World Health Organization [1]. One aim of this decade is to foster research relevant to musculo-skeletal systems. Discovery and understanding of the genes and pathways involved in cartilage biology will greatly assist research on the development, degeneration and disorders of cartilage. To this end we have established the Integrated Cartilage Gene Database (iCartiGD) of genes that are known, based on results from high throughput experiments, to be expressed in cartilage. Information about these genes is extracted automatically from public databases and presented as a single page report via a web-browser. Several flexible search options are provided and the chromosomal distribution of cartilage associated genes can be presented. iCartiGD provides researchers with an easily accessible resource for studies involving cartilage.
When compared with databases of relatively well studied organ systems, such as the human prostate gene database [2] or the ovarian kaleidoscope database [3], databases of cartilage associated genes are less developed or not publicly available. A skeletal gene database (SGD) [4] and its accompanying skeletal transcript database have been created. These databases contain a limited number of genes and approximately 80,000 ESTs, mainly from human and mouse trabecular bone and bone marrow stromal cell libraries. Recently an Osteo-Promoter Database has been created [5], which contains information on the promoter regions of the approximately 600 genes in SGD. Both of these databases provide links to other sources but do not give comprehensive reports on the genes.
However, large amounts of information about cartilage associated genes are publicly available. Numerous genes involved in skeletal development have been discovered through in vitro and in vivo studies [6, 7]. Expressed sequence tag (EST) libraries, prepared from both normal and diseased human cartilage, have been created [8–10], as have cDNA libraries [11]. Serial Analysis of Gene Expression (SAGE) libraries provide a further source of cartilage specific expression data. Microarray based studies of cartilage tissue have also been conducted [12, 13]. These libraries provide lists of genes that are expressed in various cartilage tissue subtypes and indicate their level of expression and their degree of differential expression between diseased and normal cartilage tissue. iCartiGD combines these data with other gene specific information such as nomenclature, chromosome position, sequence, protein domains or families, homologs, SNPs, expression levels in various tissues, gene ontology, associated disorders and literature references.
iCartiGD has been designed to facilitate access to the wealth of publicly available information on cartilage associated genes by providing a one-stop source for this information. Rather than search a variety of databases, scientists and clinicians studying cartilage can utilise the automatically updated iCartiGD, with its flexible search functions, to access the data and resources they require.
Construction and content
Table. URLs of databases and resources integrated via iCartiGD, including the UCSC genome browser.
Only genes that are expressed in cartilage are included in the database. Expression in cartilage is established from four lines of evidence: EST libraries, SAGE libraries, microarray studies and a cDNA library. Currently, eight EST libraries from dbEST at the NCBI give approximately 5,000 genes that are expressed in normal and osteoarthritic cartilage, and in chondrosarcomas. SAGE libraries derived from chondrosarcomas, together with microarray studies and a cDNA library of normal cartilage, complete the cartilage tissue data set. This gives a total of approximately 14,000 genes (based on EntrezGene identifiers) that are expressed in at least one cartilage tissue type. An evidence display, based on these sources, reports the justification for the inclusion of each gene in the database.
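The evidence display described above can be pictured as a mapping from each gene to the sources that report it. The following is a minimal Python sketch of that idea, not the iCartiGD implementation (which the article describes as Perl scripts feeding MySQL). The function name `build_evidence`, the evidence labels and the numeric identifiers are all assumptions made for illustration.

```python
from collections import defaultdict

def build_evidence(sources):
    """Map each gene identifier to the set of evidence types that report it.

    `sources` maps an evidence label (e.g. "EST") to an iterable of gene ids.
    """
    evidence = defaultdict(set)
    for label, gene_ids in sources.items():
        for gene_id in gene_ids:
            evidence[gene_id].add(label)
    return dict(evidence)

# Toy identifiers for illustration only; they are not real iCartiGD content.
sources = {
    "EST": [101, 102, 103],
    "SAGE": [102, 104],
    "microarray": [103, 104, 105],
    "cDNA": [101],
}
evidence = build_evidence(sources)

# A gene is included if at least one source reports it; the set of labels
# is what an evidence display would show as the justification.
for gene_id in sorted(evidence):
    print(gene_id, sorted(evidence[gene_id]))
```

Keeping the evidence as a set per gene means a gene reported by several libraries appears once in the database while still recording every source that supports it.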
EntrezGene records were primarily used to build the database. Where appropriate, UniGene identifiers or Affymetrix probe-ids were mapped to EntrezGene identifiers. Based on these, the relevant information from the databases mentioned above was downloaded and integrated into iCartiGD. The gene name, obtained via the EntrezGene identifier, forms the primary key for the tables in iCartiGD, with the HUGO Gene Nomenclature Committee (HGNC) gene name being used where available.
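As a rough illustration of the identifier mapping step, the sketch below translates source identifiers to EntrezGene identifiers through lookup tables and sets aside any that cannot be mapped. It is written in Python under stated assumptions: the helper `to_entrez`, the lookup tables and every identifier shown are hypothetical, and real tables would be derived from the UniGene and Affymetrix annotation files.

```python
def to_entrez(ids, mapping):
    """Translate source identifiers to EntrezGene ids.

    Returns the set of mapped ids and a list of identifiers with no mapping.
    """
    mapped, unmapped = set(), []
    for source_id in ids:
        entrez_id = mapping.get(source_id)
        if entrez_id is None:
            unmapped.append(source_id)
        else:
            mapped.add(entrez_id)
    return mapped, unmapped

# Hypothetical lookup tables, for illustration only.
unigene_to_entrez = {"Hs.0001": 101, "Hs.0002": 102}
probe_to_entrez = {"probe_a": 102, "probe_b": 103}

est_genes, est_missing = to_entrez(["Hs.0001", "Hs.0002", "Hs.9999"], unigene_to_entrez)
array_genes, array_missing = to_entrez(["probe_a", "probe_b"], probe_to_entrez)

# Different identifier systems collapse onto one EntrezGene-based gene list.
all_genes = est_genes | array_genes
```

Reporting unmapped identifiers separately, rather than dropping them silently, makes it easier to notice when a source database has retired or renamed an entry.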
Data parsing and extraction from the source databases are performed by Perl scripts, some of which utilise published parser modules [14]. The extracted data are entered into tables in a MySQL database. PHP scripts are used to generate HTML web pages dynamically for the graphical user interface. Web pages can also be generated in XML format and transformed by XSL so that iCartiGD is web-services ready. The bioinformatics server of the BIOSUPPORT project of the University of Hong Kong hosts iCartiGD. This server also provides regularly updated local mirrors of many of the source databases from which iCartiGD is automatically updated on a weekly basis.
The database is updated by automatically regenerating it each week, using a shell script to link the data extraction and database update scripts. The EST and other libraries are downloaded for each update run and the weekly updated mirrors of the source databases provided by the BIOSUPPORT project are accessed. This allows iCartiGD to take account of frequent modifications to databases such as UniGene and to provide the correct mappings from UniGene to the latest version of EntrezGene. Rebuilding the database from the current versions of the source databases ensures all modifications to the data are obtained.
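The reason a full rebuild picks up every modification, including withdrawn entries, can be seen in a small sketch. The example below uses Python with the standard library `sqlite3` module as a stand-in for the Perl, shell and MySQL tools the article names; the table layout, the `rebuild` function and the gene symbols are assumptions for illustration, not the iCartiGD schema.

```python
import sqlite3

def rebuild(conn, records):
    """Drop and recreate the gene table from the current source records."""
    # The connection context manager commits the inserts when the block succeeds.
    with conn:
        conn.execute("DROP TABLE IF EXISTS gene")
        conn.execute(
            "CREATE TABLE gene (symbol TEXT PRIMARY KEY, entrez_id INTEGER NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO gene (symbol, entrez_id) VALUES (?, ?)", records
        )

conn = sqlite3.connect(":memory:")

# First weekly run (toy records).
rebuild(conn, [("GENE_A", 1), ("GENE_B", 2)])

# Next run: the source has renamed one gene, withdrawn one and added another.
rebuild(conn, [("GENE_A2", 1), ("GENE_C", 3)])

rows = conn.execute(
    "SELECT symbol, entrez_id FROM gene ORDER BY entrez_id"
).fetchall()
```

Because nothing is carried over from the previous run, stale rows such as the withdrawn gene cannot linger, which an incremental update would have to detect and delete explicitly.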
Utility and discussion
For cartilage researchers needing access to information on their genes of interest, a variety of query options are provided through either a quick query box or a query form. The query form allows search results to be combined or searched further, and the history of the last 30 searches is retained. Queries can be run against all genes in the database, or against the genes expressed in a particular tissue subtype. Advanced search options, such as field limits, Boolean operators, wild cards and phrase matching, are available. A user can construct queries in URL format so that scripts can access the database instead of the web-based GUI when high throughput is needed. Users can also browse the genes stored in the database by chromosome number or alphabetically by the gene symbol or the gene name. Another form of query is provided by an interface to the BLAST search programs [15]. Users may search for matches to the sequences in iCartiGD using either the nucleic acid or amino acid sequences of interest to them.
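A script that issues URL format queries might assemble them as in the Python sketch below. The article does not document the query syntax, so the page name `search.php` and the parameter names `q` and `tissue` are hypothetical placeholders; only the base address is taken from the article. The sketch builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

BASE_URL = "http://bioinfo.hku.hk/iCartiGD/"  # site address given in the article
SEARCH_PAGE = "search.php"                    # hypothetical page name

def build_query_url(term, tissue=None):
    """Build a query URL; the parameter names are assumptions, not the real API."""
    params = {"q": term}
    if tissue is not None:
        params["tissue"] = tissue
    # urlencode percent-escapes spaces and wild card characters safely.
    return BASE_URL + SEARCH_PAGE + "?" + urlencode(params)

url = build_query_url("collagen AND type*", tissue="osteoarthritic cartilage")
print(url)
```

Encoding the parameters with `urlencode` rather than by string concatenation keeps Boolean operators, wild cards and quoted phrases intact when they are passed in a URL.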
To assist with genetic studies, the genomic location of the genes and transcripts in iCartiGD can be visualised on a transcriptome map. At any chromosomal position a user can retrieve the genes or transcripts in the database at that location. This will allow the identification of genes that are expressed in cartilage in the region of, for example, a candidate gene for cartilage disease, or the cartilage expressed genes in a region in significant linkage with disease markers. Comparisons of the differences in gene expression patterns among the cartilage tissue types can be made by examining the gene or transcript density [16] along each chromosome. Clustering within the genome of cartilage expressed genes can also be examined by tissue subtype with this facility.
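The two operations described here, retrieving genes within a chromosomal region and measuring gene density along a chromosome, can be sketched in a few lines of Python. This is an illustration under simple assumptions rather than the method used by iCartiGD: each gene is reduced to a single start position, density is counted in fixed-width windows, and the gene symbols, positions and window size are invented.

```python
from collections import Counter

def genes_in_region(genes, chrom, start, end):
    """Return symbols of genes on `chrom` whose position lies in [start, end]."""
    return [
        g["symbol"]
        for g in genes
        if g["chrom"] == chrom and start <= g["pos"] <= end
    ]

def density(genes, bin_size):
    """Count genes per (chromosome, window index) for windows of `bin_size` bases."""
    return Counter((g["chrom"], g["pos"] // bin_size) for g in genes)

# Toy gene positions, for illustration only.
genes = [
    {"symbol": "GENE_A", "chrom": "1", "pos": 150_000},
    {"symbol": "GENE_B", "chrom": "1", "pos": 900_000},
    {"symbol": "GENE_C", "chrom": "1", "pos": 1_200_000},
    {"symbol": "GENE_D", "chrom": "2", "pos": 300_000},
]

# Genes near a hypothetical linkage region on chromosome 1.
nearby = genes_in_region(genes, "1", 100_000, 1_000_000)

# Gene counts per one megabase window along each chromosome.
bins = density(genes, 1_000_000)
```

Running `density` separately on the gene list of each tissue subtype and comparing the resulting counts window by window is one simple way to look for the differences in distribution that the transcriptome map is meant to reveal.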
As iCartiGD is automatically updated on a weekly basis from major public data sources it will not suffer the lack of currency that befalls many databases. In the event of new EST libraries or large scale transcriptome studies on cartilage being made available, they can be added to the gene expression sources parsed by the update scripts which extract the list of gene identifiers to be supplied to update iCartiGD. Current data for the newly obtained and existing genes will then be automatically retrieved and included in iCartiGD.
A knowledge base is provided on the website to allow new visitors to become familiar with the site. Questions received from users can be posted in the knowledge base, when appropriate, so that fellow users can respond to queries or initiate discussions.
Searches of expression databases and the literature will be used to identify new studies of gene expression in cartilage. Further developments in iCartiGD will include a differential display tool to assist with comparative studies of different conditions and tissue subtypes. Improved methods to search the database using finer subdivisions of the data than the tissue sub-type currently available will be added. Other methods to examine the genomic distribution of cartilage expressed genes [17] will be included, as will genes associated with cartilage disorders, as they are identified from our and others' ongoing studies.
iCartiGD provides a comprehensive source of information about genes known to be expressed in cartilage. It will remain current due to its automatic update capability and will facilitate the research of basic scientists and clinicians studying cartilage related genes. Genetic studies of cartilage development and disorders will also benefit from this database.
Availability and requirements
iCartiGD is available to the public at http://bioinfo.hku.hk/iCartiGD/.
We gratefully acknowledge the BIOSUPPORT project (http://www.bioinfo.hku.hk), the Computer Centre and the Genome Research Centre of The University of Hong Kong. We also thank Frankie Cheung for expert technical assistance. This work was supported by grants from the Research Grant Council of Hong Kong (HKU7509/03M, YQS) and the University Grants Committee of Hong Kong (AoE/M-04/04, KSEC).
- Bjdoline (World Health Organization). [http://www.boneandjointdecade.org]
- Li LC, Zhao H, Shiina H, Kane CJ, Dahiya R: PGDB: a curated and integrated database of genes related to the prostate. Nucl Acids Res. 2003, 31: 291-293. 10.1093/nar/gkg008.
- Ben-Shlomo I, Vitt UA, Hsueh AJW: Perspective: The Ovarian Kaleidoscope Database-II. Functional Genomic Analysis of an Organ-Specific Database. Endocrinology. 2002, 143: 2041-2044. 10.1210/en.143.6.2041.
- Jia L, Ho NC, Park SS, Powell J, Francomano CA: Comprehensive resource: Skeletal gene database. Am J Med Genet. 2001, 106: 275-281. 10.1002/ajmg.10227.
- Grienberg I, Benayahu D: Osteo-Promoter Database (OPD) - promoter analysis in skeletal cells. BMC Genomics. 2005, 6: 46. 10.1186/1471-2164-6-46.
- Karsenty G, Wagner EF: Reaching a genetic and molecular understanding of skeletal development. Dev Cell. 2002, 2: 389-406. 10.1016/S1534-5807(02)00157-0.
- Kronenberg HM: Developmental regulation of the growth plate. Nature. 2003, 423: 332-336. 10.1038/nature01657.
- Jung YK, Jeong JH, Ryoo HM, Kim HN, Kim YJ, Park EK, Si HJ, Kim SY, Takigawa M, Lee BH, Park RW, Kim IS, Choi JY: Gene expression profile of human chondrocyte HCS-2/8 cell line by EST sequencing analysis. Gene. 2004, 330: 85-92. 10.1016/j.gene.2004.01.007.
- Kumar S, Connor JR, Dodds RA, Halsey W, Van Horn M, Mao J, Sathe G, Mui P, Agarwal P, Badger AM, Lee JC, Gowen M, Lark MW: Identification and initial characterization of 5000 expressed sequenced tags (ESTs) each from adult human normal and osteoarthritic cartilage cDNA libraries. Osteoarthritis Cartilage. 2001, 9: 641-653. 10.1053/joca.2001.0421.
- Zhang H, Marshall KW, Tang H, Hwang DM, Lee M, Liew CC: Profiling genes expressed in human fetal cartilage using 13,155 expressed sequence tags. Osteoarthritis Cartilage. 2003, 11: 309-319. 10.1016/S1063-4584(03)00032-3.
- Pogue R, Sebald E, King L, Kronstadt E, Krakow D, Cohn DH: A transcriptional profile of human fetal cartilage. Matrix Biol. 2004, 23: 299-307. 10.1016/j.matbio.2004.07.003.
- Yager TD, Dempsey AA, Tang H, Stamatiou D, Chao S, Marshall KW, Liew CC: First comprehensive mapping of cartilage transcripts to human genome. Genomics. 2004, 84: 524-535. 10.1016/j.ygeno.2004.05.006.
- Olney RC, Wang J, Sylvester JE, Mougey EB: Growth factor regulation of human growth plate chondrocyte proliferation in vitro. Biochem Biophys Res Commun. 2004, 317: 1171-1182. 10.1016/j.bbrc.2004.03.170.
- Liu M, Grigoriev A: Fast parsers for Entrez Gene. Bioinformatics. 2005, 21: 3189-3190. 10.1093/bioinformatics/bti488.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- Qiu P, Benbow L, Liu S, Greene JR, Wang L: Analysis of a human brain transcriptome map. BMC Genomics. 2002, 3: 10. 10.1186/1471-2164-3-10.
- Li Q, Lee BT, Zhang L: Genome-scale analysis of positional clustering of mouse testis-specific genes. BMC Genomics. 2005, 6: 7. 10.1186/1471-2164-6-7.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.