LD2SNPing: linkage disequilibrium plotter and RFLP enzyme mining for tag SNPs
© Chang et al; licensee BioMed Central Ltd. 2009
Received: 11 November 2008
Accepted: 06 June 2009
Published: 06 June 2009
Linkage disequilibrium (LD) mapping is commonly used to evaluate markers for genome-wide association studies. Most types of LD software focus strictly on LD analysis and visualization, but lack supporting services for genotyping.
We developed a freeware program called LD2SNPing, which provides a complete package of mining tools for genotyping and LD analysis environments. The software provides SNP ID- and gene-centric online retrieval of SNP information and tag SNP selection from dbSNP/NCBI and HapMap, respectively. Restriction fragment length polymorphism (RFLP) enzyme information for SNP genotyping is available for all SNP IDs and tag SNPs. Single and multiple SNP inputs are possible in order to perform LD analysis by online retrieval from HapMap and NCBI. An LD statistics section provides D, D', r², δQ, ρ, and the P values of the Hardy-Weinberg Equilibrium for each SNP marker, and Chi-square and likelihood-ratio tests for the pair-wise association of two SNPs in LD calculation. Finally, 2D and 3D plots, as well as plain-text output of the results, can be selected.
LD2SNPing thus provides a novel visualization environment for multiple SNP input, which facilitates SNP association studies. The software, user manual, and tutorial are freely available at http://bio.kuas.edu.tw/LD2NPing.
Single nucleotide polymorphisms (SNPs) are very important markers for disease and cancer association studies. The number of identified SNPs is currently estimated to be about 3.1 million. Identification of associations by statistical analyses of SNP data is challenging due to the large number of SNPs involved.
Linkage disequilibrium (LD) analysis is one of the most commonly used methods for choosing informative SNPs that represent the original SNP distribution in a genome for genome-wide association studies. LD mappings are commonly used to evaluate markers across large data sets. Given the vast amount of data in association studies, visualization of the LD results in graphical form rather than text form facilitates the interpretation of the results considerably.
Many types of visualization software for LD have been developed, e.g. LDA, Haploview, and JLIN. Although these tools have made valuable contributions to LD visualization and analysis, they lack services and tools that help users generate the genotype data needed for LD analysis. Without the actual data set itself, users are unable to perform LD analysis. Conversely, many types of software provide information for genotyping, e.g. the SNPlex genotyping system, SNP cutter, SNP-RFLPing, and V-MitoSNP. These programs, however, do not include an LD function. It is thus still difficult for researchers to narrow down the number of SNPs before performing SNP genotyping. A common way of identifying tag SNPs of the genes of interest is to check the HapMap website http://www.hapmap.org. Currently available tools, however, are not well integrated, but rather are independent programs.
We have thus integrated an SNP genotyping service and an LD visualization/analysis tool in a single program to provide a single platform for tag SNP selection, SNP genotyping, and LD analysis. This platform, LD2SNPing, furthermore provides a novel function for multiple SNP inputs in order to directly plot the LD. The user can input SNPs of interest and calculate the LD measurement for SNP selection before the genotyping process. This stand-alone Java-based visualization tool greatly facilitates preparation of the genotype data and increases the performance of LD analyses.
LD2SNPing is Java-based software implemented under the Java Runtime Environment (JRE) and Java 3D. The LD statistics program calculates D, D', r², δQ, and ρ values, as well as the P value of the Hardy-Weinberg Equilibrium (HWE-P) for each SNP marker. The P values of the Chi-square test and the likelihood-ratio test for the pair-wise association of two SNPs are also provided in the LD calculation. LD2SNPing processes genotype data and estimates pair-wise loci haplotype frequencies of the sample using an expectation-maximization (EM) algorithm. Except for the exact test of HWE, which is implemented in LD2SNPing, the equations used in these calculations follow those described for LDA and are listed in the appendix of the user manual.
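The EM step and the most common LD measures can be illustrated with a minimal Python sketch. It is not LD2SNPing's Java code: the function names, the 3x3 genotype-count layout, the starting frequencies, and the stopping rule are assumptions made for illustration, and only D, |D'|, and r² are shown, using their standard textbook definitions.

```python
def em_haplotype_freqs(counts, iters=200, tol=1e-10):
    """Estimate two-locus haplotype frequencies (AB, Ab, aB, ab) by EM.

    counts[i][j] is the number of individuals carrying i copies of allele A
    at locus 1 and j copies of allele B at locus 2 (i, j in 0..2).
    """
    n = sum(sum(row) for row in counts)
    # Haplotype counts that are unambiguous; only the double heterozygote
    # counts[1][1] has unknown phase (AB/ab or Ab/aB).
    base = {
        "AB": 2 * counts[2][2] + counts[2][1] + counts[1][2],
        "Ab": 2 * counts[2][0] + counts[2][1] + counts[1][0],
        "aB": 2 * counts[0][2] + counts[0][1] + counts[1][2],
        "ab": 2 * counts[0][0] + counts[0][1] + counts[1][0],
    }
    dh = counts[1][1]
    p = {h: 0.25 for h in base}  # assumed starting point: equal frequencies
    for _ in range(iters):
        # E-step: expected share of double heterozygotes in the AB/ab phase
        cis = p["AB"] * p["ab"]
        trans = p["Ab"] * p["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: frequencies from expected haplotype counts (2n chromosomes)
        new = {
            "AB": (base["AB"] + dh * w) / (2 * n),
            "ab": (base["ab"] + dh * w) / (2 * n),
            "Ab": (base["Ab"] + dh * (1 - w)) / (2 * n),
            "aB": (base["aB"] + dh * (1 - w)) / (2 * n),
        }
        converged = max(abs(new[h] - p[h]) for h in p) < tol
        p = new
        if converged:
            break
    return p


def ld_measures(p):
    """Return (D, |D'|, r^2) from haplotype frequencies."""
    p_a = p["AB"] + p["Ab"]
    p_b = p["AB"] + p["aB"]
    d = p["AB"] - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    freqs = em_haplotype_freqs([[5, 0, 0], [0, 10, 0], [0, 0, 5]])
    print(freqs, ld_measures(freqs))
```

Only double heterozygotes have ambiguous phase, so the E-step reduces to a single weight per iteration. The remaining measures (δQ, ρ) and the association tests are left to the equations in the user manual.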
When plotting LD, the LD2SNPing software displays only SNPs with a minor allele frequency (MAF) value greater than 0.01. All the MAF and HWE-P values for these SNPs are provided in the text window.
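As an illustration of the MAF threshold, the following Python sketch computes the MAF of a biallelic SNP from two-letter genotype strings and keeps only SNPs above the cut-off. The function names and the dictionary layout are hypothetical, and missing genotypes are assumed to be represented as `None`.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """MAF of one biallelic SNP given genotypes such as 'AG' or 'GG'."""
    alleles = Counter()
    for g in genotypes:
        if g:  # skip missing genotypes (None or empty string)
            alleles.update(g)  # 'AG' adds one A and one G
    if len(alleles) < 2:
        return 0.0  # monomorphic or no data
    return min(alleles.values()) / sum(alleles.values())


def filter_by_maf(snps, cutoff=0.01):
    """Keep SNPs whose MAF is strictly greater than the cut-off."""
    return {name: g for name, g in snps.items()
            if minor_allele_frequency(g) > cutoff}


if __name__ == "__main__":
    data = {"rs_a": ["AA", "AG", "GG", "AA"], "rs_b": ["CC", "CC"]}
    print(filter_by_maf(data))
```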
The SNP genotype information and the tag SNPs are retrieved online from dbSNP BUILD 129 at NCBI http://www.ncbi.nlm.nih.gov/SNP/ and from HapMap http://www.hapmap.org (HapMap Data Rel 23a/phaseII Mar08, on the NCBI B36 assembly, dbSNP b126), respectively. Online retrieval of SNP genotype information from NCBI using SNP ID and gene input is similar to the function described for SNP-Flankplus and SNP ID-info. The default minor allele frequency (MAF) cut-off for tag SNPs from HapMap is 0.2. Four populations, CEU, CHB, JPT, and YRI (Caucasian, Han-Chinese, Japanese and Sub-Saharan African, respectively), are selectable during tag SNP retrieval from HapMap. The retrieved data are the most up-to-date data available. The RFLP database structure is based on REBASE http://www.rebase.org version 610. The RFLP mining function for the selected SNP is provided by SNP-RFLPing, which is integrated in LD2SNPing.
A demonstration and user manual of the LD2SNPing software are available as a free download from http://bio.kuas.edu.tw/LD2SNPing. Many animations explaining how to use the LD2SNPing software are provided on the homepage and embedded in the user manual (see Additional file 1) as tutorials.
Data import formats: File input
LD2SNPing accepts four different input file formats: two Excel formats (.xls and .csv), Word (.doc), and Notepad (.txt). The first and second rows of each file are reserved for the user-defined SNP names and the distances between SNPs (optional), respectively. Individual genotypes may be written as NN, N_N, or N/N (where N is one of the four possible nucleotides). If a genotype is missing from the input file, LD2SNPing automatically bypasses it without interrupting processing. Some example files for testing are available in the example file folder of the LD2SNPing software package.
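The text layout described above can be read with a short Python sketch. Several details are assumptions rather than documented LD2SNPing behaviour: columns are taken to be tab-separated, the optional distance row is detected by being entirely numeric, and any cell that is not a valid genotype is treated as missing and stored as `None` so that individuals stay aligned across SNPs.

```python
import re

# Accepts NN, N_N and N/N, where N is one of A, C, G, T
GENOTYPE = re.compile(r"([ACGT])[_/]?([ACGT])")
NUMBER = re.compile(r"\d+(\.\d+)?")


def parse_genotype(cell):
    """Normalise one cell to a two-letter genotype, or None if missing."""
    m = GENOTYPE.fullmatch(cell.strip().upper())
    return m.group(1) + m.group(2) if m else None


def parse_table(text):
    """Return (SNP names, distances or None, {SNP name: genotype list})."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    names = [c.strip() for c in rows[0]]
    body = rows[1:]
    distances = None
    # Optional second row: distances between SNPs (all cells numeric)
    if body and all(NUMBER.fullmatch(c.strip()) for c in body[0]):
        distances = [float(c) for c in body[0]]
        body = body[1:]
    genotypes = {name: [] for name in names}
    for row in body:
        for name, cell in zip(names, row):
            genotypes[name].append(parse_genotype(cell))
    return names, distances, genotypes


if __name__ == "__main__":
    sample = "rs1\trs2\n0\t1500\nAG\tC_T\nA/A\t\nGG\tNA\n"
    print(parse_table(sample))
```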
Data import formats: rsID input
Data import formats: Gene input
LD2SNPing accepts gene name (HUGO, Human Genome Organization) input to provide tag SNPs through online retrieval from HapMap (Figure 1B).
LD-free function: Retrieval of individual SNP information from NCBI
In Figure 1A, the SNP (rs17884306) information for all populations of the dbSNP is provided (P1, CAUC1, AFR1, HISP1, and PAC1). The ssID#s (ss32469505 and ss48297306) for the corresponding rsID# (rs17884306) can be selected by using the pull-down window.
LD-free function: Gene input for finding rsID data of tag SNP
In Figure 1B, LD2SNPing provides the tag SNP information through HapMap by gene input. The example shown is BRCA2. The tag SNP candidates provided by LD2SNPing match those of HapMap completely (shown in the user manual). The HapMap-CEU, HCB, JPT and YRI populations can be selected.
LD-free function: RFLP enzyme mining tool
LD2SNPing executes RFLP restriction enzyme mining when the RFLP box is clicked (arrow 6 of Figure 1A and arrow 5 of Figure 1B). RFLP results are shown in the format pictured in Figure 1C, in which restriction enzyme information for the SNP of interest (here, rs9534275) is shown. Information about alleles, enzyme name, the recognition sequence and commercial availability is provided.
LD function: Input formats for 2D analysis
LD function: 2D-LD graph
The distance between SNPs supplied in the input file can be optionally displayed or hidden (number 1 of Figure 3A). This distance is shown next to the diagonal line as a numerical value. By clicking on the "select scope" (number 2 of Figure 3A) and "repaint" (number 8 of Figure 3A) buttons, a user can limit the number of SNPs shown to only those of interest. This view can be reversed by clicking on the "restore scope" (number 3 of Figure 3A) button. The parameters for LD measurement are selected by the two axes named "left and right LD measure" (numbers 4 and 5 of Figure 3A, respectively). Different color schemes for each of the statistics can be selected (numbers 6 and 7 of Figure 3A). Moreover, LD2SNPing provides a window for the minor allele frequency (MAF) value and HWE-P values for each analyzed SNP when LD analysis is performed (not shown). A more detailed description is given in the user manual.
LD function: Data analysis of LD information
In addition, LD2SNPing provides graphic analyses, such as grids and pie3D graphs, to supplement the 2D-LD visualization and analysis (numbers 10 and 11 of Figure 3A). The results are shown in the user manual.
LD function: 3D-LD graph
The 3D visualization of LD is performed by clicking on the icon for number 13 in Figure 3A. It is the same as the 2D-LD plot except for the color patterns and the color ranges. In LD-3D, the distance and LD measurement values are indicated by the height along the diagonal line (Figure 3B). Users can switch back to the 2D-LD view or close the analysis by clicking on the icons for numbers 12 and 9 of Figure 3A, respectively.
All the analyzed results can be saved as tab-delimited text files (.txt) and graphic files (.jpg) for convenience. The LD parameters are exported to a single file. Figure 4B shows a sample test result for "LD measure data", D'. All the D' values for each SNP are listed pairwise, a common publishing format. Other LD parameters are not shown here, but are available in the user manual.
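A pairwise, tab-delimited listing of this kind can be produced with a few lines of Python. The column headers and the three-decimal formatting below are assumptions for illustration; the exact layout of the file exported by LD2SNPing is described in the user manual, not here.

```python
from itertools import combinations


def pairwise_table(names, measure, label="D'"):
    """Build a tab-delimited table with one row per unordered SNP pair.

    `measure` is any function taking two SNP names and returning a number.
    """
    lines = [f"SNP1\tSNP2\t{label}"]
    for a, b in combinations(names, 2):
        lines.append(f"{a}\t{b}\t{measure(a, b):.3f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical D' values, for illustration only
    values = {
        frozenset(("rs1", "rs2")): 1.0,
        frozenset(("rs1", "rs3")): 0.42,
        frozenset(("rs2", "rs3")): 0.87,
    }
    print(pairwise_table(["rs1", "rs2", "rs3"],
                         lambda a, b: values[frozenset((a, b))]))
```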
Comparison of some LD software
Table 1. Comparison of some LD software platforms
Input file formats: ped, info, txt; xls, csv, txt, doc
Output file formats: pdf, eps, png, txt
LD measures: D, D', r², P; δQ, ρ
Supporting functions compared: tag SNP mining by gene input; SNP information retrieval by rsID input; enzyme mining for RFLP genotyping; online retrieval of multiple SNPs for LD plot; 2D graph visualization; 2D graph/text data output; 3D distance/graph visualization
Generally, SNP genotyping has to be performed to generate the SNP genotypes needed for LD analysis. However, the available LD software platforms only provide LD measurements and offer no supporting functions for the steps that precede LD analysis, such as tag SNP mining by gene input, retrieval of SNP information, or RFLP enzyme mining for genotyping. These supporting functions are provided in LD2SNPing (Table 1). Moreover, LD2SNPing allows for input of multiple SNPs for LD analysis (Figure 2). The genotype information of the input SNPs is retrieved online from NCBI and HapMap. Users therefore obtain an overview of the LD analysis for the input SNPs without performing prior SNP genotyping or supplying a genotype file. In contrast, Haploview provides many SNPs, and users must manually select the SNPs of interest. If the SNPs of interest are distributed widely over the chromosome, the SNP panel contains a large number of SNPs. Haploview thus only indirectly provides LD analysis for multiple SNPs.
Tag SNP selection
Tag SNP candidates returned by HapMap at different times may not be consistent because of changes made in its built-in greedy algorithm. Some tag SNPs may or may not be found again in subsequent queries. For example, tag SNP selection by inputting gene BRCA2 to HapMap under MAF = 0.2 yields two tag SNP sets: 1) rs9534342, rs9943888, rs11571662, rs206120, rs206342, rs542551, rs9567552, rs206079, rs9562605, and rs14448 and 2) rs9534275, rs9943888, rs11571579, rs206146, rs206077, rs573014, rs9567552, rs9534174, rs144848, and rs9562605.
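To show what a greedy selection of this kind looks like, here is a generic pairwise tagging sketch in Python. It is not HapMap's implementation: the r² threshold of 0.8, the data layout, and the tie-breaking rule are all assumptions. It does show one general property of greedy procedures, namely that when two candidates cover the same SNPs, the order in which they are considered decides which one is reported.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedy pairwise tagging.

    snps: list of SNP names.
    r2:   dict mapping frozenset({a, b}) to the pairwise r^2 value;
          missing pairs are treated as r^2 = 0.
    Returns the chosen tag SNPs in the order they were picked.
    """
    def captures(tag):
        # A tag captures itself and every SNP with r^2 >= threshold
        return {s for s in snps
                if s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold}

    untagged = set(snps)
    tags = []
    while untagged:
        # Pick the SNP that covers the most still-untagged SNPs;
        # max() keeps the first best candidate, so ties follow list order.
        best = max(snps, key=lambda s: len(captures(s) & untagged))
        tags.append(best)
        untagged -= captures(best)
    return tags


if __name__ == "__main__":
    # Hypothetical SNP names and r^2 values, for illustration only
    pairs = {
        frozenset(("s1", "s2")): 0.90,
        frozenset(("s2", "s3")): 0.85,
        frozenset(("s3", "s4")): 0.20,
    }
    print(greedy_tag_snps(["s1", "s2", "s3", "s4"], pairs))
```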
Restriction enzyme mining for RFLP
LD2SNPing supports SNP ID searches with online retrieval from dbSNP at NCBI for RFLP analysis. However, RFLP analysis of an input SNP ID may yield no restriction enzyme information because of the nature of the SNP itself. For example, the sequence information for rs9943888 and rs11571579 is retrieved successfully in LD2SNPing, but only rs11571579 has suitable restriction enzymes to mine (not shown). This is a property of the SNP itself rather than an error in the RFLP analysis system. For the wet-lab PCR-RFLP experiment, users need primer design software such as "Prim-SNPing" for designing SNP-RFLP primers and "SNP-Flankplus" for retrieving the SNP flanking sequence used in primer design.
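Why some SNPs yield no usable enzyme can be seen in a small Python sketch of the underlying idea: an enzyme is informative only if its recognition site spans the SNP and is present for one allele but not the other. This is not the SNP-RFLPing implementation. The four-enzyme table is a hand-picked stand-in for REBASE, the flanking sequences are invented, and degenerate recognition sequences are not handled.

```python
# Recognition sequences of four common enzymes (small stand-in for REBASE)
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "TaqI": "TCGA",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_site(seq, site):
    """True if the site occurs on either strand of seq."""
    return site in seq or revcomp(site) in seq


def mine_rflp(left, alleles, right, enzymes=ENZYMES):
    """Return {enzyme: alleles carrying its site} for discriminating enzymes.

    left/right are the flanking sequences; alleles are single bases.
    """
    hits = {}
    for name, site in enzymes.items():
        k = len(site) - 1
        # Only a window of k bases on each side can hold a site that
        # overlaps the SNP; sites elsewhere are shared by all alleles.
        carriers = [a for a in alleles
                    if has_site(left[-k:] + a + right[:k], site)]
        if carriers and len(carriers) < len(alleles):
            hits[name] = carriers
    return hits


if __name__ == "__main__":
    # Invented flanks: allele A completes an EcoRI site, allele G does not
    print(mine_rflp("ACGTGA", ["A", "G"], "TTCAGG"))
    # Invented flanks with no site spanning the SNP for either allele
    print(mine_rflp("AAAAAA", ["C", "T"], "AAAAAA"))
```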
LD2SNPing has the following characteristics: 1) it provides a search function for online retrieval of SNP information from dbSNP of NCBI; 2) it provides gene-centric tag SNP selection through online retrieval from HapMap; 3) all the SNP IDs and tag SNPs are processed to mine RFLP restriction enzymes for SNP genotyping; 4) it provides LD measurements for D, D', r², δQ, and ρ, along with the P value of the Hardy-Weinberg Equilibrium for each SNP marker and the P values of the Chi-square and likelihood-ratio tests for the pair-wise association of two SNPs in LD calculation; 5) it accepts multiple SNP inputs to perform LD analysis by online retrieval from HapMap and NCBI; 6) it presents both 2D and 3D visualization with LD-related measurements shown on the graphs; 7) it provides both graphic and plain-text outputs for LD analysis. In conclusion, LD2SNPing is a novel and integrated visualization software package designed to provide the user with the tools necessary for genotyping and LD analysis. It provides a simple and user-friendly interface with integrated functions for retrieval of SNP information, LD statistical calculation, analysis and visualization.
Availability and requirements
Project name: LD2SNPing: Linkage disequilibrium plotter and RFLP enzyme mining for tag SNPs
Project home page: http://bio.kuas.edu.tw/LD2SNPing/ with software and user manual for download.
Operating system(s): Platform-independent
Programming language: Java
Other requirements: Java 1.5.0 or higher
License: Free for non-commercial use
Any restrictions to use by non-academics: Please contact corresponding author.
SNP: single nucleotide polymorphism
RFLP: restriction fragment length polymorphism
HUGO: Human Genome Organization
MAF: minor allele frequency.
This work was partly supported by the National Science Council in Taiwan under grants 97-2311-B-037-003-MY3, 96-2221-E-214-050-MY3, NSC96-2311-B037-002, 96-2622-E-151-019-CC3, NSC96-2622-E214-004-CC3, KMU-EM-97-1.1b, and KMU-EM-98-1.4.
- Shastry BS: SNPs in disease gene mapping, medicinal drug development and evolution. Journal of human genetics. 2007, 52 (11): 871-880. 10.1007/s10038-007-0200-z.
- Zheng SL, Sun J, Wiklund F, Smith S, Stattin P, Li G, Adami HO, Hsu FC, Zhu Y, Balter K, et al: Cumulative association of five genetic variants with prostate cancer. N Engl J Med. 2008, 358 (9): 910-919. 10.1056/NEJMoa075819.
- Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al: A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007, 449 (7164): 851-861. 10.1038/nature06258.
- Carter K, Bellgard M: MASV–Multiple (BLAST) Annotation System Viewer. Bioinformatics (Oxford, England). 2003, 19 (17): 2313-2315. 10.1093/bioinformatics/btg301. [http://cbbc.murdoch.edu.au/projects/masv/]
- Ding K, Zhou K, He F, Shen Y: LDA–a java-based linkage disequilibrium analyzer. Bioinformatics (Oxford, England). 2003, 19 (16): 2147-2148. 10.1093/bioinformatics/btg276. [http://www.chgb.org.cn/lda/lda.htm]
- Barrett JC, Fry B, Maller J, Daly MJ: Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics (Oxford, England). 2005, 21 (2): 263-265. 10.1093/bioinformatics/bth457. [http://www.broad.mit.edu/mpg/haploview/]
- Carter KW, McCaskie PA, Palmer LJ: JLIN: a java based linkage disequilibrium plotter. BMC bioinformatics. 2006, 7: 60. 10.1186/1471-2105-7-60. [http://www.genepi.org.au/projects/jlin]
- Tobler AR, Short S, Andersen MR, Paner TM, Briggs JC, Lambert SM, Wu PP, Wang Y, Spoonde AY, Koehler RT, et al: The SNPlex genotyping system: a flexible and scalable platform for SNP genotyping. J Biomol Tech. 2005, 16 (4): 398-406.
- Ding K, Zhang J, Zhou K, Shen Y, Zhang X: htSNPer1.0: software for haplotype block partition and htSNPs selection. BMC bioinformatics. 2005, 6: 38. 10.1186/1471-2105-6-38. [http://www.chgb.org.cn/htSNPer/htSNPer.html]
- Chang HW, Yang CH, Chang PL, Cheng YH, Chuang LY: SNP-RFLPing: restriction enzyme mining for SNPs in genomes. BMC genomics. 2006, 7: 30. 10.1186/1471-2164-7-30. [http://bio.kuas.edu.tw/snp-rflping/]
- Chuang LY, Yang CH, Cheng YH, Gu DL, Chang PL, Tsui KH, Chang HW: V-MitoSNP: visualization of human mitochondrial SNPs. BMC bioinformatics. 2006, 7: 379. 10.1186/1471-2105-7-379. [http://bio.kuas.edu.tw/v-mitosnp/]
- Thorisson GA, Smith AV, Krishnan L, Stein LD: The International HapMap Project Web site. Genome research. 2005, 15 (11): 1592-1593. 10.1101/gr.4413105. [http://www.hapmap.org]
- Excoffier L, Slatkin M: Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Molecular biology and evolution. 1995, 12 (5): 921-927.
- Wigginton JE, Cutler DJ, Abecasis GR: A note on exact tests of Hardy-Weinberg equilibrium. American journal of human genetics. 2005, 76 (5): 887-893. 10.1086/429864.
- Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic acids research. 2001, 29 (1): 308-311. 10.1093/nar/29.1.308. [http://www.ncbi.nlm.nih.gov/SNP/]
- Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Edgar R, Federhen S: Database resources of the National Center for Biotechnology Information. Nucleic acids research. 2008, 36 (Database issue): D13-21. [http://www.ncbi.nlm.nih.gov]
- Yang CH, Cheng YH, Chuang LY, Chang HW: SNP-Flankplus: SNP ID-centric retrieval for SNP flanking sequences. Bioinformation. 2008, 3 (4): 147-149. [http://bio.kuas.edu.tw/snp-flankplus/]
- Yang CH, Chuang LY, Cheng YH, Wen CH, Chang PL, Chang HW: SNP ID-info: SNP ID searching and visualization platform. OMICS. 2008, 12 (3): 217-226. 10.1089/omi.2008.0026. [http://bio.kuas.edu.tw/snpid-info]
- Roberts RJ, Vincze T, Posfai J, Macelis D: REBASE–enzymes and genes for DNA restriction and modification. Nucleic acids research. 2007, 35 (Database issue): D269-270. 10.1093/nar/gkl891. [http://www.rebase.org]
- Chang HW, Chuang LH, Cheng YH, Hung YC, Wen CH, Gu DL, Yang CH: Prim-SNPing: a primer designer for cost-effective SNP genotyping. Biotechniques. 2009, 46 (6): 421-431. 10.2144/000113092. [http://www.rebase.org]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.