- Methodology article
- Open Access
Trend-TDT – a transmission/disequilibrium based association test on functional mini/microsatellites
© Feng et al; licensee BioMed Central Ltd. 2007
- Received: 04 May 2007
- Accepted: 01 November 2007
- Published: 01 November 2007
Minisatellites and microsatellites are associated with human disease, not only as markers of risk but also through direct involvement in disease pathogenesis. They may play significant roles in replication, repair and mutation of DNA, regulation of gene transcription and alteration of protein structure. Phenotypes can thus be affected by mini/microsatellites in a manner proportional to the length of the allele. Here we propose a new method to assess the linear trend toward transmission of shorter or longer alleles from heterozygote parents to an affected child.
This test (trend-TDT) performs better than other TDT (Transmission/Disequilibrium Test) type tests, such as TDTmax and TDTL/S, under most marker-disease association models.
The trend-TDT test is a more powerful association test when there is a biological basis to suspect a relationship between allele length and disease risk.
- Conditional Logistic Regression
- Short Allele
- Allele Length
- Long Allele
- Equal Allele Frequency
Variable number tandem repeats (VNTR's) are repetitive DNA sequences widely dispersed in the human genome. They are highly unstable and thus display a remarkable degree of polymorphism. They vary in length from a few to several thousand nucleotides and vary in complexity from simple di-, tri- and tetra-nucleotide repeats (microsatellites) to more complex repetitive elements (minisatellites). VNTR's, mainly microsatellites, have assumed an increasingly important role as markers in the genome and are intensively exploited for gene mapping. But VNTR's could be associated with human disease, not only as markers but also directly involved in disease pathogenesis; indeed, several functions have been suggested for micro- and mini-satellite DNA sequences.
If located within a coding sequence, VNTR's may alter protein structure. For example, expansions of tri-nucleotide microsatellites are responsible for genetic diseases such as X-linked spinal and bulbar muscular atrophy, Huntington disease, type 1 spinocerebellar ataxia, dentatorubral-pallidoluysian atrophy, and Machado-Joseph disease. These diseases are caused by expansion of CAG triplets within protein-coding regions.
Table. Micro/minisatellites that regulate gene transcription. Columns: length of alleles; transcription regulation (a); interacting factor (b). Entries: promoter -1 kb; promoter -2.5 kb; promoter, intron 1; promoter -595 bp; promoter -240 bp; promoter -400 bp; 3': 1 kb after polyA; 28 bp repeat; promoter -596 bp; 14 bp repeat; promoter -3.6 kb; 43 bp repeat.
The Transmission/Disequilibrium Test (TDT) is a popular method to assess the involvement of a candidate gene or a genome region in the genetic component of a disease, using cases and their parents. The TDT, as originally developed [7], tested the association between a bi-allelic marker and a disease. Many authors have proposed extensions of the TDT to multi-allelic markers: by testing each allele separately [8, 9], by testing symmetry of the transmitted/non-transmitted table [10, 11], by testing marginal homogeneity [12, 13], or by conditional logistic regression [14, 15]. However, all these extensions implicitly treat the multi-allelic marker as a polymorphism without function; that is, the risk of disease is not treated as being correlated with allele repeat length. While this is true for most situations, there are some situations where the multi-allelic marker under study may have a functional effect on the studied disease, and thus this correlation may be present. This may introduce new information that can be taken into account in the test. From a statistical point of view, increased allele length could be understood as an increased dose of exposure to a risk factor. In contrast to case-control association studies, where one can use the classical trend chi-square (the Cochran-Armitage trend test) to test this hypothesis, available extensions of the TDT to multi-allelic markers do not test such a "dose effect" in family-based association studies. However, case-control studies can be subject to bias produced by hidden population stratification. Therefore, a new statistical method is needed that can test the correlation of allele length with disease susceptibility and is not sensitive to population stratification. In this paper, we describe a newly developed method to meet this requirement.
The test statistic T asymptotically follows the Student's t distribution with N-1 degrees of freedom. Here S is the estimated standard deviation of the d_f, and N is the number of informative families. If there is a trend toward transmission of shorter alleles, mean(d_f) will be less than 0, and vice versa. If biological clues indicate that preferential transmission of shorter alleles (or longer alleles) should be observed, the test is a one-tailed t test (H1: T < 0 or H1: T > 0); otherwise the test is two-tailed (H1: T ≠ 0).
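The computation described above can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes that T is the usual one-sample t statistic, mean(d_f) / (S / sqrt(N)), and that d_f is the transmitted minus non-transmitted allele length summed over the heterozygote parents of family f. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def family_difference(parent_transmissions):
    """Assumed definition of d_f: sum of (transmitted - non-transmitted)
    allele length over the heterozygote parents of one family."""
    return sum(t - u for t, u in parent_transmissions if t != u)


def trend_tdt_statistic(family_differences):
    """Return (T, degrees of freedom) from the per-family differences d_f.

    Assumes T = mean(d_f) / (S / sqrt(N)), with S the sample standard
    deviation and N the number of informative families.
    """
    n = len(family_differences)
    if n < 2:
        raise ValueError("need at least two informative families")
    s = stdev(family_differences)
    if s == 0:
        raise ValueError("all d_f are identical; T is undefined")
    t = mean(family_differences) / (s / math.sqrt(n))
    return t, n - 1
```

A negative T points toward preferential transmission of shorter alleles and a positive T toward longer alleles; the p-value would then be read from the t distribution with N-1 degrees of freedom.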
The missing genotype problem is treated according to Curtis [16]. If both parents are missing, or if one parent is missing and the affected child has the same heterozygote genotype as the other parent, the family is considered uninformative and is discarded from the analysis. When only one parent is missing but the affected child is homozygote, inclusion of such triads will lead to bias; therefore they are also discarded. In other situations, the transmission status of either allele can be inferred, and the families are used in the analysis.
Comparison with other methods
Here n_i• denotes the number of heterozygote parents who transmit allele i, and n_•i denotes the number of heterozygote parents who have allele i but do not transmit it. An individual TDT is calculated for each allele, and the maximal value is taken as the TDTmax. Although the individual TDT statistic follows a Chi-square distribution with 1 degree of freedom, the TDTmax does not. Clearly, this method will not have an appropriate type I error rate, because the highest Chi-square value is selected. Several methods have been proposed to address the multiple testing problem in TDTmax, including empirical p value simulation and modified Bonferroni correction. Since the former method requires an enormous number of repetitions to accurately obtain a low p value, the Bonferroni corrected TDTmax is used and evaluated in this study.
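A minimal sketch of this procedure is given below. It assumes the standard per-allele TDT form (n_i• - n_•i)^2 / (n_i• + n_•i) and uses a plain Bonferroni correction (multiplying the smallest p value by the number of alleles) in place of the modified correction cited above. The function names and the input layout are hypothetical.

```python
import math


def allele_tdt(transmitted, untransmitted):
    """Per-allele TDT, assumed to be (n_i. - n_.i)^2 / (n_i. + n_.i)."""
    total = transmitted + untransmitted
    if total == 0:
        return 0.0
    diff = transmitted - untransmitted
    return diff * diff / total


def chi2_1df_pvalue(x):
    """Upper-tail probability of a Chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def tdt_max_bonferroni(counts):
    """counts maps allele -> (n transmitted, n not transmitted).

    Returns (allele with the largest TDT, TDTmax, Bonferroni corrected p).
    Plain Bonferroni is an assumption made for this sketch.
    """
    stats = {a: allele_tdt(t, u) for a, (t, u) in counts.items()}
    best = max(stats, key=stats.get)
    p = min(1.0, len(stats) * chi2_1df_pvalue(stats[best]))
    return best, stats[best], p
```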
For TDTL/S, b is the number of parents that transmit the long allele but not the short one, and c is the number of parents that transmit the short allele but not the long one. It should be noted that heterozygote parents are not counted in the computation if both of their alleles belong to the long allele pool or both belong to the short allele pool. The specific problem of this approach is the choice of the threshold between "long" and "short" alleles; here we choose the first allele (from shortest to longest) whose cumulative allele frequency is greater than 0.5, so that roughly half of the alleles are long alleles and the other half are short ones. We note, however, that in some cases there may be relevant biological data that suggest a more appropriate threshold.
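The threshold rule and the counting of b and c can be sketched as below. Two points are assumptions rather than statements from the text: the statistic is taken to be the McNemar-type form (b - c)^2 / (b + c), and the threshold allele itself is treated as the first "long" allele. Allele counts are used instead of frequencies to avoid floating-point ties at exactly 0.5.

```python
def long_allele_threshold(allele_counts):
    """allele_counts: list of (allele length, count) pairs.

    Returns the first length, from shortest to longest, whose cumulative
    frequency exceeds 0.5.
    """
    total = sum(count for _, count in allele_counts)
    running = 0
    for length, count in sorted(allele_counts):
        running += count
        if 2 * running > total:
            return length
    raise ValueError("no alleles supplied")


def tdt_long_short(parent_transmissions, threshold):
    """parent_transmissions: (transmitted, non-transmitted) lengths per parent.

    Alleles with length >= threshold are treated as long (an assumption).
    Parents whose two alleles fall in the same pool are skipped.
    """
    b = c = 0
    for transmitted, untransmitted in parent_transmissions:
        t_long = transmitted >= threshold
        u_long = untransmitted >= threshold
        if t_long and not u_long:
            b += 1
        elif u_long and not t_long:
            c += 1
    if b + c == 0:
        raise ValueError("no informative parents")
    return (b - c) ** 2 / (b + c)
```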
The cut-off thresholds used to reject the null hypothesis H0 in these two methods are the same as for the trend-TDT.
Type I error computations
In order to assess and compare the type I error rates of each of the three tests, we simulated 200 trios (case and both parents) with disease-unrelated microsatellite genotypes. The total number of alleles of this marker is set to 10, with equal allele frequencies. Simulations are performed 1,000,000 times. The proportion of times that the calculated p-value is equal to or less than an expected value is plotted against this expected value, on a negative logarithmic scale. For a correct test statistic, this curve should be exactly the line "y = x". For a test with a higher type I error rate, the curve will be below the line "y = x", and for a conservative test, the curve lies above.
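The calibration curve described here could be computed from a list of null p-values roughly as follows. This is an illustrative sketch; the function name and the choice of base-10 logarithms are assumptions.

```python
import math


def calibration_points(null_p_values, expected_levels):
    """Return (-log10(expected), -log10(observed)) pairs.

    observed is the proportion of null simulations with p <= expected.
    Levels with no observed rejection are skipped, since log10(0) is undefined.
    """
    n = len(null_p_values)
    points = []
    for alpha in expected_levels:
        observed = sum(p <= alpha for p in null_p_values) / n
        if observed > 0:
            points.append((-math.log10(alpha), -math.log10(observed)))
    return points
```

With this orientation, an inflated test rejects more often than expected, so its observed proportion is larger and its point falls below the line y = x.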
Modeling genotyping errors
The most common genotyping errors in microsatellites were simulated to evaluate their effects on the type I error rate of the trend-TDT test. These errors include confusing homozygote and adjacent-allele-heterozygote genotypes in allele banding pattern scoring, false homozygotes due to the preferential amplification of shorter alleles over longer alleles (short allele dominance), false homozygotes due to priming site mutations (null allele), offspring gaining one more repeat unit in one of the alleles (microsatellite mutation), and randomly mis-scoring an allele as its adjacent allele due to binning error. In simulation, each of these genotyping error rates was moderately higher than what is usually found in real data. The microsatellite was simulated with 10 equally frequent alleles, without association with disease. Type I error rates were then calculated as the proportion of times the trend-TDT yielded significant results (p ≤ 0.05) in 1,000,000 simulations on 200 trios.
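Two of the error models listed above, short allele dominance and binning error, could be simulated along the following lines. This is only a sketch under stated assumptions: alleles are coded 1..n_alleles by increasing length, error rates are applied per genotype or per allele as noted in the comments, and the function names are hypothetical. The error rates actually used in the study are not reproduced here.

```python
import random


def apply_short_allele_dominance(genotype, rate, rng):
    """With probability `rate`, score a heterozygote as homozygous
    for its shorter allele (per-genotype rate, an assumption)."""
    short, long_ = sorted(genotype)
    if short != long_ and rng.random() < rate:
        return (short, short)
    return (short, long_)


def apply_binning_error(genotype, rate, n_alleles, rng):
    """With probability `rate` per allele, mis-score the allele as an
    adjacent one; at either end of the range the shift goes inward."""
    scored = []
    for allele in genotype:
        if rng.random() < rate:
            step = rng.choice((-1, 1))
            if not 1 <= allele + step <= n_alleles:
                step = -step
            allele += step
        scored.append(allele)
    return tuple(scored)
```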
Power can be estimated by generating samples with a determined pattern of marker-disease association and calculating the proportion of simulations in which the null hypothesis is correctly rejected. In this paper, we assume a significance level of 0.001. Following this design, we evaluate the power of the trend-TDT and compare it with the power of two other TDT tests: TDTmax and TDTL/S.
Allele frequencies and allelic relative risks in power simulation.
Equal allele frequencies.
Randomized allele frequencies.
There exist two major alleles.
RRs increase linearly along with allele length.
RRs increase above a threshold of allele length.
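The two relative risk patterns listed above can be illustrated with a short sketch. The specific values are hypothetical, not those used in the study. The transmission probability assumes a multiplicative allelic risk model, under which a heterozygote parent with alleles a and b transmits a to an affected child with probability RR_a / (RR_a + RR_b). Alleles are indexed 1..n by increasing length, and alleles at or above the threshold are assumed to carry the higher risk.

```python
def linear_relative_risks(n_alleles, max_rr):
    """RRs rising linearly from 1.0 (shortest) to max_rr (longest).
    Requires n_alleles >= 2."""
    step = (max_rr - 1.0) / (n_alleles - 1)
    return [1.0 + i * step for i in range(n_alleles)]


def threshold_relative_risks(n_alleles, threshold_allele, high_rr):
    """RR of high_rr for alleles at or above threshold_allele, else 1.0."""
    return [high_rr if i + 1 >= threshold_allele else 1.0
            for i in range(n_alleles)]


def transmission_probability(rr, allele_a, allele_b):
    """P(parent a/b transmits a | affected child), multiplicative model
    (an assumption of this sketch)."""
    ra, rb = rr[allele_a - 1], rr[allele_b - 1]
    return ra / (ra + rb)
```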
Modeling non-functional markers
Situations in which VNTR markers are associated with a disease, but without a linear correlation between allele length and disease risk, are also modeled. In this model, the VNTR marker has 10 alleles with equal allele frequencies. Relative risks are first assigned proportional to allele length; before each repeat of the simulation, this relative risk vector is permuted. Empirical power is calculated to compare the performance of the statistics before and after permutation, based on 10,000 repeats of simulations on 200 trios.
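The permutation step and the empirical power calculation can be sketched as follows. Function names are hypothetical; the permutation keeps the same set of relative risks but breaks their ordering by allele length.

```python
import random


def permuted_relative_risks(rr, rng):
    """Return a shuffled copy of the relative risk vector (input unchanged)."""
    shuffled = list(rr)
    rng.shuffle(shuffled)
    return shuffled


def empirical_power(p_values, alpha):
    """Proportion of simulated data sets in which the null is rejected."""
    return sum(p <= alpha for p in p_values) / len(p_values)
```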
A computer program for the trend-TDT, TDTmax, and TDTL/S tests has been written and can be downloaded [19].
Type I error
Table. Simulated genotyping errors and resultant type I error rates. Columns: error model (§); mistypes in total genotypes (%); misinheritance in mistyped trios (%); type I error rate at p ≤ 0.05 (95% C.I.).
The behavior of the tests under different marker-disease association models is presented in Figure 4. These models are defined so that relative risks increase with VNTR length either linearly ("RR(lin)") or uniformly above a threshold ("RR(thr3)", "RR(thr4)", "RR(thr5)"). The assumed marker is a microsatellite with six equally frequent alleles. In the threshold models, the thresholds for higher relative risk are set to allele 3 ("RR(thr3)"), allele 4 ("RR(thr4)"), or allele 5 ("RR(thr5)"). As shown in Figure 4, the trend-TDT is the most powerful method under the linear model, while under threshold models the relative performance depends on where the threshold is. When the threshold is close to the shortest or longest allele, the trend-TDT performs much better than TDTL/S. When the threshold is exactly in the middle, which is most favorable to TDTL/S, the TDTL/S is better. However, in this case both the trend-TDT and TDTL/S have high power and the difference is very small (Figure 4). If the threshold can be inferred from biological knowledge of the gene under study, then using the known threshold will lead to much higher power for TDTL/S than for the trend-TDT (Figure 4). Under most circumstances, TDTmax performs the worst among the tested methods (Figures 2, 3 and 4), the only exception being the RR(thr3) model in Figure 4, where TDTmax is better than TDTL/S.
Performance of the tests
As expected, when the relative risks increase proportionally with allele length, the trend-TDT is always more powerful than the other tests, irrespective of the number of alleles or their frequencies. When the RRs increase according to a threshold model, the performances of TDTL/S and trend-TDT depend on the threshold. TDTL/S is more sensitive to the threshold and less powerful when the threshold is close to the longest or shortest allele. When the threshold is close to medium allele length, TDTL/S performs slightly better than the trend-TDT, but both are quite powerful in this situation. The TDTmax performs the worst in most situations studied here. This may be because both trend-TDT and TDTL/S use the information on the correlation between allele length and disease risk that is present in the generated disease model.
Choice of the tests
Based on these results, we do not recommend the TDTmax for any situation in which there could be a relationship between allele length and disease risk. Whether to use trend-TDT or TDTL/S depends on prior knowledge of the functional relationship between allele length and gene function. When the threshold model is biologically true, and the threshold can be inferred from biological knowledge of the gene under study, TDTL/S is a better choice. In all other situations, trend-TDT is recommended. When the threshold model is true but it is not clear where the threshold is, trend-TDT should be used: with TDTL/S, one either has a multiple testing problem from trying different thresholds, or loses power by using only the median allele length, which could be wrong biologically. Even when the true threshold is close to the median allele length, the difference between trend-TDT and TDTL/S is so small that it can be ignored. In situations where a VNTR is associated with a disease without trend, trend-TDT and TDTL/S are not as powerful, and other TDT methods should be used.
Another potential transmission/disequilibrium based test that could take into account the phenotypic response trend toward longer or shorter alleles is conditional logistic regression [20, 21], using a continuous variable for the allele length rather than a categorical one. Preliminary simulations indicate that this test is not as powerful as the trend-TDT test (data not shown); nevertheless, conditional logistic regression could be more beneficial, since it can incorporate various genetic risk models, include other genetic or environmental risk factors, and provide estimates of the risk of the disease conferred by the functional micro/minisatellite. Therefore, both methods might be used depending on the particular study circumstances.
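As an illustration of the continuous-covariate idea, the sketch below fits a single log-odds coefficient per unit of allele length. It is a simplified allelic formulation assumed for this example, not necessarily the one in the cited methods: each heterozygote parent contributes a matched pair (transmitted versus non-transmitted allele), with conditional likelihood exp(beta*x_T) / (exp(beta*x_T) + exp(beta*x_NT)), which depends only on the length difference d = x_T - x_NT. The fit uses Newton-Raphson and does not handle complete separation (for example, all differences positive).

```python
import math


def fit_length_effect(differences, iterations=50, tol=1e-10):
    """Estimate beta, the log odds of transmission per unit of allele length.

    differences: transmitted minus non-transmitted length, one per
    heterozygote parent. Maximises sum(log(sigmoid(beta * d))).
    """
    beta = 0.0
    for _ in range(iterations):
        grad = 0.0
        hess = 0.0
        for d in differences:
            p = 1.0 / (1.0 + math.exp(-beta * d))
            grad += d * (1.0 - p)
            hess -= d * d * p * (1.0 - p)
        if hess == 0.0:
            break
        step = grad / hess
        beta -= step  # Newton-Raphson update
        if abs(step) < tol:
            break
    return beta
```

Unlike the trend-TDT statistic, the fitted coefficient gives an effect estimate, and exp(beta) can be read as the estimated odds ratio of transmission per unit increase in length under this assumed model.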
Impact of genotyping errors
Given that genotyping errors may lead to increased type I error rates of TDT tests, several modified TDT statistics were proposed for analysis of single nucleotide polymorphisms [22–26], since it is much easier to model genotyping errors in bi-allelic markers than in multi-allelic markers. It was expected that genotyping errors would also increase the type I error rate of the trend-TDT test. However, simulation has shown that, with reasonable typing error frequencies, the type I error rates were inflated only slightly. The reason might be that genotyping errors in multi-allelic markers can be efficiently detected by Mendelian-inheritance analysis when parental data are available. It should be noted that the extent of type I error is a function of the typing error frequencies, the number of alleles, the allele frequencies, and sample size [23, 28]. Thus, if genotyping errors are observed in a subset of a larger sample of pedigrees (e.g., over 500 affected offspring), statistical methods to address genotyping errors in TDT analysis should be considered to confirm that significant results are not false positives due to undetected genotyping errors. To further eliminate genotyping errors in real data analysis, it is recommended that siblings of the patients are genotyped and/or closely adjacent markers are genotyped, so that more typing errors can be detected as either Mendelian inconsistencies in the former or haplotype double crossovers in the latter.
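The Mendelian-inheritance check mentioned above amounts to verifying, for each trio, that the child carries one allele from each parent. A minimal sketch, with a hypothetical function name and genotypes given as pairs of allele labels:

```python
def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can be formed from one paternal
    and one maternal allele."""
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)
```

With many alleles at a locus, a mis-scored allele is unlikely to coincide with an allele present in the appropriate parent, which is consistent with the explanation given above for why such errors are readily detected.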
In summary, we have developed a new statistical test, the trend-TDT test, appropriate for those situations when a) parental data are available; and b) there are multiple alleles at the marker locus hypothesized to be associated with the disease of interest; and, most importantly, c) there is a biological basis to suspect a relationship between allele length and disease risk.
We thank Dr Bruno Fallisard for his scientific input in the design of the test.
1. Ashley CT, Warren ST: Trinucleotide repeat expansion and human disease. Annu Rev Genet. 1995, 29: 703-728. 10.1146/annurev.ge.29.120195.003415.
2. Kashi Y, King D, Soller M: Simple sequence repeats as a source of quantitative genetic variation. Trends Genet. 1997, 13 (2): 74-78. 10.1016/S0168-9525(97)01008-1.
3. Kennedy GC, German MS, Rutter WJ: The minisatellite in the diabetes susceptibility locus IDDM2 regulates insulin transcription. Nat Genet. 1995, 9 (3): 293-298. 10.1038/ng0395-293.
4. Comings DE: Polygenic inheritance and micro/minisatellites. Mol Psychiatry. 1998, 3 (1): 21-31. 10.1038/sj.mp.4000289.
5. Gatchel JR, Zoghbi HY: Diseases of unstable repeat expansion: mechanisms and common principles. Nat Rev Genet. 2005, 6 (10): 743-755. 10.1038/nrg1691.
6. Greene E, Handa V, Kumari D, Usdin K: Transcription defects induced by repeat expansion: fragile X syndrome, FRAXE mental retardation, progressive myoclonus epilepsy type 1, and Friedreich ataxia. Cytogenet Genome Res. 2003, 100 (1-4): 65-76. 10.1159/000072839.
7. Spielman RS, McGinnis RE, Ewens WJ: Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993, 52 (3): 506-516.
8. Betensky RA, Rabinowitz D: Simple approximations for the maximal transmission/disequilibrium test with a multi-allelic marker. Ann Hum Genet. 2000, 64 (Pt 6): 567-574. 10.1046/j.1469-1809.2000.6460567.x.
9. Morris AP, Curnow RN, Whittaker JC: Randomization tests of disease-marker associations. Ann Hum Genet. 1997, 61 (Pt 1): 49-60.
10. Cleves MA, Olson JM, Jacobs KB: Exact transmission-disequilibrium tests with multiallelic markers. Genet Epidemiol. 1997, 14 (4): 337-347. 10.1002/(SICI)1098-2272(1997)14:4<337::AID-GEPI1>3.0.CO;2-0.
11. Lazzeroni LC, Lange K: A conditional inference framework for extending the transmission/disequilibrium test. Hum Hered. 1998, 48 (2): 67-81. 10.1159/000022784.
12. Bickeboller H, Clerget-Darpoux F: Statistical properties of the allelic and genotypic transmission/disequilibrium test for multiallelic markers. Genet Epidemiol. 1995, 12 (6): 865-870. 10.1002/gepi.1370120656.
13. Spielman RS, Ewens WJ: The TDT and other family-based tests for linkage disequilibrium and association. Am J Hum Genet. 1996, 59 (5): 983-989.
14. Harley JB, Moser KL, Neas BR: Logistic transmission modeling of simulated data. Genet Epidemiol. 1995, 12 (6): 607-612. 10.1002/gepi.1370120614.
15. Sham PC, Curtis D: An extended transmission/disequilibrium test (TDT) for multi-allele marker loci. Ann Hum Genet. 1995, 59 (Pt 3): 323-336. 10.1111/j.1469-1809.1995.tb00751.x.
16. Curtis D, Sham PC: A note on the application of the transmission disequilibrium test when a parent is missing. Am J Hum Genet. 1995, 56 (3): 811-812.
17. Hoffman JI, Amos W: Microsatellite genotyping errors: detection approaches, common sources and consequences for paternal exclusion. Mol Ecol. 2005, 14 (2): 599-612. 10.1111/j.1365-294X.2004.02419.x.
18. Ewen KR, Bahlo M, Treloar SA, Levinson DF, Mowry B, Barlow JW, Foote SJ: Identification and analysis of error types in high-throughput genotyping. Am J Hum Genet. 2000, 67 (3): 727-736. 10.1086/303048.
19. Feng BJ: trendTDT version 1.0. [http://geocities.com/trntdt/]
20. Schaid DJ: General score tests for associations of genetic markers with disease using cases and their parents. Genet Epidemiol. 1996, 13 (5): 423-449. 10.1002/(SICI)1098-2272(1996)13:5<423::AID-GEPI1>3.0.CO;2-3.
21. Cordell HJ, Clayton DG: A unified stepwise regression procedure for evaluating the relative effects of polymorphisms within a gene using case/control or family data: application to HLA in type 1 diabetes. Am J Hum Genet. 2002, 70 (1): 124-141. 10.1086/338007.
22. Morris RW, Kaplan NL: Testing for association with a case-parents design in the presence of genotyping errors. Genet Epidemiol. 2004, 26 (2): 142-154. 10.1002/gepi.10297.
23. Gordon D, Heath SC, Liu X, Ott J: A transmission/disequilibrium test that allows for genotyping errors in the analysis of single-nucleotide polymorphism data. Am J Hum Genet. 2001, 69 (2): 371-380. 10.1086/321981.
24. Gordon D, Haynes C, Johnnidis C, Patel SB, Bowcock AM, Ott J: A transmission disequilibrium test for general pedigrees that is robust to the presence of random genotyping errors and any number of untyped parents. Eur J Hum Genet. 2004, 12 (9): 752-761. 10.1038/sj.ejhg.5201219.
25. Cheng KF, Chen JH: A simple and robust TDT-type test against genotyping error with error rates varying across families. Hum Hered. 2007, 64 (2): 114-122. 10.1159/000101963.
26. Bernardinelli L, Berzuini C, Seaman S, Holmans P: Bayesian trio models for association in the presence of genotyping errors. Genet Epidemiol. 2004, 26 (1): 70-80. 10.1002/gepi.10291.
27. Douglas JA, Skol AD, Boehnke M: Probability of detection of genotyping errors and mutations as inheritance inconsistencies in nuclear-family data. Am J Hum Genet. 2002, 70 (2): 487-495. 10.1086/338919.
28. Mitchell AA, Cutler DJ, Chakravarti A: Undetected genotyping errors cause apparent overtransmission of common alleles in the transmission/disequilibrium test. Am J Hum Genet. 2003, 72 (3): 598-610. 10.1086/368203.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.