Trend-TDT – a transmission/disequilibrium based association test on functional mini/microsatellites
BMC Genetics volume 8, Article number: 75 (2007)
Minisatellites and microsatellites are associated with human disease, not only as markers of risk but also involved directly in disease pathogenesis. They may play significant roles in replication, repair and mutation of DNA, regulation of gene transcription and protein structure alteration. Phenotypes can thus be affected by mini/microsatellites in a manner proportional to the length of the allele. Here we propose a new method to assess the linear trend toward transmission of shorter or longer alleles from heterozygote parents to affected child.
This test (trend-TDT) performs better than other TDT (Transmission/Disequilibrium Test) type tests, such as TDTmax and TDTL/S, under most marker-disease association models.
The trend-TDT test is a more powerful association test when there is a biological basis to suspect a relationship between allele length and disease risk.
Variable number tandem repeats (VNTR's) are repetitive DNA sequences widely dispersed in the human genome. They are highly unstable and thus display a remarkable degree of polymorphism. They vary in length from a few to several thousand nucleotides and vary in complexity from simple di-, tri- and tetra-nucleotide repeats (microsatellites) to more complex repetitive elements (minisatellites). VNTR's, mainly microsatellites, have assumed an increasingly important role as markers in the genome and are intensively exploited for gene mapping. But VNTR's could be associated with human disease, not only as markers but also directly involved in disease pathogenesis; indeed, several functions have been suggested for micro- and mini-satellite DNA sequences.
If located within a coding sequence, VNTR's may alter protein structure. For example, expansions of tri-nucleotide microsatellites are responsible for genetic diseases such as X-linked spinal and bulbar muscular atrophy, Huntington disease, type 1 spinocerebellar ataxia, dentatorubral-pallidoluysian atrophy, and Machado-Joseph disease. These diseases are caused by expansion of CAG triplets within protein-coding regions.
VNTR's may also regulate gene transcription. Numerous in vitro studies have shown that gene transcription may be increased or decreased in proportion to the number of repeated sequences (i.e. the length of the alleles), as illustrated in Table 1 (for a detailed review, see Kashi et al.). A direct effect of transcriptional modulation on disease risk has been observed. As an example, the minisatellite ILPR (Insulin-Linked Polymorphic Region, (ACAGGGGTGTGGGG)n) located 5' of the Insulin gene is implicated in Insulin-Dependent Diabetes Mellitus. To date, many transcription factors have been identified and their binding to minisatellite repeat sequences has been demonstrated. There is increasing evidence that some gene-disease associations are due to functional micro/minisatellites, with the magnitude of susceptibility being related to allele length [4–6].
The Transmission/Disequilibrium Test (TDT) is a popular method to assess the involvement of a candidate gene or a genome region in the genetic component of a disease, using cases and their parents. The TDT, as originally developed, tested the association between a bi-allelic marker and a disease. Many authors have proposed extensions of the TDT to multi-allelic markers: by testing each allele separately [8, 9], by testing symmetry of the transmitted/non-transmitted table [10, 11], by testing marginal homogeneity [12, 13], or by conditional logistic regression [14, 15]. However, all these extensions implicitly treat the multi-allelic marker as a polymorphism without function; that is, the risk of disease is not treated as being correlated with allele repeat length. While this holds in most situations, in some situations the multi-allelic marker under study may have a functional effect on the disease, so that this correlation may be present. Such a correlation is additional information that can be taken into account in the test. From a statistical point of view, increased allele length can be understood as an increased dose of exposure to a risk factor. In case-control association studies one can use the classical trend chi-square (the Cochran-Armitage trend test) to test this hypothesis, but the available extensions of the TDT to multi-allelic markers do not test such a "dose effect" in family-based association studies. Case-control studies, however, can be subject to bias produced by hidden population stratification. Therefore, a new statistical method is needed that can test the correlation of allele length with disease susceptibility and is not sensitive to population stratification. In this paper, we describe a newly developed method to meet this requirement.
Consider a multi-allelic marker with k alleles, which are assumed to be coded as integers proportional to their length. The trend-TDT statistic is based on the length of alleles transmitted from heterozygous parents to their affected children. For each heterozygous parent i, let $t_i$ denote the length of the transmitted allele, $u_i$ the length of the untransmitted allele, and $x_i = t_i - u_i$ the difference between the lengths of the transmitted and untransmitted alleles. For family f, let $n_f$ be the number of $x_i$ values calculated within the family, and define $d_f$ as
$$d_f = \frac{\sum_{i=1}^{n_f} x_i}{\sqrt{n_f}}$$
If the micro/minisatellite neither causes the disease nor is in linkage disequilibrium with any disease-causing gene, then the mean of $d_f$ should be zero, and its variance is
$$\mathrm{Var}(d_f) = \frac{1}{n_f} \sum_{i=1}^{n_f} \mathrm{Var}(x_i) = \mathrm{Var}(x_i)$$
Note that $d_f$ is the mean of the $x_i$ weighted by the square root of $n_f$, so that the variance of $d_f$ is equal between families. Hence the test statistic
$$T = \frac{\mathrm{mean}(d_f)}{S / \sqrt{N}}$$
asymptotically follows the Student's t distribution with N-1 degrees of freedom. Here S is the estimated standard deviation of the $d_f$, and N is the number of informative families. If there is a trend toward transmission of shorter alleles, mean($d_f$) will be less than 0, and vice versa. If biological clues indicate that preferential transmission of shorter alleles (or longer alleles) should be observed, the test is a one-tailed t test (H1: T < 0 or H1: T > 0); otherwise the test is two-tailed (H1: T ≠ 0).
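The computation of the statistic can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the function name `trend_tdt` and the input layout (one list of transmitted/untransmitted length pairs per family) are assumptions made for the example, and it takes d_f to be the sum of the x_i divided by the square root of n_f, as the weighting described above implies. It returns T and the degrees of freedom; the p-value would then be read from a Student's t distribution.

```python
import math
import statistics


def trend_tdt(families):
    """Return (T, degrees of freedom) for a list of families.

    Each family is a list of (transmitted_length, untransmitted_length)
    pairs, one pair per genotyped parent (hypothetical input layout).
    """
    d = []
    for pairs in families:
        # Only heterozygous parents contribute an x_i = t_i - u_i.
        x = [t - u for t, u in pairs if t != u]
        if x:
            # d_f: sum of x_i scaled by 1/sqrt(n_f), so Var(d_f) is the
            # same for families with one or two informative parents.
            d.append(sum(x) / math.sqrt(len(x)))
    n = len(d)
    if n < 2:
        raise ValueError("need at least two informative families")
    s = statistics.stdev(d)  # sample standard deviation S of the d_f
    if s == 0:
        raise ValueError("all d_f are identical; T is undefined")
    t_stat = statistics.mean(d) / (s / math.sqrt(n))
    return t_stat, n - 1
```

A positive T points toward preferential transmission of longer alleles and a negative T toward shorter alleles, matching the sign convention in the text.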
The missing genotype problem is treated according to Curtis. When both parents are missing, or when one parent is missing and the affected child has the same heterozygous genotype as the other parent, the family is considered uninformative and is discarded from the analysis. When only one parent is missing but the affected child is homozygous, inclusion of such triads would lead to bias, so they are also discarded. In other situations, the transmission status of either allele can be inferred, and the family is used in the analysis.
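The rules for a triad with one missing parent can be written as a small decision function. The sketch below is illustrative only: the name `transmission_from_single_parent` and the tuple representation of genotypes are assumptions, and a triad in which the child shares no allele with the available parent is treated here as a Mendelian inconsistency and dropped, which the text does not discuss. Families with both parents missing are discarded before this step.

```python
def transmission_from_single_parent(parent, child):
    """Infer (transmitted, untransmitted) allele lengths when only one
    parent is genotyped; return None when the triad must be discarded.

    parent, child: tuples of two allele lengths (hypothetical layout).
    """
    a, b = parent
    if a == b:
        return None  # homozygous parent carries no transmission information
    if child[0] == child[1]:
        return None  # homozygous child: discarded to avoid bias
    if sorted(child) == sorted(parent):
        return None  # same heterozygous genotype: transmission is ambiguous
    shared = [allele for allele in (a, b) if allele in child]
    if len(shared) != 1:
        return None  # no shared allele: Mendelian inconsistency (assumption)
    transmitted = shared[0]
    untransmitted = b if transmitted == a else a
    return transmitted, untransmitted
```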
Comparison with other methods
Two other methods that can be used in testing association between disease and functional micro/minisatellites are TDTmax and TDTL/S. TDTmax stems from the classical bi-allelic TDT. The statistic corresponding to TDTmax is the maximum chi-square value obtained over all alleles:
$$\mathrm{TDT}_{\max} = \max_i \frac{(n_{i\bullet} - n_{\bullet i})^2}{n_{i\bullet} + n_{\bullet i}}$$
Here $n_{i\bullet}$ denotes the number of heterozygous parents who transmit allele i, and $n_{\bullet i}$ denotes the number of heterozygous parents who carry allele i but do not transmit it. An individual TDT is calculated for each allele, and the maximal value is taken as the TDTmax. Although each individual TDT follows a chi-square distribution with 1 degree of freedom, the TDTmax does not. Without correction, this method will not have an appropriate type I error rate, because the highest chi-square value is selected. Several methods have been proposed to address the multiple testing problem in TDTmax, including empirical p value simulation and modified Bonferroni correction. Since the former method requires an enormous number of repetitions to obtain a low p value accurately, the Bonferroni-corrected TDTmax is used and evaluated in this study.
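As an illustration, TDTmax with a multiple-testing correction might be computed as follows. The function name `tdt_max` and its input layout are assumptions for this sketch. It applies a plain Bonferroni correction over the number of alleles observed in heterozygous parents, which is simpler than the modified correction cited above, and it obtains the 1-degree-of-freedom chi-square tail probability from the complementary error function.

```python
import math
from collections import Counter


def tdt_max(transmissions):
    """Return (TDTmax, Bonferroni-corrected p-value).

    transmissions: iterable of (transmitted, untransmitted) alleles,
    one pair per parent (hypothetical input layout).
    """
    transmitted = Counter()
    untransmitted = Counter()
    for t, u in transmissions:
        if t != u:  # homozygous parents are uninformative
            transmitted[t] += 1
            untransmitted[u] += 1
    alleles = set(transmitted) | set(untransmitted)
    if not alleles:
        raise ValueError("no heterozygous parents")
    best = 0.0
    for allele in alleles:
        n_t, n_u = transmitted[allele], untransmitted[allele]
        best = max(best, (n_t - n_u) ** 2 / (n_t + n_u))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_single = math.erfc(math.sqrt(best / 2))
    # Plain Bonferroni over the alleles tested (an assumption of this sketch).
    p_corrected = min(1.0, p_single * len(alleles))
    return best, p_corrected
```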
TDTL/S corresponds to the classical bi-allelic TDT computed on collapsed long alleles vs. collapsed short alleles. In this case, the traditional TDT statistic can be used:
$$\mathrm{TDT}_{L/S} = \frac{(b - c)^2}{b + c}$$
where b is the number of parents that transmit the long allele but not the short one, and c is the number of parents that transmit the short allele but not the long one. It should be noted that heterozygous parents are not counted in the computation if both of their alleles belong to the long allele pool or both belong to the short allele pool. The specific problem of this approach is the choice of the threshold between "long" and "short" alleles; here we choose the first allele (from shortest to longest) whose cumulative allele frequency is greater than 0.5, so that roughly half of the alleles are long and the other half short. We note, however, that in some cases there may be relevant biological data that suggest a more appropriate threshold.
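The threshold rule and the collapsed statistic can be sketched as below. The function names are hypothetical, and one detail is an assumption: alleles at or above the threshold allele are counted as long, which is the reading that splits ten equally frequent alleles into five short and five long. Allele counts rather than frequencies are used so that the comparison with 0.5 is exact.

```python
def long_short_threshold(allele_counts):
    """Return the first allele (shortest to longest) whose cumulative
    frequency exceeds 0.5. allele_counts maps allele length -> count."""
    total = sum(allele_counts.values())
    cumulative = 0
    for allele in sorted(allele_counts):
        cumulative += allele_counts[allele]
        if 2 * cumulative > total:  # cumulative frequency > 0.5
            return allele
    raise ValueError("allele_counts is empty")


def tdt_long_short(transmissions, threshold):
    """Classical TDT chi-square on alleles collapsed into long and short.

    Alleles >= threshold are treated as long (assumption of this sketch).
    """
    b = c = 0
    for t, u in transmissions:
        t_long, u_long = t >= threshold, u >= threshold
        if t_long and not u_long:
            b += 1  # long transmitted, short not transmitted
        elif u_long and not t_long:
            c += 1  # short transmitted, long not transmitted
        # parents with both alleles in the same pool are not counted
    if b + c == 0:
        raise ValueError("no informative parents after collapsing")
    return (b - c) ** 2 / (b + c)
```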
The cut-off thresholds used to reject the null hypothesis (H0) in these two methods are the same as for trend-TDT.
Type I error computations
In order to assess and compare the type I error rates of the three tests, we simulated 200 trios (case and both parents) with disease-unrelated microsatellite genotypes. The total number of alleles of this marker is set to 10, with equal allele frequencies. Simulations are performed 1,000,000 times. The proportion of times that the calculated p-value is equal to or less than an expected value is plotted against this expected value, on a minus logarithm scale. For a correct test statistic, this curve should be exactly the line "y = x". For a test with an inflated type I error rate, the curve will be below the line "y = x", and for a conservative test, the curve lies above it.
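The points of such a calibration curve can be derived from a list of simulated null p-values as in the following sketch. The function name `calibration_points` and the choice of base-10 logarithms are assumptions made for the example; the text only specifies a minus logarithm scale.

```python
import math


def calibration_points(p_values, expected_levels):
    """Return (x, y) points with x = -log10(expected level) and
    y = -log10(observed proportion of p-values <= that level)."""
    n = len(p_values)
    points = []
    for alpha in expected_levels:
        observed = sum(p <= alpha for p in p_values) / n
        if observed > 0:  # the logarithm is undefined for zero
            points.append((-math.log10(alpha), -math.log10(observed)))
    return points
```

For a well-calibrated test the points fall on y = x. An inflated type I error rate makes the observed proportion larger than expected and therefore pushes y below x.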
Modeling genotyping errors
The most common genotyping errors in microsatellites were simulated to evaluate their effects on the type I error rate of the trend-TDT test. These errors include confusing homozygous and adjacent-allele heterozygous genotypes in allele banding pattern scoring, false homozygotes due to the preferential amplification of shorter alleles over longer alleles (short allele dominance), false homozygotes due to priming site mutations (null allele), offspring gaining one more repeat unit in one of the alleles (microsatellite mutation), and randomly mis-scoring an allele as its adjacent allele due to binning error. In the simulation, each of these genotyping error rates was moderately higher than what is usually found in real data. The microsatellite was simulated with 10 equally frequent alleles, without association with disease. Type I error rates were then calculated as the proportion of times the trend-TDT yielded significant results (p ≤ 0.05) in 1,000,000 simulations on 200 trios.
Power can be estimated by generating samples with a determined pattern of marker-disease association, and by calculating the proportion of these simulations in which the null hypothesis is correctly rejected. In this paper, we assume a significance level of 0.001. Following this design, we evaluate the power of the trend-TDT and compare it with the power of two other TDT tests: TDTmax and TDTL/S.
The powers of the three tests were evaluated under different patterns of marker-disease association, parameterized in terms of relative risk, and under different kinds of multi-allelic markers in terms of the number of alleles and allele frequencies. The different models are presented in Table 2. In these models, the maximum relative risk for any single allele size is always equal to 3, and the prevalence of the disease is fixed at 10%. Calculation of genotype-wide penetrance is based on a multiplicative model. All estimates of power were based on 10,000 generated tests on 200 trios, unless otherwise specified.
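Two of these error types could be injected into simulated genotypes as sketched below. The function names, the per-genotype and per-allele error rates, and the clamping of mis-binned alleles to the observed allele range are assumptions for illustration; the error rates actually used in the study are not given in this text.

```python
import random


def short_allele_dominance(genotype, rate, rng):
    """With probability `rate`, score a heterozygote as a false
    homozygote for its shorter allele."""
    a, b = sorted(genotype)
    if a != b and rng.random() < rate:
        return (a, a)
    return (a, b)


def binning_error(genotype, rate, rng, min_allele, max_allele):
    """With probability `rate` per allele, mis-score the allele as an
    adjacent one. Results are clamped to [min_allele, max_allele], so a
    boundary allele may stay unchanged (simplification of this sketch)."""
    scored = []
    for allele in genotype:
        if rng.random() < rate:
            allele += rng.choice((-1, 1))
            allele = min(max_allele, max(min_allele, allele))
        scored.append(allele)
    return tuple(sorted(scored))
```

Passing a seeded `random.Random` instance as `rng` keeps a simulation run reproducible.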
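Under a multiplicative model, genotype penetrances follow from the allelic relative risks and the prevalence. The sketch below assumes Hardy-Weinberg proportions in the population, and its helper `linear_relative_risks` assumes that the linear models run from a relative risk of 1 for the shortest allele to the stated maximum of 3 for the longest; Table 2 is not reproduced here, so that spacing is a guess. Function names are hypothetical.

```python
def linear_relative_risks(k, max_rr=3.0):
    """Relative risks rising linearly from 1 (shortest allele) to max_rr
    (longest allele); the starting value of 1 is an assumption."""
    return [1 + (max_rr - 1) * i / (k - 1) for i in range(k)]


def genotype_penetrances(freqs, rel_risks, prevalence):
    """Penetrance matrix f[i][j] = baseline * r_i * r_j, with the baseline
    chosen so that sum_ij p_i * p_j * f[i][j] equals the prevalence
    (Hardy-Weinberg proportions assumed)."""
    mean_rr = sum(p * r for p, r in zip(freqs, rel_risks))
    baseline = prevalence / mean_rr ** 2
    k = len(freqs)
    pen = [[baseline * rel_risks[i] * rel_risks[j] for j in range(k)]
           for i in range(k)]
    if max(max(row) for row in pen) > 1:
        raise ValueError("penetrance above 1; lower prevalence or risks")
    return pen
```

With six equally frequent alleles, relative risks from 1 to 3 and a prevalence of 10%, the baseline penetrance is 0.025 and the highest-risk homozygote has penetrance 0.225.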
Modeling non-functional markers
Situations in which VNTR markers are associated with a disease, without a linear correlation between allele length and disease risk, are also modeled. In this model, the VNTR marker has 10 alleles with equal allele frequencies. Relative risks are first assigned in proportion to allele length; then, before each repeat of the simulation, this relative risk vector is permuted. Empirical power is calculated to compare the performance of the statistics before and after permutation, based on 10,000 repeats of simulations on 200 trios.
A computer program for the trend-TDT, TDTmax, and TDTL/S tests has been written and can be downloaded.
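The permutation step, together with a simple measure of how much linear trend survives it, can be sketched as follows. The least-squares slope of relative risk on allele rank is used here as a stand-in for "trend"; that choice, like the function names, is an assumption of this example.

```python
import random


def permuted_relative_risks(rel_risks, rng):
    """Return a randomly permuted copy of the relative risk vector."""
    shuffled = list(rel_risks)
    rng.shuffle(shuffled)
    return shuffled


def trend_slope(rel_risks):
    """Least-squares slope of relative risk on allele rank 1..k."""
    k = len(rel_risks)
    ranks = range(1, k + 1)
    mean_x = (k + 1) / 2
    mean_y = sum(rel_risks) / k
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ranks, rel_risks))
    sxx = sum((x - mean_x) ** 2 for x in ranks)
    return sxy / sxx
```

A permuted vector keeps the same set of relative risks, but its slope is usually closer to zero than that of the ordered vector.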
Type I error
As shown in Figure 1, the curves for both trend-TDT and TDTL/S are very close to the diagonal line, showing correct type I error rates in simulation. After Bonferroni correction, the type I error rate of TDTmax is nearly correct, although it is still slightly conservative. As shown in Table 3, genotyping errors lead to slightly inflated type I error rates for trend-TDT.
The power of the three tests, trend-TDT, TDTL/S and TDTmax on simulated trios are plotted in Figures 2, 3, 4. Figure 2 presents the power of the tests under different VNTR/STR models, which vary in terms of the number of alleles at the VNTR (4, 6 or 10 alleles with equal allele frequencies). In each of these models, the relative risk associated with each allele increases linearly with the length of the allele. The trend-TDT is clearly the most powerful test in all situations. An increase in the number of alleles resulted in decreased power for all tests; however, the trend-TDT was the least sensitive to this effect. Figure 3 presents the behavior of the tests under different sets of allele frequencies, assuming a linear relative risk model of the simulated functional VNTR. It can be seen from the figure that the power is higher when the allele frequencies are equally distributed, and is lower when some major alleles exist. This is probably related to the fact that overall heterozygosity (and thus informativeness of the sample) is maximized with equal allele frequencies. Nevertheless, the simulations indicate that the trend-TDT is the least sensitive to the distribution of allele frequencies and is the most powerful for association detection among the three methods.
The behavior of the tests under different marker-disease association models is presented in Figure 4. These models are defined so that relative risks increase linearly ("RR(lin)") or uniformly above a threshold ("RR(thr3)", "RR(thr4)", "RR(thr5)") with increasing VNTR length. The assumed marker is a microsatellite with six equally frequent alleles. In the threshold models, the thresholds for higher relative risk are set to allele 3 ("RR(thr3)"), allele 4 ("RR(thr4)"), or allele 5 ("RR(thr5)"). As shown in Figure 4, the trend-TDT is the most powerful method under the linear model, while under threshold models, the relative performance depends on where the threshold is. When the threshold is close to the shortest or longest allele, the trend-TDT performs much better than TDTL/S. When the threshold is exactly in the middle, which is most favorable to TDTL/S, the TDTL/S is better. However, in this case both the trend-TDT and TDTL/S have high power and the difference is very small (Figure 4). If the threshold can be inferred from biological knowledge of the gene under study, then using the known threshold will lead to much higher power for TDTL/S than for the trend-TDT (Figure 4). Under most circumstances, TDTmax performed the worst among the tested methods (Figures 2, 3, 4), with the only exception being the RR(thr3) model in Figure 4, where TDTmax is better than TDTL/S.
When markers are associated with the studied trait, but without a specific trend, the power of TDTmax remains unchanged, while the power of both the trend-TDT and TDTL/S decreases markedly (Figure 5). Notably, the trend-TDT and TDTL/S still have some power for association detection. In-depth study of each replicate of the simulation found that the power depends on the trend of the increase/decrease of the relative risk vector: in the most extreme cases, where the trend is almost zero, the power of these two tests is equal to the type I error rate; however, because in most cases the trend is not zero, the power of trend-TDT and TDTL/S remains above the type I error level.
Performance of the tests
As expected, when the relative risks increase proportionally with allele length, the trend-TDT is always more powerful than the other tests, irrespective of the number of alleles or their frequencies. When the RRs increase according to a threshold model, the performances of TDTL/S and trend-TDT depend on the threshold. TDTL/S is more sensitive to the threshold and less powerful when the threshold is close to the longest or shortest allele. When the threshold is close to medium allele length, TDTL/S performs slightly better than the trend-TDT, but both are quite powerful in this situation. The TDTmax performs the worst in most situations studied here. This may be because both trend-TDT and TDTL/S use the information on the correlation between allele length and disease risk that is present in the generated disease model.
Choice of the tests
Based on these results, we do not recommend the TDTmax for any situation when there could be a relationship between allele length and disease risk. Whether to use trend-TDT or TDTL/S depends on prior knowledge of the functional relationship between allele length and gene function. When the threshold model is biologically true, and this threshold can be inferred by biologic knowledge of the gene under study, then TDTL/S is a better choice. Under all other situations, trend-TDT is recommended. When the threshold model is true but it is not clear where the threshold is, trend-TDT should be used, since by using TDTL/S, one either has a multiple testing problem by trying different thresholds, or alternatively has less power for the test by using the median allele length only, which could be wrong biologically. Even when the true threshold is close to the median allele length, the difference between trend-TDT and TDTL/S is so small that it could be ignored. In other situations when a VNTR is associated with a disease without trend, trend-TDT and TDTL/S are not as powerful, therefore other TDT methods should be used.
Another potential transmission/disequilibrium based test that could take into account the phenotypic response trend toward longer or shorter alleles is conditional logistic regression [20, 21], using a continuous variable for the allele length rather than a categorical one. Preliminary simulations indicate that this test is not as powerful as the trend-TDT test (data not shown); nevertheless, conditional logistic regression could be more beneficial, since it can incorporate various genetic risk models, include other genetic or environmental risk factors, and provide estimates of the risk of the disease conferred by the functional micro/minisatellite. Therefore, both methods might be used depending on the particular study circumstances.
Impact of genotyping errors
Given that genotyping errors may lead to increased type I error rates of TDT tests, several modified TDT statistics were proposed for analysis of single nucleotide polymorphisms [22–26], since it is much easier to model genotyping errors in bi-allelic markers than in multi-allelic markers. It was expected that genotyping errors would also increase the type I error rate of the trend-TDT test. However, simulation has shown that, with reasonable typing error frequencies, the type I error rates were inflated only slightly. The reason might be that genotyping errors in multi-allelic markers can be efficiently detected by Mendelian-inheritance analysis when parental data are available. It should be noted that the extent of type I error is a function of the typing error frequencies, the number of alleles, the allele frequencies, and sample size [23, 28]. Thus, if genotyping errors are observed in a subset of a larger sample of pedigrees (e.g., over 500 affected offspring), statistical methods to address genotyping errors in TDT analysis should be considered to confirm that significant results are not false positives due to undetected genotyping errors. To further eliminate genotyping errors in real data analysis, it is recommended that siblings of the patients are genotyped and/or closely adjacent markers are genotyped, so that more typing errors can be detected as either Mendelian inconsistencies in the former or haplotype double crossovers in the latter.
In summary, we have developed a new statistical test, the trend-TDT test, appropriate for those situations when a) parental data are available; and b) there are multiple alleles at the marker locus hypothesized to be associated with the disease of interest; and, most importantly, c) there is a biological basis to suspect a relationship between allele length and disease risk.
Ashley CT, Warren ST: Trinucleotide repeat expansion and human disease. Annu Rev Genet. 1995, 29: 703-728. 10.1146/annurev.ge.29.120195.003415.
Kashi Y, King D, Soller M: Simple sequence repeats as a source of quantitative genetic variation. Trends Genet. 1997, 13 (2): 74-78. 10.1016/S0168-9525(97)01008-1.
Kennedy GC, German MS, Rutter WJ: The minisatellite in the diabetes susceptibility locus IDDM2 regulates insulin transcription. Nat Genet. 1995, 9 (3): 293-298. 10.1038/ng0395-293.
Comings DE: Polygenic inheritance and micro/minisatellites. Mol Psychiatry. 1998, 3 (1): 21-31. 10.1038/sj.mp.4000289.
Gatchel JR, Zoghbi HY: Diseases of unstable repeat expansion: mechanisms and common principles. Nat Rev Genet. 2005, 6 (10): 743-755. 10.1038/nrg1691.
Greene E, Handa V, Kumari D, Usdin K: Transcription defects induced by repeat expansion: fragile X syndrome, FRAXE mental retardation, progressive myoclonus epilepsy type 1, and Friedreich ataxia. Cytogenet Genome Res. 2003, 100 (1-4): 65-76. 10.1159/000072839.
Spielman RS, McGinnis RE, Ewens WJ: Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993, 52 (3): 506-516.
Betensky RA, Rabinowitz D: Simple approximations for the maximal transmission/disequilibrium test with a multi-allelic marker. Ann Hum Genet. 2000, 64 (Pt 6): 567-574. 10.1046/j.1469-1809.2000.6460567.x.
Morris AP, Curnow RN, Whittaker JC: Randomization tests of disease-marker associations. Ann Hum Genet. 1997, 61 (Pt 1): 49-60.
Cleves MA, Olson JM, Jacobs KB: Exact transmission-disequilibrium tests with multiallelic markers. Genet Epidemiol. 1997, 14 (4): 337-347. 10.1002/(SICI)1098-2272(1997)14:4<337::AID-GEPI1>3.0.CO;2-0.
Lazzeroni LC, Lange K: A conditional inference framework for extending the transmission/disequilibrium test. Hum Hered. 1998, 48 (2): 67-81. 10.1159/000022784.
Bickeboller H, Clerget-Darpoux F: Statistical properties of the allelic and genotypic transmission/disequilibrium test for multiallelic markers. Genet Epidemiol. 1995, 12 (6): 865-870. 10.1002/gepi.1370120656.
Spielman RS, Ewens WJ: The TDT and other family-based tests for linkage disequilibrium and association. Am J Hum Genet. 1996, 59 (5): 983-989.
Harley JB, Moser KL, Neas BR: Logistic transmission modeling of simulated data. Genet Epidemiol. 1995, 12 (6): 607-612. 10.1002/gepi.1370120614.
Sham PC, Curtis D: An extended transmission/disequilibrium test (TDT) for multi-allele marker loci. Ann Hum Genet. 1995, 59 (Pt 3): 323-336. 10.1111/j.1469-1809.1995.tb00751.x.
Curtis D, Sham PC: A note on the application of the transmission disequilibrium test when a parent is missing. Am J Hum Genet. 1995, 56 (3): 811-812.
Hoffman JI, Amos W: Microsatellite genotyping errors: detection approaches, common sources and consequences for paternal exclusion. Mol Ecol. 2005, 14 (2): 599-612. 10.1111/j.1365-294X.2004.02419.x.
Ewen KR, Bahlo M, Treloar SA, Levinson DF, Mowry B, Barlow JW, Foote SJ: Identification and analysis of error types in high-throughput genotyping. Am J Hum Genet. 2000, 67 (3): 727-736. 10.1086/303048.
Feng BJ: trendTDT version 1.0. [http://geocities.com/trntdt/]
Schaid DJ: General score tests for associations of genetic markers with disease using cases and their parents. Genet Epidemiol. 1996, 13 (5): 423-449. 10.1002/(SICI)1098-2272(1996)13:5<423::AID-GEPI1>3.0.CO;2-3.
Cordell HJ, Clayton DG: A unified stepwise regression procedure for evaluating the relative effects of polymorphisms within a gene using case/control or family data: application to HLA in type 1 diabetes. Am J Hum Genet. 2002, 70 (1): 124-141. 10.1086/338007.
Morris RW, Kaplan NL: Testing for association with a case-parents design in the presence of genotyping errors. Genet Epidemiol. 2004, 26 (2): 142-154. 10.1002/gepi.10297.
Gordon D, Heath SC, Liu X, Ott J: A transmission/disequilibrium test that allows for genotyping errors in the analysis of single-nucleotide polymorphism data. Am J Hum Genet. 2001, 69 (2): 371-380. 10.1086/321981.
Gordon D, Haynes C, Johnnidis C, Patel SB, Bowcock AM, Ott J: A transmission disequilibrium test for general pedigrees that is robust to the presence of random genotyping errors and any number of untyped parents. Eur J Hum Genet. 2004, 12 (9): 752-761. 10.1038/sj.ejhg.5201219.
Cheng KF, Chen JH: A simple and robust TDT-type test against genotyping error with error rates varying across families. Hum Hered. 2007, 64 (2): 114-122. 10.1159/000101963.
Bernardinelli L, Berzuini C, Seaman S, Holmans P: Bayesian trio models for association in the presence of genotyping errors. Genet Epidemiol. 2004, 26 (1): 70-80. 10.1002/gepi.10291.
Douglas JA, Skol AD, Boehnke M: Probability of detection of genotyping errors and mutations as inheritance inconsistencies in nuclear-family data. Am J Hum Genet. 2002, 70 (2): 487-495. 10.1086/338919.
Mitchell AA, Cutler DJ, Chakravarti A: Undetected genotyping errors cause apparent overtransmission of common alleles in the transmission/disequilibrium test. Am J Hum Genet. 2003, 72 (3): 598-610. 10.1086/368203.
We thank Dr Bruno Fallisard for his scientific input in the design of the test.
BJF carried out the programming, testing and simulation of the methods, and drafted the manuscript. DEG contributed to the design of the study and critical review of the manuscript. MC conceived the study, participated in its design and helped to draft the manuscript. All authors read and approved the final manuscript.
Bing-Jian Feng and Marilys Corbex contributed equally to this work.
Feng, B., Goldgar, D.E. & Corbex, M. Trend-TDT – a transmission/disequilibrium based association test on functional mini/microsatellites. BMC Genet 8, 75 (2007) doi:10.1186/1471-2156-8-75
- Conditional Logistic Regression
- Short Allele
- Allele Length
- Long Allele
- Equal Allele Frequency