# EFBAT: exact family-based association tests

Kady Schneiter^{1}, James H Degnan^{2}, Christopher Corcoran^{1}, Xin Xu^{3} and Nan Laird^{4}

**8**:86

https://doi.org/10.1186/1471-2156-8-86

© Schneiter et al; licensee BioMed Central Ltd. 2007

**Received:** 14 September 2007

**Accepted:** 20 December 2007

**Published:** 20 December 2007

## Abstract

### Background

Family-based association tests are important tools for investigating genetic risk factors of complex diseases. These tests are especially valuable because they are robust to population structure. We introduce a tool, EFBAT, which performs exact family-based tests of association for X-chromosome and autosomal biallelic markers.

### Results

The program EFBAT extends a network algorithm previously applied to autosomal markers to include the X-chromosome and to perform tests of association under the null hypotheses "no association, no linkage" and "no association in the presence of linkage" under additive, dominant and recessive genetic models. These tests are valid regardless of patterns of missing familial data.

### Conclusion

The general framework for performing exact family-based association tests has been usefully extended to the X-chromosome, particularly for the hypothesis of "no association in the presence of linkage" and for different genetic models.

## Background

Family-based association tests (FBATs) are widely used in studies of the genetic risk factors of complex human diseases. These tests avoid identifying spurious associations that may result from population structure. The transmission/disequilibrium test (TDT) [1] compares transmission rates of alleles from heterozygous parents to their affected offspring. Since the TDT was introduced, many FBATs have been created to accommodate a variety of sampling schemes and family structures, as well as additional information such as covariates [2, 3]. Rabinowitz and Laird [4] proposed a conditioning approach to FBATs that handles many of these contingencies and is implemented in the software package FBAT [5]. The procedure uses the asymptotic distribution of the statistic to derive a p-value for testing either the hypothesis that there is "no linkage and no association" or the hypothesis that there is "linkage but no association" between the marker and the disease allele. This test is valid for arbitrary patterns of missing data, for the additive, dominant and recessive models of inheritance, and for X-linked or autosomal markers.

Schneiter *et al.* [6] describe a family-based testing approach that, like the Rabinowitz-Laird procedure, is valid for arbitrary patterns of missing data and for additive, dominant, or recessive inheritance, but that obtains the p-value from the exact distribution of the test statistic rather than from its asymptotic distribution. Exact testing ensures that p-values are valid regardless of the size of the dataset or the distribution of the test statistic.

We describe the software package EFBAT, which incorporates the tests for autosomal markers and also extends this exact testing procedure to markers located on the X chromosome. EFBAT implements exact FBATs for biallelic markers located on either the X chromosome or the autosomes under the additive, dominant, or recessive models of inheritance, and remains valid for arbitrary patterns of missing data. An exact test of "no linkage and no association" for X-chromosome markers has been implemented by Knapp [3] for the additive model only.

## Implementation

## Results and Discussion

Exact p-values are the ideal in hypothesis testing since they are obtained from the true distribution of the test statistic without relying on large sample approximations. This is especially important when data are sparse or datasets are small since in such cases assumptions underlying asymptotic methods may not be valid. A criticism of exact procedures is that they are computationally intensive and can therefore be very time consuming. Schneiter et al. [6] describe a modified network algorithm for implementing a family-based association test. Network algorithms are computational tools that implicitly identify an exact distribution and thereby greatly reduce the amount of computation needed to perform an exact test. Other network algorithms are described in [7] and [8].

The procedures in EFBAT are valid regardless of missing data patterns. The software can handle families with multiple siblings as well as 0, 1, or 2 missing parents. Complex pedigrees can be processed as well; however, these are parsed into nuclear families which are then treated independently.

*AB* daughters are assigned the *AA* genotype with probability 1/2, and vice versa; and *A* sons are assigned the *B* genotype with probability 1/2, and vice versa:

Conditional Distributions of Offspring Genotypes for X-linked markers. Table 1 displays the conditional distributions of the offspring genotypes when testing for linkage for markers on the X-chromosome. A dot, ·, indicates a missing parent, and "c. p." denotes "conditional probability".

| Parents' genotypes | Children's genotypes | Conditional distribution |
|---|---|---|
| ( | Any | Observed data have c. p. 1. |
| ( | { | Observed data have c. p. 1 |
| | { | Daughters have c. p. 1; sons assigned |
| ( | { | Randomly assign |
| (·, | { | Observed data have c. p. 1. |
| (·, | { | Observed data have c. p. 1. |
| (·, | { | Randomly assign |
| (·, ·) | { | Observed data have c. p. 1. |
| (·, ·) | { | Observed data have c. p. 1. |
| (·, ·) | { | Randomly assign |
| (·, ·) | { | Daughters have c. p. 1; randomly assign |

1. Genotypes switch if both parents are known, (*AB*, *A*), or the father is known and the mother can be inferred as *AB*.

2. If the mother is known to be *AB*, sons switch; daughters also switch if there are two genotypically distinct daughters.

3. If neither parent is known, daughters switch if there are two genotypically distinct daughters; sons switch if there are two genotypically distinct sons.
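The switching rule for a family with known parents (*AB*, *A*) can be sketched in a few lines of Python. This is only an illustration: the function names and data layout are hypothetical and are not taken from EFBAT, and the sketch assumes that all offspring in the sibship switch together, so that the observed and the switched configurations each receive conditional probability 1/2. The rules EFBAT actually applies in each case are those given in Table 1 and the notes above.

```python
from fractions import Fraction

# Swap rules quoted in the text for X-linked markers, parents (AB, A).
DAUGHTER_SWAP = {"AB": "AA", "AA": "AB"}
SON_SWAP = {"A": "B", "B": "A"}


def switch(daughters, sons):
    """Return the sibship with every offspring genotype switched."""
    return ([DAUGHTER_SWAP[g] for g in daughters],
            [SON_SWAP[g] for g in sons])


def conditional_distribution(daughters, sons):
    """List (configuration, conditional probability) pairs.

    Assumption (illustrative only): the whole sibship switches together,
    so there are two configurations, each with probability 1/2.
    """
    observed = (list(daughters), list(sons))
    half = Fraction(1, 2)
    return [(observed, half), (switch(daughters, sons), half)]


# Hypothetical sibship: two daughters (AB, AA) and one son (A).
dist = conditional_distribution(["AB", "AA"], ["A"])
for configuration, probability in dist:
    print(configuration, probability)
```

Exact fractions are used so that the probabilities can later be combined across families without rounding error.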

The statistic used to implement the exact test for both X-linked and autosomal markers is derived from the conditional distribution of offspring genotypes. It is given by *S* = ∑*XT*, where *X* is a function of an individual's genotype and *T* is a function of the individual's trait. The product of *X* and *T* is summed over all offspring in all families. For the exact test, we take *T* to be 1 for affected offspring and 0 otherwise; allowing *T* to be continuous is straightforward in theory but computationally difficult.

By default, EFBAT assumes additive inheritance, i.e., for each child, *X* is the count of the allele of interest carried by that individual. Analyses can also be performed assuming dominant or recessive models; for X-linked markers, sons are then treated as in the additive case and daughters are coded as for autosomal markers. EFBAT assumes sons are coded as homozygous for each marker, although only the first allele is used.

Assuming additive inheritance, *S* is a count of the allele of interest among all affected offspring in all families. Under dominant inheritance, *S* is a count of all genotypes that include at least one copy of the allele of interest among affected children. Assuming recessive inheritance, *S* is a count of the genotypes homozygous for the allele of interest among all affected children. Parental data are pertinent solely to the identification of the distributions of offspring genotypes and do not contribute to the value of the test statistic.
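As an illustration of the three codings for an X-linked marker, the following Python sketch computes *S* for a small, made-up set of offspring. The function names, the dictionary layout, and the data are hypothetical and are not EFBAT's input format; the sketch only follows the verbal description above (sons entered as homozygous with only the first allele used, sons treated as in the additive case, and *T* equal to 1 for affected offspring and 0 otherwise).

```python
def code_genotype(genotype, allele, model="additive", is_son=False):
    """Return X for one offspring at an X-linked biallelic marker.

    genotype is a pair of alleles, e.g. ("A", "B"). Sons are entered as
    homozygous, e.g. ("A", "A"), and only the first allele is used.
    """
    if is_son:
        # Sons carry one X chromosome and are treated as in the additive case.
        return 1 if genotype[0] == allele else 0
    copies = sum(1 for a in genotype if a == allele)
    if model == "additive":
        return copies
    if model == "dominant":
        return 1 if copies >= 1 else 0
    if model == "recessive":
        return 1 if copies == 2 else 0
    raise ValueError(f"unknown model: {model}")


def fbat_statistic(offspring, allele, model="additive"):
    """S = sum of X * T over offspring, with T = 1 if affected, else 0."""
    total = 0
    for child in offspring:
        t = 1 if child["affected"] else 0
        x = code_genotype(child["genotype"], allele, model, child["is_son"])
        total += x * t
    return total


# Made-up offspring pooled over families; parents do not enter S.
offspring = [
    {"genotype": ("A", "B"), "is_son": False, "affected": True},
    {"genotype": ("A", "A"), "is_son": False, "affected": True},
    {"genotype": ("A", "A"), "is_son": True, "affected": True},
    {"genotype": ("B", "B"), "is_son": True, "affected": False},
]
s_additive = fbat_statistic(offspring, "A", "additive")
s_dominant = fbat_statistic(offspring, "A", "dominant")
s_recessive = fbat_statistic(offspring, "A", "recessive")
print(s_additive, s_dominant, s_recessive)  # 4 3 2
```

The unaffected son contributes nothing under any model because his *T* is 0.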

The exact distribution of *S* is obtained by identifying the probability of each possible value of *S*. A p-value is calculated by summing the probabilities of the values of *S* that are more extreme than the observed value. Explicitly identifying all possible values of *S* is very time consuming for all but very small datasets, because the number of possible values increases multiplicatively across families. The modified network algorithm identifies these values implicitly, resulting in rapid production of exact p-values. For a dataset of 300 families, EFBAT computes the exact p-value in less than one second.
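The saving from not enumerating every combination of family outcomes can be illustrated with a simple convolution over families. The sketch below is not the modified network algorithm of [6]; it is a plain dynamic-programming stand-in that merges equal partial sums as families are added, which conveys the same basic idea. The per-family distributions are made up, the function names are hypothetical, and the p-value is taken, as an assumption, to be the upper tail that includes the observed value.

```python
from collections import defaultdict
from fractions import Fraction


def convolve(dist_a, dist_b):
    """Distribution of the sum of two independent contributions to S."""
    out = defaultdict(Fraction)
    for value_a, prob_a in dist_a.items():
        for value_b, prob_b in dist_b.items():
            out[value_a + value_b] += prob_a * prob_b
    return dict(out)


def exact_distribution(family_dists):
    """Combine per-family distributions one family at a time.

    Equal partial sums are merged after each family, so the work grows
    with the number of distinct values of S rather than with the number
    of combinations of family outcomes.
    """
    total = {0: Fraction(1)}
    for dist in family_dists:
        total = convolve(total, dist)
    return total


def upper_tail_p_value(dist, observed):
    """P(S >= observed); the tail convention is an assumption here."""
    return sum(prob for value, prob in dist.items() if value >= observed)


half = Fraction(1, 2)
family_dists = [
    {0: half, 1: half},   # family whose contribution is 0 or 1
    {1: half, 2: half},   # family whose contribution is 1 or 2
    {2: Fraction(1)},     # observed data have conditional probability 1
]
dist = exact_distribution(family_dists)
p_value = upper_tail_p_value(dist, observed=5)
print(sorted(dist.items()), p_value)
```

A family whose observed data have conditional probability 1 shifts the distribution by a constant and adds no variability, which is why it appears above as a single-point distribution.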

## Conclusion

The EFBAT software implements exact FBATs of the hypotheses "no linkage and no association" and "linkage but no association" for biallelic markers on either the autosomes or the X chromosome. These procedures are valid under the additive, dominant, and recessive models of inheritance and for data consisting of families with or without available parental genotypes.

## Availability and requirements

• Project name: EFBAT

• Project homepage: http://www.math.usu.edu/~schneit/efbat

• Programming language: C++

• License: Freely available

• Any restrictions on use by non-academics: No

## Declarations

### Acknowledgements

We thank M. Degiorgio for computer advice and two anonymous reviewers for comments. KS, JHD, XX, and NML were supported by NIH grant MH59532.

## Authors’ Affiliations

## References

1. Spielman R, McGinnis RE, Ewens WJ: Transmission Test for Linkage Disequilibrium: The Insulin Gene Region and Insulin-dependent Diabetes Mellitus. Am J Hum Genet. 1993, 52: 506-516.
2. Boehnke M, Langefeld CD: Genetic association mapping based on discordant sib pairs: The discordant alleles test. Am J Hum Genet. 1998, 62: 950-961. 10.1086/301787.
3. Knapp M: The transmission/disequilibrium test and parental genotype reconstruction: the reconstruction-combined transmission/disequilibrium test. Am J Hum Genet. 1999, 64: 861-870. 10.1086/302285.
4. Rabinowitz D, Laird NM: A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary patterns of missing marker information. Hum Hered. 2000, 50: 211-223. 10.1159/000022918.
5. Laird N, Horvath S, Xu X: Implementing a unified approach to family based tests of association. Genet Epidemiol. 2000, 19: 36-42. 10.1002/1098-2272(2000)19:1+<::AID-GEPI6>3.0.CO;2-M.
6. Schneiter K, Laird NM, Corcoran C: Exact family-based association tests for biallelic data. Genet Epidemiol. 2005, 29: 185-194. 10.1002/gepi.20088.
7. Mehta C, Patel N: A network algorithm for performing Fisher's Exact Test in *r* × *c* contingency tables. J Am Stat Assoc. 1983, 78: 427-434. 10.2307/2288652.
8. Corcoran C, Mehta C, Senchaudhuri P: Power comparisons for tests of trend in dose-response studies. Statistics in Medicine. 2000, 19: 3037-3050. 10.1002/1097-0258(20001130)19:22<3037::AID-SIM601>3.0.CO;2-7.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.