
Table 2 Key concepts addressed by authors of accepted papers in the Population-Based Association group

From: Above and beyond state-of-the-art approaches to investigate sequence data: summary of methods and results from the population-based association group at the Genetic Analysis Workshop 19

Theme

[Contribution reference] concept

New methods for new data types

[1] Alternative allele count: Number of reads that support a given alternative allele based on individual sequence data

[1] Negative binomial regression: Type of regression model used to investigate response variables that are counts. In contrast to Poisson regression, negative binomial regression allows for overdispersion—a variance larger than the mean
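The contrast with Poisson regression can be written as a mean-variance relationship. The following is a sketch that assumes the common NB2 parameterization; the symbols \mu (conditional mean) and \alpha (dispersion parameter) are introduced here for illustration and do not appear in the source table.

```latex
\operatorname{Var}_{\text{Poisson}}(Y \mid x) = \mu,
\qquad
\operatorname{Var}_{\text{NB2}}(Y \mid x) = \mu + \alpha\,\mu^{2}, \quad \alpha \ge 0 .
```

Under this parameterization the negative binomial variance reduces to the Poisson variance as \alpha approaches 0, and exceeds the mean whenever \alpha > 0.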

[1] Hurdle and zero-inflated models: Two statistical models used to investigate count response variables with a large proportion of zeros. Hurdle models assume that a Bernoulli process determines whether counts are zero or positive. If the response is positive, its conditional distribution is governed by a truncated-at-zero count data model. Zero-inflated models assume the response variable is a mixture of a point mass at zero, selected by a Bernoulli process, and a count distribution, e.g., negative binomial
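The difference between the two models is easiest to see in their probability mass functions. The sketch below uses a Poisson count component for brevity (an assumption; the table mentions the negative binomial as an example), and the function and parameter names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability P(Y = k) with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def hurdle_poisson_pmf(k, p_zero, lam):
    """Hurdle model: a Bernoulli step decides zero vs. positive.

    Zeros arise only from the Bernoulli step; positive counts follow a
    Poisson distribution truncated at zero.
    """
    if k == 0:
        return p_zero
    truncated = poisson_pmf(k, lam) / (1.0 - poisson_pmf(0, lam))
    return (1.0 - p_zero) * truncated


def zero_inflated_poisson_pmf(k, pi_extra, lam):
    """Zero-inflated model: mixture of a point mass at zero and a Poisson.

    Zeros arise from two sources: the point mass (probability pi_extra)
    and the count distribution itself.
    """
    base = (1.0 - pi_extra) * poisson_pmf(k, lam)
    return pi_extra + base if k == 0 else base
```

In the hurdle model the probability of a zero equals `p_zero` exactly, whereas in the zero-inflated model it is larger than `pi_extra` because the count component can also produce zeros.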

[1] Downsampling: Selecting a subset of the reads in a high-coverage position to improve computational efficiency
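A minimal sketch of downsampling follows. It assumes reads are drawn uniformly at random without replacement up to a depth cap, which the table does not specify; the function name, `max_depth`, and `seed` are hypothetical.

```python
import random


def downsample_reads(reads, max_depth, seed=0):
    """Return at most max_depth reads from one position.

    Positions at or below the cap are returned unchanged; deeper positions
    are subsampled uniformly at random without replacement.
    """
    if len(reads) <= max_depth:
        return list(reads)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(list(reads), max_depth)
```

Because the draw is uniform, the expected proportion of reads supporting each allele is unchanged, although the observed alternative allele count becomes noisier at lower depth.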

Handling rare variants

[2] Variant ascertainment bias: Variant selection criteria, such as minor allele frequency, can influence kinship and population structure estimates

[2] Kinship estimation: The estimation of relationships among samples based upon genotypes rather than known pedigrees is sensitive to the selected variants and the applied statistical methods

[2] Population structure: Admixture events leave a signature in the patterns of genetic variation within a population. This signature can bias genome-wide association studies, but it can also be used as a tool to identify genetic variants influencing a trait

[3] Firth’s penalized likelihood: A logistic regression likelihood penalized by Jeffreys’ invariant prior. A first-order bias term is introduced into the score function to reduce the bias in the log odds ratio estimate that arises as a result of sparse data
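The penalty and the resulting score modification can be written compactly. In this sketch, \ell(\beta) denotes the logistic log-likelihood, U(\beta_r) the r-th component of its score, and I(\beta) the Fisher information matrix; this notation is introduced here and is not taken from the source table.

```latex
\ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2}\,\log\bigl|I(\beta)\bigr|,
\qquad
U^{*}(\beta_r) = U(\beta_r)
  + \tfrac{1}{2}\,\operatorname{tr}\!\left[ I(\beta)^{-1}\,
    \frac{\partial I(\beta)}{\partial \beta_r} \right].
```

The second term of U^{*} is the derivative of the log of Jeffreys’ prior, which is proportional to |I(\beta)|^{1/2}.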

[3] Small-sample-adjusted score test: A logistic regression score test in which the null distribution of the test statistic is adjusted using estimates of small sample variance and kurtosis

[3,9] Sequence kernel association test: Variant-collapsing test for a subset of variants constructed by aggregating individual variant score test statistics

[4] Quantitative trait mapping: The search for positions along the genome associated with quantitative traits

[4] Tree-based methods: Methods that account for uneven evolutionary relatedness among genetic variants

[4] Phylogenetic tree: A bifurcating tree used to represent the evolutionary relationships among variants (illustrated in Fig. 2)

[5] Within-chain permutation: Permutation of individual phenotypes is a widely used strategy to investigate the null distribution. Under the frequentist approach, statistics based on actual data are compared with the distribution of statistics from permuted data sets. In Bayesian analyses, computing time can be reduced by permuting phenotypes within the Markov chains used to infer posterior distributions
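The frequentist permutation scheme described above can be sketched as follows. This example covers only the standard phenotype-permutation test, not the within-chain Bayesian variant. The test statistic (absolute difference in mean phenotype between carriers and non-carriers) and all names are assumptions chosen for illustration.

```python
import random


def permutation_p_value(genotypes, phenotypes, n_perm=1000, seed=0):
    """Empirical p-value from permuting phenotypes across individuals.

    Assumes at least one carrier (genotype > 0) and one non-carrier.
    """

    def stat(g, y):
        carriers = [yi for gi, yi in zip(g, y) if gi > 0]
        others = [yi for gi, yi in zip(g, y) if gi == 0]
        return abs(sum(carriers) / len(carriers) - sum(others) / len(others))

    rng = random.Random(seed)
    observed = stat(genotypes, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype link
        if stat(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

The add-one correction counts the observed data set as one of the permutations, so the smallest attainable p-value is 1 / (n_perm + 1).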

[6] Minor allele count (MAC): The total count of minor alleles for all individuals evaluated at a particular position. For rare variants, the MAC reflects data sparsity better than the minor allele frequency
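A short sketch of how the MAC relates to the minor allele frequency (MAF) is given below. It assumes diploid genotypes coded as alternative-allele dosages 0, 1, or 2, with `None` marking a missing call; this coding and the function name are hypothetical.

```python
def minor_allele_count(dosages):
    """Return (MAC, MAF) for one variant from per-individual dosages."""
    called = [d for d in dosages if d is not None]
    alt = sum(called)              # copies of the alternative allele
    total = 2 * len(called)        # two alleles per called individual
    mac = min(alt, total - alt)    # the minor allele may be ref or alt
    maf = mac / total if total else float("nan")
    return mac, maf
```

For example, two heterozygotes among 1000 individuals give a MAC of 2 and a MAF of 0.001. The same MAF in 100,000 individuals would correspond to a MAC of 200, which is why the MAC is the more direct indicator of how few observations carry the allele.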

Rare variant behavior

[7] Gene–environment interaction term model: Statistical approach that tests for gene–environment interactions by including a gene–environment interaction term to measure the change in the outcome when both the genetic marker and environmental factor are present, as compared to when one or both factors are not present
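The interaction term model can be written as a single regression equation. This sketch assumes a generalized linear model with link function g (for example, the logit for a binary outcome); the symbols G, E, and the \beta coefficients are notation introduced here.

```latex
g\bigl(\mathbb{E}[\,Y \mid G, E\,]\bigr)
  = \beta_0 + \beta_G\,G + \beta_E\,E + \beta_{GE}\,(G \times E),
\qquad
H_0:\ \beta_{GE} = 0 .
```

Here \beta_{GE} captures the departure from additivity on the scale of the link function, so the test for gene–environment interaction is a test of H_0.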

[7] Environment-stratified models: Alternative approach to identifying gene–environment interactions, by comparing genetic effect sizes between strata defined by an environmental exposure
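One simple way to compare effect sizes between two strata is a z-test on the difference of the stratum-specific estimates. The sketch below assumes the two strata are independent and the estimates approximately normal; the table does not state which comparison the authors used, and all names are hypothetical.

```python
import math


def stratum_difference_z(beta_exposed, se_exposed, beta_unexposed, se_unexposed):
    """Two-sided z-test for a difference in genetic effect between strata.

    Inputs are the effect estimates and standard errors obtained by fitting
    the genetic model separately in exposed and unexposed individuals.
    """
    se_diff = math.sqrt(se_exposed**2 + se_unexposed**2)
    z = (beta_exposed - beta_unexposed) / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return z, p
```

A small p-value suggests that the genetic effect differs by exposure status, which is the stratified counterpart of a nonzero interaction term.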

Follow up of association signals

[8] Bayesian adjustment for confounding: A Bayesian approach for estimating the average causal effect of an exposure on an outcome in observational studies while accounting for the uncertainty in confounder selection. It uses Bayesian model averaging to average inference across many models according to posterior weight determined by a joint model of the exposure and the outcome

[9] Logistic Bayesian LASSO (least absolute shrinkage and selection operator): Method based on a retrospective likelihood that models the probability of haplotypes given disease status. The odds of disease are expressed as a logistic regression model, whose coefficients are regularized through Bayesian LASSO