In many biological applications, parameter inference for models of interest from data is computationally challenging. Ideally, one would like to infer parameters using either maximum likelihood or Bayesian approaches which explicitly calculate the likelihood of the data given the parameters. While such likelihoods can be calculated for data from non-recombining regions [1, 2] and for data where all sites are independent [3, 4], full-likelihood methods are not currently feasible for many models of interest (complex demography with recombination, for example). Therefore, approximations are desirable.

In the last several years, approximate methods based on summary statistics have gained in popularity. These methods come in several flavors:

1. Simulate a grid over the parameter space in order to calculate the likelihood of the observed summaries, given parameters [5, 6]. The maximum-likelihood estimate is the point on the grid that maximizes the likelihood of the observed summary statistics.

2. The maximum-likelihood algorithm can be modified to perform Bayesian inference by simulating parameters from prior distributions, calculating summary statistics, and accepting the parameters if the simulated summaries are "close enough" to the observed values [7, 8]. The method runs until the desired number of acceptances is obtained, and can be extremely time-consuming. I refer to this approach as rejection sampling, and it has been applied in several contexts [9–11].

3. Decide ahead of time how many random draws to take from a prior distribution, then accept the fraction of draws which generate summary statistics closest to the data, according to some distance metric. This is the rejection-sampling approach of [12], and differs from the approach of [7–11] in that a *finite number of simulations is performed from the prior* instead of repeatedly simulating from the prior until a desired number of acceptances is recorded.

4. Take the parameters accepted from Method 3, and regress those acceptances onto the distance between the simulated and observed summary statistics [12].
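The difference between the stopping rules of Methods 2 and 3 can be sketched as follows. This is a minimal illustration, not code from any of the cited packages: the toy model (a normal mean with known variance, summarised by the sample mean), the Uniform(-5, 5) prior, the tolerance `eps`, and all function names are assumptions chosen for the example.

```python
import random

def simulate_summary(mu, n, rng):
    """Toy model (assumed): n draws from Normal(mu, 1), summarised by their mean."""
    return sum(rng.gauss(mu, 1.0) for _ in range(n)) / n

def abc_tolerance(s_obs, n, n_accept, eps, rng):
    """Method 2: simulate from the prior until n_accept draws lie within eps of s_obs."""
    accepted, n_sims = [], 0
    while len(accepted) < n_accept:
        mu = rng.uniform(-5.0, 5.0)  # hypothetical Uniform(-5, 5) prior
        n_sims += 1
        if abs(simulate_summary(mu, n, rng) - s_obs) <= eps:
            accepted.append(mu)
    return accepted, n_sims

def abc_fixed_draws(s_obs, n, n_draws, frac, rng):
    """Method 3: take exactly n_draws from the prior, keep the closest fraction."""
    draws = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)  # same hypothetical prior
        s = simulate_summary(mu, n, rng)
        draws.append((abs(s - s_obs), mu, s))  # (distance, parameter, summary)
    draws.sort(key=lambda d: d[0])
    return draws[:max(1, round(frac * n_draws))]

rng = random.Random(1)
s_obs = 1.3  # made-up observed summary statistic
acc2, n_sims = abc_tolerance(s_obs, n=20, n_accept=200, eps=0.1, rng=rng)
acc3 = abc_fixed_draws(s_obs, n=20, n_draws=5000, frac=0.04, rng=rng)
print("Method 2:", len(acc2), "acceptances from", n_sims, "simulations")
print("Method 3:", len(acc3), "acceptances from 5000 simulations")
```

In this sketch the cost of Method 2 is not known in advance, since it depends on the acceptance rate implied by `eps`, whereas Method 3 fixes the number of simulations and lets the effective tolerance vary.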

The latter three methods are all forms of "Approximate Bayesian Computation" (ABC), a term which generally applies to inference problems using summary statistics instead of explicit calculations of likelihoods. The three Bayesian schemes described above are the simplest forms of ABC, and the approach has been extended to use Markov Chain Monte Carlo techniques to explore the parameter space [13] and sequential Monte Carlo [14]. Further developments include formalizing methods for choosing summary statistics [15] and methods for model selection [16]. In this paper, I will use "regression ABC" to refer to Method 4, the regression approach of [12]. The main appeal of regression ABC is speed, overcoming a major limitation of rejection sampling, which is often too slow to feasibly evaluate the performance of the estimator (because high rejection rates are required to obtain reasonable estimates [8, 11]). In general, the regression ABC method has several appealing features, including simplicity of implementation, speed, and flexibility. Flexibility is key because it allows one to rapidly explore how many, and which, summary statistics to use. This matters because subtle choices can lead to surprising biases in estimation [17].
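The regression step of Method 4 can be illustrated with a small sketch. It is a simplified stand-in rather than the algorithm of [12] or the ABCreg implementation: it uses a single summary statistic and an unweighted ordinary least-squares fit, and it omits any kernel weighting or parameter transformation that a full implementation might apply. The toy model, prior, acceptance fraction, and function names are all assumptions made for the example.

```python
import random

rng = random.Random(2)
N_OBS, S_OBS = 20, 1.3  # assumed sample size and made-up observed summary

def simulate_summary(mu):
    """Toy model (assumed): mean of N_OBS draws from Normal(mu, 1)."""
    return sum(rng.gauss(mu, 1.0) for _ in range(N_OBS)) / N_OBS

# Method 3: a fixed number of prior draws, keeping the closest 10%.
draws = []
for _ in range(5000):
    mu = rng.uniform(-5.0, 5.0)  # hypothetical Uniform(-5, 5) prior
    draws.append((mu, simulate_summary(mu)))
draws.sort(key=lambda d: abs(d[1] - S_OBS))
accepted = draws[:500]

def regression_adjust(pairs, s_obs):
    """Fit theta = a + b * (s - s_obs) by least squares, then shift each
    accepted theta to the value predicted at s = s_obs."""
    thetas = [t for t, _ in pairs]
    x = [s - s_obs for _, s in pairs]
    n = len(pairs)
    x_bar, t_bar = sum(x) / n, sum(thetas) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxt = sum((xi - x_bar) * (ti - t_bar) for xi, ti in zip(x, thetas))
    slope = sxt / sxx
    return [ti - slope * xi for ti, xi in zip(thetas, x)], slope

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

raw = [t for t, _ in accepted]
adjusted, slope = regression_adjust(accepted, S_OBS)
print("slope:", slope)
print("variance before / after adjustment:", variance(raw), variance(adjusted))
```

Because the adjustment removes the part of the spread in the accepted parameters that is explained by the distance from the observed summary, a comparatively loose acceptance fraction can be used, which is one way to read the speed advantage described above.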

Currently, many tools are available for the rapid development and testing of summary-statistic-based approaches to inference, including rapid coalescent simulations for both neutral models [18] and simple models of selection [9, 19, 20], software to calculate summary statistics from simulation output [21], and open-source statistical packages such as R [22]. At present, the only software package available for the regression algorithm of [12] is written in the R language, and is available from http://www.rubic.rdg.ac.uk/~mab/. The purpose of this paper is to describe a software package which automates the linear regression portion of regression ABC analyses in a fast and flexible way, with user-friendly features simplifying automation. The results from the current code have been validated against independent R implementations, and the "ABCreg" package is fully documented for use by non-programmers.