deBInfer: Bayesian inference for dynamical models of biological systems in R
Summary
- Understanding the mechanisms underlying biological systems, and ultimately, predicting their behaviours in a changing environment, requires overcoming the gap between mathematical models and experimental or observational data. Differential equations (DEs) are commonly used to model the temporal evolution of biological systems, but statistical methods for comparing DE models to data and for parameter inference are relatively poorly developed. This is especially problematic in the context of biological systems where observations are often noisy and only a small number of time points may be available.
- The Bayesian approach offers a coherent framework for parameter inference that can account for multiple sources of uncertainty, while making use of prior information. It offers a rigorous methodology for parameter inference, as well as modelling the link between unobservable model states and parameters, and observable quantities.
- We present deBInfer, a package for the statistical computing environment R, implementing a Bayesian framework for parameter inference in DEs. deBInfer provides templates for the DE model, the observation model and data likelihood, and the model parameters and their prior distributions. A Markov chain Monte Carlo (MCMC) procedure processes these inputs to estimate the posterior distributions of the parameters and any derived quantities, including the model trajectories. Further functionality is provided to facilitate MCMC diagnostics, the visualization of the posterior distributions of model parameters and trajectories, and the use of compiled DE models for improved computational performance.
- The templating approach makes deBInfer applicable to a wide range of DE models. We demonstrate its application to ordinary and delay DE models for population ecology.
Introduction
The use of differential equations (DEs) to model dynamical systems has a long and fruitful tradition in biological disciplines such as epidemiology, population ecology and physiology (Volterra 1926; Kermack & McKendrick 1927). As DE models are used in an attempt to understand biological systems, it is becoming clear that the simplest models cannot capture the rich variety of dynamics observed in them (Evans et al. 2013). However, more complex models come at the expense of additional states and/or parameters and require more information for parameterization. Further, as most observational data sets contain uncertainty, model identification and fitting become increasingly difficult (Lonergan 2014). Keeping complex models tractable and testable, and linking modelled quantities to data, thus requires statistical methods of similar sophistication. This is particularly relevant in biology, where data series are often short or noisy, and where the scope for observational or experimental replication may be limited.
A vast array of analytical and numerical methods exists for solving DE models as well as exploring their properties and the effect of parameter values on their dynamics (Jones 2003; Smith 2011). In some cases, parameters may be derived from first principles or measured directly, but often some or all parameters cannot be determined by either approach, and it is necessary to estimate them from an observational data set.
Parameter estimation methods for DE models, and their implementation as computational tools, are still less well developed than the aforementioned system dynamics tools and are a topic of active research.
A common starting point assumes a deterministic model $\mathcal{M}(\theta)$ with parameters $\theta$, giving rise to a true data set $\mathcal{Y}$ such that

$$\mathcal{Y} = \mathcal{M}(\theta) \qquad \text{(eqn 1)}$$

Assuming that the observations $\tilde{\mathcal{Y}}$ arise from a sum of $\mathcal{Y}$ and measurement noise that is independently and normally distributed then leads to the least squares solution, which is found by minimizing the Euclidean norm of the residual,

$$\hat{\theta} = \underset{\theta}{\arg\min} \, \lVert \tilde{\mathcal{Y}} - \mathcal{M}(\theta) \rVert_2 \qquad \text{(eqn 2)}$$

This approach has been applied to both ordinary differential equations (ODEs) (e.g. Baker et al. 2005) and simple delay‐differential equations (DDEs) (e.g. Horbelt, Timmer & Voss 2002). It allows for point estimates of the parameters, as well as the estimation of normal confidence intervals for the parameters and the correlations between them. However, these error bounds are local in nature and thus offer limited insight into the variability that is to be expected in the model outputs.
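As a minimal sketch of the least squares approach, the following base R code fits a logistic growth curve to simulated noisy data by minimizing the sum of squared residuals with `optim()`. The closed-form logistic solution stands in for a numerical DE solver. The data, the seed, the starting values and the helper names `logistic_solution` and `sse` are illustrative assumptions and are not part of deBInfer.

```r
# Closed-form solution of the logistic equation dN/dt = r * N * (1 - N / K)
logistic_solution <- function(t, r, K, N0) {
  K * N0 * exp(r * t) / (K + N0 * (exp(r * t) - 1))
}

# Simulate noisy observations (assumed values, for illustration only)
set.seed(1)
times <- seq(0, 20, by = 1)
y_obs <- logistic_solution(times, r = 0.5, K = 10, N0 = 0.1) +
  rnorm(length(times), mean = 0, sd = 0.2)

# Sum of squared residuals, i.e. the squared Euclidean norm of the residual
sse <- function(par) {
  pred <- logistic_solution(times, r = par[1], K = par[2], N0 = 0.1)
  sum((y_obs - pred)^2)
}

# Minimize over (r, K), starting from a rough initial guess
fit <- optim(par = c(r = 0.3, K = 8), fn = sse)
fit$par
```

The result is a single point estimate of r and K; it carries no information about the uncertainty of the fitted trajectory, which is the limitation noted above.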
Bayesian approaches for parameter estimation in complex, nonlinear models were established early on (e.g. Tarantola & Valette 1982; Poole & Raftery 2000), and they are being applied with increasing frequency to a broad range of biological models (e.g. Coelho, Codeço & Gomes 2011; Voyles et al. 2012; Johnson, Pecquerie & Nisbet 2013; Smith et al. 2015). Recent methodological advances have included the application of Hamiltonian Monte Carlo to ODE models, realized in the software package Stan (Carpenter et al. 2016), particle MCMC methods (Andrieu, Doucet & Holenstein 2010), approximate Bayesian computation (ABC; e.g. Liu & West 2001; Toni et al. 2009) and so‐called plug‐and‐play approaches (e.g. He, Ionides & King 2009). A suite of these methods are implemented in the R package pomp (King et al. 2016). While many statistical approaches, including the one presented here, treat the numerical solution of the DE model as exact, there has also been work towards quantifying the uncertainty contained in the numerical DE solutions themselves (Chkrebtii et al. 2015).
In the Bayesian approach, the model, its parameters and the data are viewed as random variables. This approach to parameter inference is attractive, as it provides a coherent framework that allows the incorporation of uncertainty in the observations and the process, and it relaxes the assumption of normal errors. It provides us not only with full probability distributions describing the parameters, but also with probability distributions for any quantity derived from them, including the model trajectories. Further, the Bayesian framework naturally lets us incorporate prior information about the parameter values. This is particularly useful when there are known biological or theoretical constraints on parameters. For example, many biological parameters, such as body size, cannot take on negative values. Using informative priors can help constrain the parameter space of the estimation procedure, aiding with parameter identifiability.
We explain the rationale behind the Bayesian approach below and describe our implementation of a fitting routine based on a Markov chain Monte Carlo (MCMC) sampler coupled to a numerical DE solver. We illustrate the application of deBInfer to a simple example, the logistic differential equation, and a more complex model of the reproductive life history of the fungal pathogen Batrachochytrium dendrobatidis.
Materials and methods
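We seek to estimate the parameters $\theta$ of a DE model $\mathcal{M}(\theta)$, given an empirical data set $\mathcal{Y}$, and accounting for the uncertainty in the data. The model takes the general form

$$\frac{d\mathbf{x}}{dt} = f(\mathbf{x}_{\boldsymbol{\tau}}, t, \theta) \qquad \text{(eqn 3)}$$

where $\mathbf{x}$ is the vector of state variables; $f$ is a function that takes the time $t$ and the (possibly delayed) states $\mathbf{x}_{\boldsymbol{\tau}}$ as input and generates the vector $d\mathbf{x}/dt$ as output; and $\theta$ denotes a set of parameters. Further, we define $\mathbf{x}_{\boldsymbol{\tau}} = \mathbf{x}(t + \boldsymbol{\tau})$, where $\boldsymbol{\tau}$ is a vector of time lags. When all $\tau \in \boldsymbol{\tau}$ are 0, the model is represented by a system of ODEs; when any $\tau < 0$, the model is represented by a system of delay‐differential equations (DDEs). For the purposes of inference, $\boldsymbol{\tau}$ is simply a subset of the parameters $\theta$ that are to be estimated. deBInfer implements inference for ODEs as well as DDEs with constant delays.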
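The posterior distribution of the parameters follows from Bayes' rule,

$$\Pr(\theta \mid \mathcal{Y}) = \frac{\Pr(\mathcal{Y} \mid \theta)\,\Pr(\theta)}{\Pr(\mathcal{Y})} \qquad \text{(eqn 4)}$$

where $\mathcal{Y}$ denotes the data and $\theta$ denotes the set of model parameters. The product in the numerator is the joint distribution, which is made up of the likelihood $\mathcal{L}(\theta)$ or $\Pr(\mathcal{Y} \mid \theta)$, which gives the probability of observing $\mathcal{Y}$ given the deterministic model $\mathcal{M}(\theta)$, and the prior distribution $\Pr(\theta)$, which represents the knowledge about $\theta$ before the data were collected. The denominator represents the marginal distribution of the data, $\Pr(\mathcal{Y})$. Before the data are collected, $\mathcal{Y}$ is a random variable, but after they are collected, the marginal distribution becomes a fixed quantity. This means that the inferential problem reduces to

$$\Pr(\theta \mid \mathcal{Y}) \propto \Pr(\mathcal{Y} \mid \theta)\,\Pr(\theta) \qquad \text{(eqn 5)}$$

where the constant of proportionality, $1/\Pr(\mathcal{Y})$, is what is required for $\Pr(\theta \mid \mathcal{Y})$ to be a proper probability density (or mass) function that integrates to 1.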
Closed form solutions for the posterior are practically impossible to obtain for complex nonlinear models with more than a few parameters, but they can be approximated, for example, by Markov chain Monte Carlo (MCMC) using a Metropolis–Hastings sampler (Clark 2007). This yields a sequence of parameter samples whose frequency distribution approximates the posterior distribution.
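To make the sampling idea concrete, here is a minimal sketch of a random-walk Metropolis sampler in base R for a toy problem: the unknown mean of normally distributed data with a normal prior. It is not the deBInfer sampler; the data, the prior, the proposal standard deviation and the names `log_post` and `chain` are assumptions chosen for illustration.

```r
# Toy data: 50 draws from a normal distribution with unknown mean
set.seed(42)
y <- rnorm(50, mean = 2, sd = 1)

# Unnormalized log posterior: log likelihood + log prior (mu ~ N(0, 10^2))
log_post <- function(mu) {
  sum(dnorm(y, mean = mu, sd = 1, log = TRUE)) +
    dnorm(mu, mean = 0, sd = 10, log = TRUE)
}

n_iter <- 5000
chain <- numeric(n_iter)
chain[1] <- 0                      # starting value
lp_current <- log_post(chain[1])

for (k in 2:n_iter) {
  # Symmetric random-walk proposal centred on the current value
  proposal <- rnorm(1, mean = chain[k - 1], sd = 0.5)
  lp_proposal <- log_post(proposal)
  # Accept with probability min(1, posterior ratio)
  if (log(runif(1)) < lp_proposal - lp_current) {
    chain[k] <- proposal
    lp_current <- lp_proposal
  } else {
    chain[k] <- chain[k - 1]
  }
}

# Discard burn-in and summarize the approximate posterior
post <- chain[-(1:1000)]
c(mean = mean(post), quantile(post, c(0.025, 0.975)))
```

Because only the ratio of posterior densities enters the acceptance step, the normalizing constant $\Pr(\mathcal{Y})$ never has to be computed.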
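The likelihood $\Pr(\mathcal{Y} \mid \theta)$ describes the probability of the data for a given realization of the model $\mathcal{M}(\theta)$, and we can use the fact that the data are uncertain to derive an expression like

$$\mathcal{L}(\theta) = \prod_{t} \mathcal{P}\left(y_t \mid \mu_t(\theta), \sigma_t^2\right) \qquad \text{(eqn 6)}$$

where $\mathcal{P}$ is a parametric probability distribution, typically with first and second moments $\mu$ and $\sigma^2$, $y_t$ is data item $t$, $\mu_t(\theta)$ is the corresponding model prediction and $\sigma_t^2$ is the variance associated with $y_t$.

The data $\mathcal{Y}$ may contain multiple data series, for example time‐course observations of different state variables, following different probability distributions. In this case, the likelihood becomes the product over all series $s$ and each data item $t$ in each series,

$$\mathcal{L}(\theta) = \prod_{s} \prod_{t} \mathcal{P}_s\left(y_{s,t} \mid \mu_{s,t}(\theta), \sigma_{s,t}^2\right) \qquad \text{(eqn 7)}$$

Implementation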
deBInfer provides a framework for dynamical models consisting of a deterministic DE model and a stochastic observation model. To perform inference using deBInfer, the user must specify R functions or data structures representing the DE model, an observation model and thus the data likelihood and declare all model and observation parameters, including prior distributions for those parameters that are to be estimated. The DE model itself can also be provided as a shared object, for example a compiled C function. deBInfer takes these inputs and performs MCMC to sample from the posterior distributions of parameters, solving the DE model numerically within the MCMC procedure. The MCMC procedure for deBInfer offers independent as well as random‐walk Metropolis–Hastings updates and is implemented fully in R (R Core Team 2015). Background on Metropolis–Hastings MCMC is widely available in the literature (e.g. Clark 2007; Brooks et al. 2011).
As numerically solving the DE model is the most computationally costly step, we made two slight modifications to the basic Metropolis–Hastings algorithm: (i) deBInfer makes a distinction between the parameters of the DE model, $\theta_{DE}$, and the observation parameters, $\theta_{obs}$, invoking the solver only for updates of the former; and (ii) the prior probability of each parameter proposal from the random‐walk sampler is evaluated before the posterior density and the acceptance ratio are calculated. This allows the rejection of proposals outside the prior support without invoking the numerical solver. The algorithm is outlined in Table 1.
Table 1. Updating a parameter $\theta^{(k)}$ in the Markov chain at step $k$ to its value at step $k+1$ proceeds via the outlined steps. $q$ is a conditional density, the so‐called proposal distribution.
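The second modification can be sketched in base R as follows. This is an illustrative reading of the idea, not the package's internal code: `solve_model` is a hypothetical stand-in for an expensive numerical DE solve (here a closed-form logistic curve) that counts how often it is called, and the uniform prior, the proposal standard deviation and the simulated data are assumptions.

```r
solver_calls <- 0
# Stand-in for an expensive numerical DE solve (closed-form logistic curve)
solve_model <- function(r, times = 0:10, K = 10, N0 = 0.1) {
  solver_calls <<- solver_calls + 1
  K * N0 * exp(r * times) / (K + N0 * (exp(r * times) - 1))
}

set.seed(7)
obs <- solve_model(0.5) * exp(rnorm(11, sd = 0.1))  # simulated data
solver_calls <- 0                                   # reset the counter

log_prior <- function(r) dunif(r, min = 0, max = 1, log = TRUE)
log_lik <- function(r) {
  sum(dlnorm(obs, meanlog = log(solve_model(r)), sdlog = 0.1, log = TRUE))
}

update_r <- function(r_current, ll_current, prop_sd = 0.3) {
  r_prop <- rnorm(1, mean = r_current, sd = prop_sd)
  lp_prop <- log_prior(r_prop)
  # Outside the prior support: reject immediately, no solver call needed
  if (!is.finite(lp_prop)) {
    return(list(r = r_current, ll = ll_current))
  }
  ll_prop <- log_lik(r_prop)
  log_ratio <- (ll_prop + lp_prop) - (ll_current + log_prior(r_current))
  if (log(runif(1)) < log_ratio) {
    list(r = r_prop, ll = ll_prop)
  } else {
    list(r = r_current, ll = ll_current)
  }
}

n_iter <- 2000
state <- list(r = 0.4, ll = log_lik(0.4))
draws <- numeric(n_iter)
for (k in seq_len(n_iter)) {
  state <- update_r(state$r, state$ll)
  draws[k] <- state$r
}
c(iterations = n_iter, solver_calls = solver_calls)
```

The number of solver calls is smaller than the number of iterations because proposals that fall outside the support of the prior are rejected before the model is solved.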
deBInfer provides a choice of three proposal distributions $q$ for the first step in the algorithm: a normal, $\mathcal{N}(\theta^{(k)}, \sigma^2_{prop})$; an asymmetric uniform, $\mathcal{U}(\frac{a}{b}\theta^{(k)}, \frac{b}{a}\theta^{(k)})$; and a multivariate normal, $\mathcal{N}(\boldsymbol{\theta}^{(k)}, \Sigma)$. deBInfer requires manual tuning; that is, the variance components $\sigma^2_{prop}$, $a$ and $b$, and $\Sigma$, respectively, are user‐specified inputs. The asymmetric uniform distribution is useful for proposals of parameters that are strictly positive, such as variances, and the multivariate normal is useful for efficiently sampling parameters that are strongly correlated, as is often the case for DE model parameters.
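The three proposal types can be sketched in base R as below. The function names are hypothetical, and the sketch assumes that the asymmetric uniform takes the form $\mathcal{U}(\frac{a}{b}\theta, \frac{b}{a}\theta)$ with $a < b$; it illustrates the distributions and is not the package's internal code.

```r
# Normal random-walk proposal with variance prop_var
propose_norm <- function(theta, prop_var) {
  rnorm(1, mean = theta, sd = sqrt(prop_var))
}

# Asymmetric uniform proposal on (a / b * theta, b / a * theta), with a < b;
# for theta > 0 the proposal is always strictly positive
propose_unif <- function(theta, a, b) {
  runif(1, min = a / b * theta, max = b / a * theta)
}

# Multivariate normal proposal with covariance matrix Sigma,
# drawn via the Cholesky factor of Sigma
propose_mvnorm <- function(theta, Sigma) {
  z <- rnorm(length(theta))
  as.vector(theta + t(chol(Sigma)) %*% z)
}

set.seed(123)
propose_norm(0.5, prop_var = 0.005)
propose_unif(0.1, a = 3, b = 4)   # lies between 0.075 and 0.1333
Sigma <- matrix(c(0.01, 0.008, 0.008, 0.01), nrow = 2)
propose_mvnorm(c(0.5, 5), Sigma)
```

Under the assumed form, the uniform proposal is not symmetric: its density at $\theta'$ given $\theta$ is $1/((b/a - a/b)\theta)$, so a Metropolis–Hastings acceptance ratio needs the correction factor $q(\theta \mid \theta')/q(\theta' \mid \theta) = \theta/\theta'$.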
A simple example – logistic population growth
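The logistic model describes the growth of a population of size $N$ with growth rate $r$ and carrying capacity $K$:

$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right) \qquad \text{(eqn 8)}$$

| Function | Description |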
|---|---|
| debinfer_par | Creates a data structure representing an individual parameter or initial value of the DE model, or an observation parameter, and the corresponding values, priors, etc. |
| setup_debinfer | Combines multiple parameter declarations into an input object for inference |
| de_mcmc | Conducts MCMC inference on a DE model and returns an object of the class debinfer_result |
| plot.debinfer_result | Plots traces and posterior densities (wrapper for coda::plot.mcmc) |
| summary.debinfer_result | Summary statistics for MCMC samples (wrapper for coda::summary.mcmc) |
| pairs.debinfer_result | Pairwise plots and correlations of marginal posterior distributions |
| post_prior_densplot | Overlay of posterior and prior densities for free parameters |
| post_sim | Simulate posterior trajectories of the DE model and summary statistics thereof |
| plot.post_sim_list | Plot posterior DE model trajectories |
Installation
The deBInfer package is available on CRAN. The development version can be installed from GitHub using devtools (Wickham & Chang 2016), which can itself be installed from CRAN:

    # Install the CRAN release.
    install.packages("deBInfer")

    # Alternatively, install devtools and the development version of deBInfer.
    install.packages("devtools")
    devtools::install_github("pboesu/debinfer")

    # Load deBInfer.
    library(deBInfer)

Specification of the differential equation model
deBInfer makes use of the deSolve and PBSddesolve packages (Soetaert, Petzoldt & Setzer 2010; Couture‐Beil et al. 2014) to numerically solve ODE and DDE models. The DE model has to be specified as a function containing the model equations, following the guidelines given in the documentation of the respective package. For our simple example, the function takes three inputs: time, the time point at which the derivative is evaluated; y, a vector containing the current value of the state variable N; and parms, a vector containing the parameters r and K.

    logistic_model <- function(time, y, parms) {
      with(as.list(c(y, parms)), {
        dN <- r * N * (1 - N / K)
        list(dN)
      })
    }

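A derivative function of this form can be checked before it is used for inference. In deBInfer the function is handed to deSolve; to keep the following sketch free of dependencies, it uses a hand-written fixed-step Runge–Kutta integrator instead (the helper `rk4_solve` is hypothetical) and compares the result with the closed-form logistic solution. The parameter values and the time grid are assumptions.

```r
# Derivative function in the deSolve-style (time, y, parms) form
logistic_model <- function(time, y, parms) {
  with(as.list(c(y, parms)), {
    dN <- r * N * (1 - N / K)
    list(dN)
  })
}

# Hypothetical helper: fixed-step fourth-order Runge-Kutta integrator
rk4_solve <- function(func, y0, times, parms) {
  out <- matrix(NA_real_, nrow = length(times), ncol = length(y0),
                dimnames = list(NULL, names(y0)))
  out[1, ] <- y0
  y <- y0
  for (i in seq_along(times)[-1]) {
    t0 <- times[i - 1]
    h <- times[i] - t0
    k1 <- func(t0, y, parms)[[1]]
    k2 <- func(t0 + h / 2, y + h / 2 * k1, parms)[[1]]
    k3 <- func(t0 + h / 2, y + h / 2 * k2, parms)[[1]]
    k4 <- func(t0 + h, y + h * k3, parms)[[1]]
    y <- y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    out[i, ] <- y
  }
  cbind(time = times, out)
}

times <- seq(0, 20, by = 0.1)
sim <- rk4_solve(logistic_model, y0 = c(N = 0.1), times = times,
                 parms = c(r = 0.5, K = 10))

# Compare with the closed-form logistic solution
exact <- 10 * 0.1 * exp(0.5 * times) / (10 + 0.1 * (exp(0.5 * times) - 1))
max(abs(sim[, "N"] - exact))
```

The state vector must be named (`c(N = 0.1)`), because the model function looks up `N`, `r` and `K` by name inside `with()`.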
Observation model and likelihood specification
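We assume that the observations are log‐normally distributed about the solution of the DE, with a standard deviation $\sigma_{obs}$ on the log scale. A set of simulated observations is provided with the package and can be loaded with the command data(logistic). The appropriate log‐likelihood takes the form

$$\ell(\mathcal{Y} \mid \theta) = \sum_{t} \ln\left[\frac{1}{\tilde{N}_t\,\sigma_{obs}\sqrt{2\pi}} \exp\left(-\frac{\left(\ln \tilde{N}_t - \ln(N_t + \varepsilon)\right)^2}{2\sigma_{obs}^2}\right)\right] \qquad \text{(eqn 9)}$$

where $\tilde{N}_t$ are the observations, and $N_t$ are the predictions of the DE model given the current MCMC sample of the parameters $\theta$. Further, $\varepsilon \ll 1$ is a small correction needed because the exact DE solution can equal zero (or less, depending on the numerical precision of the solver). $\varepsilon$ should therefore be at least as large as the expected numerical precision of the solver. We chose $\varepsilon = 10^{-6}$, which is on the same order as the default numerical precision of the default solver (deSolve::ode with method = "lsoda"), but we found that the inference results were insensitive to this choice as long as $\varepsilon$ ≤ 0·01 (Appendix S1, Conclusion).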
The deBInfer observation model template requires three inputs: a data.frame of observations, data; the simulated trajectory returned by the numerical solver in the MCMC procedure, sim.data; and the current sample of the parameters, samp. The user specifies the observation model such that it returns the summed log‐likelihoods of the data. In this example, the observations are in the data.frame column N_noisy, and the corresponding predicted states are in the column N of the matrix‐like object sim.data (see Appendix S1).

    # load example data
    data(logistic)

    # user-defined data likelihood
    logistic_obs_model <- function(data, sim.data, samp) {
      epsilon <- 1e-6
      llik <- sum(dlnorm(data$N_noisy,
                         meanlog = log(sim.data[, "N"] + epsilon),
                         sdlog = samp[["sdlog.N"]], log = TRUE))
      return(llik)
    }

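An observation model of this shape can be tried out on mock inputs before running any MCMC. The sketch below repeats the function so that it is self-contained and evaluates it for a trajectory generated with the true growth rate and for one generated with a poor growth rate. All mock objects (`data_mock`, `samp_mock`, `sim_good`, `sim_bad`) and their values are assumptions made for illustration.

```r
logistic_obs_model <- function(data, sim.data, samp) {
  epsilon <- 1e-6
  sum(dlnorm(data$N_noisy, meanlog = log(sim.data[, "N"] + epsilon),
             sdlog = samp[["sdlog.N"]], log = TRUE))
}

# Mock inputs (assumed values): a trajectory, noisy observations of it,
# and a current parameter sample
times <- seq(0, 10, by = 1)
traj <- function(r) {
  10 * 0.1 * exp(r * times) / (10 + 0.1 * (exp(r * times) - 1))
}
set.seed(3)
data_mock <- data.frame(
  time = times,
  N_noisy = traj(0.5) * exp(rnorm(length(times), sd = 0.05))
)
samp_mock <- c(r = 0.5, K = 10, sdlog.N = 0.05)

sim_good <- cbind(time = times, N = traj(0.5))  # trajectory at the true r
sim_bad  <- cbind(time = times, N = traj(0.2))  # trajectory at a poor r

ll_good <- logistic_obs_model(data_mock, sim_good, samp_mock)
ll_bad  <- logistic_obs_model(data_mock, sim_bad, samp_mock)
c(good = ll_good, bad = ll_bad)
```

The function returns a single number, the summed log-likelihood, which is larger for the trajectory that matches the data.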
Parameter, prior and sampler specification
All parameters that are used in the DE model and the observation model need to be declared for the inference procedure using the debinfer_par() function. The declaration describes the variable name, whether it is a DE or observation parameter and whether or not it is to be estimated. If the parameter is to be estimated, the user also needs to specify a prior distribution and a number of additional parameters for the MCMC procedure. deBInfer currently supports priors from all probability distributions implemented in base R, as well as their truncated variants, as implemented in the truncdist package (Novomestky & Nadarajah 2012).
We declare the DE model parameter r, assign a prior $r \sim \mathcal{N}(0, 1)$ and a random‐walk sampler with a normal kernel (samp.type = "rw") and proposal variance of 0·005 with the command

    r <- debinfer_par(name = "r", var.type = "de", fixed = FALSE,
                      value = 0.5, prior = "norm",
                      hypers = list(mean = 0, sd = 1),
                      prop.var = 0.005, samp.type = "rw")

Similarly, we declare $K \sim \ln\mathcal{N}(1, 1)$ and $\sigma_{obs} \sim \ln\mathcal{N}(0, 1)$.

    K <- debinfer_par(name = "K", var.type = "de", fixed = FALSE,
                      value = 5, prior = "lnorm",
                      hypers = list(meanlog = 1, sdlog = 1),
                      prop.var = 0.1, samp.type = "rw")

    sdlog.N <- debinfer_par(name = "sdlog.N", var.type = "obs", fixed = FALSE,
                            value = 0.1, prior = "lnorm",
                            hypers = list(meanlog = 0, sdlog = 1),
                            prop.var = c(3, 4), samp.type = "rw-unif")

Note that we are using the asymmetric uniform proposal distribution for the observation noise parameter sdlog.N (samp.type = "rw-unif"), as this ensures strictly positive proposals. Lastly, we provide an initial value $N_0$ = 0·1 for the DE:

    N <- debinfer_par(name = "N", var.type = "init", fixed = TRUE,
                      value = 0.1)

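A prior given as a distribution name plus a list of hyperparameters, as in the declarations above, maps naturally onto the density functions of base R. The following sketch shows one way such a specification could be evaluated; the helper `log_prior_density` is hypothetical and is not claimed to be how deBInfer evaluates priors internally.

```r
# Hypothetical helper: log prior density for a distribution given by name
log_prior_density <- function(x, prior, hypers) {
  dens_fun <- get(paste0("d", prior), mode = "function")
  do.call(dens_fun, c(list(x), hypers, list(log = TRUE)))
}

# Priors matching the declarations for r, K and sdlog.N
lp_r <- log_prior_density(0.5, prior = "norm",
                          hypers = list(mean = 0, sd = 1))
lp_K <- log_prior_density(5, prior = "lnorm",
                          hypers = list(meanlog = 1, sdlog = 1))
# A negative value lies outside the support of the log-normal prior
lp_out <- log_prior_density(-0.1, prior = "lnorm",
                            hypers = list(meanlog = 0, sdlog = 1))
c(r = lp_r, K = lp_K, outside = lp_out)
```

A log density of `-Inf` marks a value outside the prior support, which is the condition under which a proposal can be rejected without solving the DE model.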
MCMC inference
The MCMC procedure is called using the function de_mcmc(), which takes the declared parameters, the DE and observational models, the data and further optional arguments to the MCMC procedure and/or the solver as inputs, and returns the resulting MCMC samples in an object of class debinfer_result.
All declared parameters are collated using setup_debinfer()

    mcmc.pars <- setup_debinfer(r, K, sdlog.N, N)

and passed to de_mcmc(), which is set to use deSolve::ode() as a back end in this case, as specified by the argument solver = "ode":

    # do inference with deBInfer
    # MCMC iterations
    iter <- 5000
    # inference call
    mcmc_samples <- de_mcmc(N = iter, data = logistic,
                            de.model = logistic_model,
                            obs.model = logistic_obs_model,
                            all.params = mcmc.pars,
                            Tmax = max(logistic$time),
                            data.times = logistic$time,
                            cnt = 500, plot = FALSE, solver = "ode")

Inference outputs
The inference function returns an object of class debinfer_result, which contains the posterior samples in a format compatible with the coda package (Plummer et al. 2006), as well as the DE and observation models and all parameters used for inference. This allows the use of the diagnostic functions and plotting routines provided in coda (see Fig. 1). We also provide additional functions and methods, such as pairs.debinfer_result(), which creates pairwise plots of the marginal posterior distributions and shows correlations between individual parameters (see Fig. 2); post_prior_densplot(), which allows a visual comparison between prior and marginal posterior densities for each parameter; and post_sim(), which simulates posterior model trajectories and associated credible intervals, as well as plotting methods for the latter (see Fig. 3).
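The idea behind posterior summaries and posterior trajectories can be sketched in base R without the package. The matrix `samples` below is a made-up stand-in for posterior draws of r and K (it is not real deBInfer output), and the closed-form logistic solution replaces the numerical solver; both are assumptions for illustration.

```r
# Stand-in for posterior samples (assumed values, not real deBInfer output)
set.seed(11)
samples <- cbind(r = rnorm(1000, mean = 0.5, sd = 0.02),
                 K = rnorm(1000, mean = 10, sd = 0.3))

# Posterior means and 95% credible intervals for each parameter
par_summary <- t(apply(samples, 2, function(x) {
  c(mean = mean(x), quantile(x, c(0.025, 0.975)))
}))
par_summary

# Posterior trajectories: one model solution per posterior sample
logistic_solution <- function(t, r, K, N0) {
  K * N0 * exp(r * t) / (K + N0 * (exp(r * t) - 1))
}
times <- seq(0, 20, by = 0.5)
traj <- apply(samples, 1, function(p) {
  logistic_solution(times, r = p[["r"]], K = p[["K"]], N0 = 0.1)
})

# Pointwise median and 95% credible band across trajectories
band <- t(apply(traj, 1, quantile, probs = c(0.025, 0.5, 0.975)))
head(cbind(time = times, band))
```

Each column of `traj` is one trajectory, so the credible band at a given time point is simply a set of quantiles across the corresponding row.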



Example application – DDE model of fungal population growth
To illustrate applications of deBInfer beyond the simplistic example above, we outline inference procedures for a more complex model and corresponding observational data. Full model details and annotated code can be found in Appendix S2. Our example demonstrates parameter inference for a DDE model of population growth in the environmentally sensitive fungal pathogen Batrachochytrium dendrobatidis (Bd), which causes the amphibian disease chytridiomycosis (Rosenblum et al. 2010; Voyles et al. 2012). This model has been used to further our understanding of pathogen responses to changing environmental conditions. Further details about the model development, and the experimental procedures yielding the data used for parameter inference, can be found in Voyles et al. 2012.
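The model follows three state variables: the initial cohort of zoospores, $C$; the zoospore‐producing sporangia, $S$; and the zoospores of the next generation, $Z$. Zoospores in the initial cohort settle and become sporangia at rate $s_r$, or die at rate $\mu_Z$. $f_s$ is the fraction of sporangia that survive to the zoospore‐producing stage. We assume that it takes a minimum of $T_{min}$ days before the sporangia produce zoospores, after which they produce zoospores at rate $\eta$. Zoospore‐producing sporangia die at rate $d_s$. The concentration of zoospores, $Z$, is the only state variable measured in the experiments, and it is assumed that these zoospores settle ($s_r$) or die ($\mu_Z$) at the same rates as the initial cohort of zoospores. The equations that describe the population dynamics are as follows:

$$\frac{dC}{dt} = -(s_r + \mu_Z)\,C(t) \qquad \text{(eqn 10)}$$

$$\frac{dS}{dt} = s_r f_s\,C(t - T_{min}) - d_s\,S(t) \qquad \text{(eqn 11)}$$

$$\frac{dZ}{dt} = \eta\,S(t) - (s_r + \mu_Z)\,Z(t) \qquad \text{(eqn 12)}$$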
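We assume that the observed zoospore counts $\tilde{Z}_{j,t'}$ are independent Poisson random variables with a mean given by the solution of the DDE, $Z(t')$, at times $t' \in (t_1, \ldots, t_T)$. The log‐likelihood of the data given the parameters, underlying model and initial conditions is then a sum over the $n$ observations at each time point in $t'$,

$$\ell(\tilde{\mathcal{Z}} \mid \theta) = \sum_{t'} \sum_{j=1}^{n} \left[\tilde{Z}_{j,t'} \ln Z(t') - Z(t') - \ln\left(\tilde{Z}_{j,t'}!\right)\right] \qquad \text{(eqn 13)}$$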
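A Poisson observation model of this kind can be written in the same three-argument form as the log-normal example. The sketch below is a hedged illustration rather than the code of Appendix S2: the function name `poisson_obs_model`, the column names `count` and `Z`, the small offset `epsilon` and all mock values are assumptions.

```r
# Hypothetical observation model with Poisson-distributed counts
poisson_obs_model <- function(data, sim.data, samp) {
  epsilon <- 1e-6  # assumed small offset keeping the Poisson mean positive
  sum(dpois(data$count, lambda = sim.data[, "Z"] + epsilon, log = TRUE))
}

# Mock data: n = 3 replicate counts at each of 4 time points
times <- c(1, 2, 3, 4)
z_pred <- c(2, 10, 25, 40)   # assumed values of the DDE solution Z(t)
set.seed(5)
data_mock <- data.frame(
  time = rep(times, each = 3),
  count = rpois(12, lambda = rep(z_pred, each = 3))
)
# Predictions repeated so that each row matches one observation
sim_mock <- cbind(time = data_mock$time, Z = rep(z_pred, each = 3))

llik <- poisson_obs_model(data_mock, sim_mock, samp = NULL)
llik
```

The observation model has no free parameters here, so `samp` is unused; the Poisson mean is determined entirely by the DE solution.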

Known limitations
The MCMC sampler is implemented in R, which makes it considerably slower than samplers written in compiled languages, for example those underlying packages such as Stan (Carpenter et al. 2016) or Filzbach (Purves & Lyutsarev 2016). For inference conducted purely in R, the computational bottleneck is solving the DE model numerically. However, even for relatively simple models, a 5‐ to 10‐fold speedup of the inference procedure can be achieved using compiled DE models (see Appendix S3). Furthermore, the deBInfer MCMC algorithm is not adaptive and requires manual tuning. Lastly, sampling using the Metropolis–Hastings MCMC algorithm itself can be inefficient in the presence of strong parameter correlations. Alternative approaches such as Hamiltonian MC (Carpenter et al. 2016) or particle‐filtering methods (e.g. King et al. 2016) may offer more efficient means for parameter estimation in ODEs in these cases. Nonetheless, the package is able to fit real‐world problems in a matter of minutes to hours on current desktop hardware, which is acceptable for many applications, while providing flexible inference for both ODE and DDE models.
Conclusion
Understanding the mechanisms underlying biological systems, and ultimately, predicting their behaviours in a changing environment, requires overcoming the gap between mathematical models and experimental or observational data. We believe that Bayesian inference provides a powerful tool for fitting dynamical models and selecting between competing models. The deBInfer R package provides a suite of tools to this end in a programming language that is widespread in many biological disciplines. We hope that our package will lower the hurdle to the uptake of this inference approach for empirical biologists. We encourage users to report bugs and provide other feedback on the project issue page: https://github.com/pboesu/debinfer/issues
Authors' contributions
L.R.J. conceived the methodology and wrote the initial R implementation; P.H.B.S. re‐implemented the methodology as an R package; P.H.B.S. and S.J.R. wrote the package documentation; and P.H.B.S. led the writing of the manuscript. All authors tested the software, contributed critically to the drafts and gave final approval for publication.
Acknowledgements
The authors thank Richard FitzJohn and two anonymous reviewers for their constructive comments on earlier versions of the code and manuscript. All authors were supported by the US National Science Foundation (Grant PLR‐1341649). The authors also thank Jamie Voyles for sharing the chytrid growth data. The authors have no conflict of interest to declare.
Data accessibility
All code and data used in this article are included in the deBInfer package and its vignettes, which are freely available from CRAN: https://cran.r-project.org/package=deBInfer. The development version of the package is available at https://github.com/pboesu/debinfer.










