Volume 14, Issue 9 p. 2464-2481
RESEARCH ARTICLE
Open Access

Individual-based models of avian migration for estimating behavioural traits and predicting ecological interactions

Benjamin A. Tonelli

Corresponding Author

Department of Ecology and Evolution, University of California, Los Angeles, California, USA

Correspondence: Benjamin A. Tonelli. Email: [email protected]

Alan E. Zelin

Department of Ecology and Evolution, University of California, Los Angeles, California, USA

Donald C. Dearborn

Biology Department, Bates College, Lewiston, Maine, USA

Morgan W. Tingley

Department of Ecology and Evolution, University of California, Los Angeles, California, USA

First published: 28 July 2023
Handling Editor: Chris Sutherland

Abstract

  1. Rapid advances in the field of movement ecology have led to increasing insight into both the population-level abundance patterns and individual-level behaviour of migratory species. Despite this progress, research questions that require scaling individual-level understanding of the behaviour of migrating organisms to the population level remain difficult to investigate.
  2. To bridge this gap, we introduce a generalizable framework for training full-annual cycle individual-based models of migratory movements by combining information from tracking studies and species occurrence records. Focusing on migratory birds, we call this method: Models of Individual Movement of Avian Species (MIMAS). We implement MIMAS to design individual-based models of avian migration that are trained using previously published weekly occurrence maps and fit via Approximate Bayesian Computation.
  3. MIMAS models leverage individual- and population-level information to faithfully represent continental-scale migration patterns. Models can be trained successfully for species even when few individual-level data are available for parameterization by relying on population-level information. In contrast to existing mathematical models of migration, MIMAS explicitly represents and estimates behavioural attributes of migrants. MIMAS can additionally be used to simulate movement over consecutive migration seasons, and models can be easily updated or validated as new empirical data on migratory behaviours become available.
  4. MIMAS can be applied to a variety of research questions that require representing individual movement at large scales. We demonstrate three applied uses for MIMAS: estimating population-specific migratory phenology, predicting the spatial patterns and magnitude of ectoparasite dispersal by migrants, and simulating the spread of a pathogen across the annual cycle of a migrant species. Currently, MIMAS can easily be used to build models for hundreds of migratory landbird species but can also be adapted in the future to build models of other types of migratory animals.

1 INTRODUCTION

Advancements in movement ecology are rapidly uncovering the once-cryptic behaviour of migratory animals (McKinnon & Love, 2018). Two distinct approaches to studying migration have fuelled these advances. The first is through the use of ecological “big data”—specifically centralized governmental efforts or opportunistic citizen-science projects—which record the presence, non-detection, and abundance of species across the globe (Sullivan et al., 2009; USGS Bird Banding Laboratory, 2019). Generalist databases like iNaturalist, and taxon-specific ones like eBird (Sullivan et al., 2009; Van Horn et al., 2018), the USGS Bird Banding Laboratory (USGS Bird Banding Laboratory, 2019), or the Passive Acoustic Cetacean Map (Passive Acoustic Cetacean Map (v1.1.2), 2022), compile large amounts of information detailing where species are in space and time. These data sources are being used to answer a myriad of ecological questions, including those related to migratory routes and timing (Fink, Auer, Johnston, Strimas-Mackey, et al., 2020; Rousseau et al., 2020), habitat associations (Darrah et al., 2021; Sullivan et al., 2014), and phenology (Youngflesh et al., 2021).

The second distinct method to study migration is from the perspective of individual organisms—tracking animals as they move across space. Increasingly miniaturized tracking devices, like radio and GPS tags, when paired with advanced statistical techniques (Hallworth et al., 2015; Rakhimberdiev et al., 2015), have revealed precise movements for individuals representing over a thousand species (Kays et al., 2022). These methods have uncovered and quantified important phenomena, such as migratory connectivity (Hallworth et al., 2015), route plasticity (Delmore et al., 2012), and migration speed (Stutchbury et al., 2009). Although the number of tracking studies conducted is currently limited to a small percentage of migratory species worldwide—and in some cases limited to specific subspecies or sexes (McKinnon & Love, 2018)—the pace of improvement in techniques is likely to dramatically accelerate what is known about the behaviour underpinning individual animal migrations in the near future (Costa-Pereira et al., 2022; Lennox et al., 2017).

Currently, these two methods for studying migration, akin to Eulerian and Lagrangian perspectives on studying flow, are largely distinct in the information they provide and the questions they can address. Large-scale occurrence data obscure individual variation, making it difficult to scale down knowledge to the level of migrant behaviour. By contrast, individual-based observations (e.g. GPS tracks) are generally limited by small samples, narrow spatial and temporal scope, and biased taxonomic representation, and so do not currently capture the diversity of migratory behaviours across and within species (Kays et al., 2022; McKinnon & Love, 2018), making knowledge gained from these studies difficult to scale up. Importantly, many questions require an understanding of both individual-level behaviour and population-level patterns. How migrants disperse parasites and spread pathogens, or when individuals from an endangered population might encounter the highest mortality risk from anthropogenic sources (e.g. building or boat collisions), are examples of important questions that are difficult to investigate empirically using either source of migratory information in isolation.

In recent years, various migration models have been designed with increasingly complex and sophisticated mathematical methods in an attempt to bridge this gap. These models are designed to use population-level data to estimate population-level traits (e.g. migratory connectivity), and to predict movement between spatial cells across time (Fuentes et al., 2023; Meehan et al., 2022; Vincent et al., 2022). Although useful for answering targeted questions, these models are currently limited in the breadth of questions they can address. Critically, previous such attempts to simulate migration do not account for individual variation in behaviour, instead either circumventing the representation of individual behaviour altogether (Meehan et al., 2022), or deriving estimates of individual movement from the modelled probabilistic flow of whole populations between spatial cells (Fuentes et al., 2023; Vincent et al., 2022). Without a biologically realistic representation of how individuals move across the landscape, including biological constraints (e.g. energetics) and individual variation (e.g. geographic variation in timing of migration), it is difficult to realistically represent more complex ecological processes like the spread of a pathogen or the dispersal of parasites by migrating populations.

Individual-based models (IBMs; see Glossary, Table 1) serve as an alternate approach to modelling migration routes with a foundation in animal behaviour. IBMs are designed to replicate the system-level patterns that emerge from the behaviour of many individuals (DeAngelis & Mooij, 2005; Grimm & Railsback, 2013). In ecological contexts, IBMs simulate the movements and interactions of individual organisms and the patterns that emerge from group dynamics, making them particularly suited to investigate questions that require an understanding of both individual-level variation and population-level patterns, such as estimating patterns of the dispersal of parasites or the spread of pathogens (DeAngelis & Mooij, 2005; Rakowski et al., 2010; Tonelli & Dearborn, 2019). Specific to migration, individual-level migratory behaviours characterized in tracking studies—for instance, when individuals initiate migration and how far they travel each day (Stutchbury et al., 2009)—can be used to parameterize an IBM that can then be used to simulate thousands or millions of individual migratory tracks (Tonelli & Dearborn, 2019).

TABLE 1. Glossary of important terms used and their acronyms, when applicable.
Term | Definition
Approximate Bayesian Computation (ABC) | A method used to estimate parameter values by comparing model-simulated data to empirically derived summary statistics. ABC is particularly useful in evaluating complex models (Beaumont, 2010), including IBMs
Derived parameters | Additional, non-initialized variables that are calculated as a product of a fitted model. Differentiated from fitted parameters, which are estimated directly by a model from observed data
eBird Status and Trends (eBird ST) | Global, high-spatial resolution estimates of species relative abundance for each week of the year. eBird ST uses millions of birdwatcher checklists coupled with statistical and machine learning models to create accurate data products representing population movements across the annual cycle (Fink, Auer, Johnston, Strimas-Mackey, et al., 2020)
Individual-based Model (IBM) | A class of models (alternatively agent-based models, ABMs) commonly used in ecology to recreate system-level properties through the explicit representation of individuals. The behaviour of individuals, or agents, is governed through model parameters (Grimm & Railsback, 2013), usually defined by literature-derived information or modeller expertise
Models for the Individual Movement of Avian Species (MIMAS) | A framework, available in the R programming language, to design and train species-specific IBMs of avian migration using ABC
Rejection-Approximate Bayesian Computation (r-ABC) | An application of ABC, where parameter sets that fail to adequately recreate summary statistics are rejected according to a modeller-defined cutoff, while the remaining parameter sets are used to estimate posteriors (Beaumont, 2010)

Although IBMs are a promising tool for studying migration, uncertainty surrounding the behaviour of individual migrants has thus far been a major barrier to building robust models. Despite rapid advancements in migration ecology, many of the behaviours needed to parametrize an IBM for migration of a given species are still relatively unknown, as the majority of species have not been the target of individual tracking studies. Relying on inaccurate but precisely defined parameters is problematic in that it presents the risk of propagating erroneous estimates, likely influencing results of interest with an appearance of certainty (Evans, 2012). In contrast, if plausible parameter ranges are overly broad because of knowledge gaps, model outputs are likely to be similarly imprecise. Incorporating uncertainty into IBMs helps strengthen inference and supports the application of modelling efforts to decision-making (Filatova et al., 2013; Fonoberova et al., 2013; Ligmann-Zielinska et al., 2014), but if the resulting uncertainty in the model output is too great, it could limit the usefulness of any results.

A promising methodological advancement that has the potential to address the need to both account for uncertainty and improve input parameter precision in IBMs is Approximate Bayesian Computation (ABC; Csilléry et al., 2010). ABC is a method that sidesteps the need for single, best-guess estimates for parameterizing IBMs, instead relying on plausible parameter ranges, called “priors”. ABC adds an extra training step to the traditional IBM framework by running thousands or millions of simulations, each testing a unique location in the parameter space by drawing a value from the prior range of each parameter (Lagarrigues et al., 2015; Sirén et al., 2018; van der Vaart et al., 2015). Then, in the simplest form of ABC (rejection-ABC, or r-ABC), parameter estimates from simulations that best replicate system- or population-level observed data are retained, while all others are rejected, leading to the retention of plausible parameter estimates, or “posteriors” (Beaumont, 2010). ABC has the downstream advantage of quantifying uncertainty in parameter estimates explicitly, rather than requiring the modeller to conduct post-hoc uncertainty analyses (Dominguez Almela et al., 2020). ABC can leverage both individual- and population-level information to derive probable parameter estimates to train IBMs of ecological systems while robustly incorporating and quantifying uncertainty (Baey et al., 2022; Dominguez Almela et al., 2020; van der Vaart et al., 2015).

Here, we present MIMAS (Models for the Individual Movement of Avian Species), a methodological framework that can be used to train species-specific migration IBMs using ABC. MIMAS is designed to bridge the two epistemological frameworks for studying migration: individual-level tracking and citizen-science-based occurrence data. As designed, our IBMs integrate information from the published animal-tracking literature as well as large-scale spatio-temporal modelling, here eBird Status and Trends (eBird ST) weekly abundance estimates (Fink, Auer, Johnston, Strimas-Mackey, et al., 2020), to recreate full-annual cycle abundance patterns through the simulation of movements of individual migrant birds. Simulated individuals move according to phenological, energetic and navigational parameters, together determining the timing, speed and routes of migration. We demonstrate how MIMAS can be used to create IBMs representing the migration of species, simulating individuals as they move seasonally between breeding and wintering sites. MIMAS faithfully recreates continental-scale abundance patterns and, through the use of ABC, identifies plausible behavioural traits of avian migrants such as departure timing and migration speed. Because our knowledge of migratory birds is growing rapidly with advancing technology, MIMAS is designed to be augmentable and updatable, such that future knowledge can be easily integrated. Here, we show how MIMAS can be used to estimate individual-level behavioural traits across species and as a predictive tool to investigate questions where large-scale migration patterns need to be connected to individual-scale behaviours, including bird-mediated dispersal and the spread of pathogens. MIMAS is currently tailored to train models of migratory landbird species but could be used to build analogous models for other migratory animals, including, for example, insects, sharks and cetaceans.

2 MATERIALS AND METHODS

We first describe how ABC is used to train IBMs, before describing how we implement this framework to build species-specific IBMs of bird migration. Lastly, we detail how MIMAS-generated IBMs can be used to answer applied questions through three case studies.

2.1 General framework

Simplistically, the process of developing IBMs using ABC involves three successive steps: designing the IBM structure, defining priors, and training using ABC (Figure 1). The first step is to design the structure of the underlying IBM, building in as much complexity as is appropriate for testing hypotheses of interest (Grimm & Railsback, 2012, 2013). The second step is to use available behavioural information, derived from the literature or via expert knowledge, to define prior probabilities on model parameters that govern the behaviour of entities in the model (Grimm & Railsback, 2013; Malishev et al., 2018). We recommend that priors be informed through a thorough search of published literature (for an example, see Supplement 3). Each prior is defined by a probability distribution (e.g. Gaussian) representing the probability of parameter values. We recommend representing priors using simple probability distributions (e.g. Gaussian, truncated Gaussian or uniform). The third step is to run an arbitrarily large number of simulations with the IBM, each with a unique parameter set drawn from the priors. The process of r-ABC then compares the outcomes of these simulations with observed empirical data and narrows parameter distributions by retaining the best fit simulations while rejecting all others (Beaumont, 2010; van der Vaart et al., 2015). Posterior estimates for each parameter can be calculated by aggregating retained parameter values. Having many retained parameter sets allows modellers to employ bootstrapping—i.e., simulation via sampling with replacement from parameter posteriors—to get probabilistic estimates of results of interest (Dominguez Almela et al., 2020).
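The three steps above can be sketched on a toy problem. The example below is a minimal illustration in Python (MIMAS itself is written in R), not the authors' implementation: the function `simulate` is a hypothetical one-parameter stand-in for an IBM, the uniform prior bounds and the observed summary value are invented for illustration, and the 1% retention cutoff simply mirrors the cutoff described later in the text.

```python
import random
import statistics


def simulate(mean_flight_km, n_birds=200, rng=None):
    """Hypothetical stand-in for an IBM run: returns one population-level summary."""
    rng = rng or random
    # Each simulated bird draws a nightly flight distance; negatives are floored at 0.
    distances = [max(0.0, rng.gauss(mean_flight_km, 50.0)) for _ in range(n_birds)]
    return statistics.mean(distances)


def rejection_abc(observed, prior_low, prior_high, n_sims=2000, keep_frac=0.01, seed=1):
    """Rejection-ABC: draw from the prior, simulate, keep the best-fitting draws."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        theta = rng.uniform(prior_low, prior_high)       # step 2: draw from the prior
        error = abs(simulate(theta, rng=rng) - observed)  # step 3: compare to data
        results.append((error, theta))
    results.sort()                                        # smallest error first
    n_keep = max(1, round(n_sims * keep_frac))
    return [theta for _, theta in results[:n_keep]]       # retained draws = posterior sample


# Assumed observed summary statistic (illustrative value only).
posterior = rejection_abc(observed=300.0, prior_low=100.0, prior_high=600.0)

# Bootstrapping: resample retained parameter values with replacement.
bootstrap_draws = random.Random(2).choices(posterior, k=5)
```

The retained values in `posterior` concentrate around the parameter values that reproduce the observed summary, and resampling them with replacement gives the kind of bootstrap input described above.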

FIGURE 1. Representation of the methodological framework of Models of Individual Movement of Avian Species. We first design the structure of species-specific individual-based models (1), then parameterize each using priors derived from the literature (2). The models are then trained using Approximate Bayesian Computation by running many simulations and comparing the output to population-level patterns (3). The simulations that best replicate population-level training data are retained and are used to derive posterior estimates for model parameters.

2.2 MIMAS

We have developed MIMAS as a model-generating pipeline written in the R programming language (R Core Team, 2021) that trains IBMs of migratory landbird movements throughout the annual cycle using r-ABC. MIMAS can be used to build models of any migratory landbird species where training data (here, weekly relative abundance models from eBird ST) are available. Here, we describe the process of using MIMAS to build models of 10 North American full- and partial-migrant landbird species, chosen to comprise a phylogenetically and geographically representative group of landbirds, including two species from each of the families Icteridae (orioles: Icterus bullockii, Icterus spurius), Parulidae (warblers: Setophaga citrina, Setophaga townsendi), Passerellidae (sparrows: Spizella breweri, Spizella pallida), Picidae (woodpeckers: Sphyrapicus nuchalis, Sphyrapicus varius), and Turdidae (thrushes: Hylocichla mustelina, Ixoreus naevius), with Western (5), Eastern (4), and Central (1) geographic distributions within North America. We first outline the structure of the IBMs, followed by the literature process used to determine prior parameter estimates, and then describe the use of ABC to evaluate simulations using training data from eBird ST. Lastly, we provide examples of potential applications using the trained models. MIMAS is described here as it is currently implemented, but any part of the pipeline could be adapted to fit other needs. For example, IBMs could be augmented to include additional behaviours, or the ABC training process could be augmented by including additional population-level statistics for training.

2.2.1 IBM design

The IBMs used in MIMAS are structurally identical for each of the 10 species used in the analysis and are described in brief here. For a complete description of the IBMs following the ODD protocol standard (Grimm et al., 2020), see Supplement 1. The IBMs represent birds as individual agents progressing on a daily time-step through breeding, migratory, and non-breeding states (Figure 2). At the initialization of every simulation, each individual begins at a unique breeding location drawn probabilistically from the eBird ST relative abundance distribution at the temporal midpoint of that species' breeding season. Breeding locations are then matched to non-breeding locations based on a migratory connectivity algorithm (Table 2). Each individual is initialized with full energy reserves, a value that is constant across simulated birds and drawn from a normally distributed prior. The end of the breeding season—or the initiation date of post-breeding migration—is determined for each individual by drawing probabilistically from a normal distribution defined at initialization. During each simulated day of migration, birds either make a flight, where the distance and heading of that flight are drawn from normal distributions, or initiate a stopover and refuel. Each flight depletes the energy level of the individual proportional to the distance of the flight. Initiation of stopover is determined probabilistically for each migrating bird on each day by a linear equation governed by the energetic condition of the individual, with stopovers increasingly likely as birds deplete fuel stores. The length of the stopover is calculated as the time needed to recover energy stores to the max energy level of the bird, with the energy recovered per day determined by a recovery rate parameter. Migration ends when birds arrive within a certain radius of a pre-determined non-breeding location. The position of birds is then fixed throughout the wintering period. 
The same processes govern behaviour during the pre-breeding migration season, followed by a stationary breeding period. The parameters that describe fall and spring migratory behaviour are estimated independently, with the exception of two parameters, migratory connectivity and goal proximity, which are considered constant across migration seasons. As currently developed, individual behaviour in each migration season is governed by 14 parameters, with a total of 26 parameters used in the full IBM (Table 2).
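The daily decision loop described above can be illustrated with a deliberately simplified sketch in Python (MIMAS itself is written in R). This is not the MIMAS code: it collapses space to a single remaining distance (so headings are ignored), and the exact linear stopover rule, the cap of each flight at the remaining energy, and all parameter values are assumptions made for illustration. The full model specification is in Supplement 1.

```python
import math
import random


def simulate_migration(total_km, flight_mu, flight_sigma, energy_max_km,
                       recovery_rate, goal_radius_km, max_days, seed=0):
    """One simulated bird migrating along a 1-D route on a daily time-step."""
    rng = random.Random(seed)
    remaining = total_km        # distance left to the goal location (km)
    energy = energy_max_km      # energy expressed in km of flight, starts full
    day = 0
    log = []
    while remaining > goal_radius_km and day < max_days:
        # Assumed linear rule: stopover probability rises as energy is depleted.
        p_stop = 1.0 - energy / energy_max_km
        if rng.random() < p_stop:
            # Stopover lasts as long as needed to refill to the maximum energy.
            deficit = energy_max_km - energy
            days = math.ceil(deficit / (recovery_rate * energy_max_km))
            days = min(days, max_days - day)
            day += days
            energy = energy_max_km
            log.append(("stopover", days))
        else:
            # Flight distance from a normal truncated to > 0 (by resampling).
            dist = rng.gauss(flight_mu, flight_sigma)
            while dist <= 0:
                dist = rng.gauss(flight_mu, flight_sigma)
            # Assumption: a flight cannot exceed remaining energy or distance.
            dist = min(dist, energy, remaining)
            remaining -= dist
            energy -= dist      # depletion proportional to distance flown
            day += 1
            log.append(("flight", dist))
    return day, remaining, log


# Illustrative parameter values only; none are taken from the paper.
days_used, km_left, events = simulate_migration(
    total_km=3000, flight_mu=250, flight_sigma=80, energy_max_km=1000,
    recovery_rate=0.25, goal_radius_km=50, max_days=120, seed=3)
```

Running many such individuals with parameter values drawn from priors, and summarizing where they are each week, yields the simulated abundance surface that is compared with eBird ST during training.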

FIGURE 2. Graphical representation of individual-based model compartments (rounded rectangles, a), representing the annual life cycle of avian migrants and a flowchart describing the daily decision tree (grey rectangles) and behaviour (blue ovals) of individuals (b). Behaviour is dependent on whether individuals are in a stationary (green) or migratory state (orange). Each day, individuals in stationary periods either initiate migration or remain in the stationary state, while individuals in migration either exit that state after reaching their goal location or alternatively perform a migratory action (make a flight or stopover, blue ovals).
TABLE 2. Individual-based model parameters used to simulate migration in Models of Individual Movement of Avian Species. Parameters are described according to their biological meaning and mathematical properties, as well as the annual cycle states they pertain to (B = Breeding, NB = Non-breeding, Pr-M = pre-breeding migration, Po-M = post-breeding migration; Figure 2). Derived parameters—those not needed to run simulations but that aid in interpretation—are also described.
Parameter # | Parameter | Description | Seasons | Type | Continuous or discrete | Unit | Range of values
1 | Departure date, μ | Average timing of the start of migration | B, NB | Fitted | Discrete | Days | 0, 365
2 | Departure date, σ | Standard deviation in the timing of the start of migration | B, NB | Fitted | Discrete | Days | 0, 365
3 | Migration timing, latitude | The strength of the relationship between latitude and the initiation date of migration. A value of −1 or 1 indicates a perfect correlation between breeding latitude and departure date, where birds breeding farther north leave later (1) or earlier (−1). A value of 0 represents no correlation between breeding latitude and departure date | Pr-M, Po-M | Fitted | Continuous | Unitless | -1, 1
4 | Flight distance, μ | Mean parameter in a truncated normal distribution (>0) roughly translating to the average distance travelled on a single night | Pr-M, Po-M | Fitted | Continuous | Kilometres | 0, Inf
5 | Flight distance, σ | Standard deviation parameter in a truncated normal distribution (>0) roughly translating to the standard deviation of distance flown in a single night | Pr-M, Po-M | Fitted | Continuous | Kilometres | 0, Inf
6 | Estimated flight distance, μ | Flight distance average accounting for truncated distribution parameters and impact of energetic parameters on the length of flights | Pr-M, Po-M | Derived | Continuous | Kilometres | 0, Inf
7 | Bearing deviation, μ | Average angular deviation from shortest distance route | Pr-M, Po-M | Fitted | Continuous | Degrees | 0, 360
8 | Bearing deviation, σ | Variation around average orientation direction for each flight | Pr-M, Po-M | Fitted | Continuous | Degrees | 0, Inf
9 | Migratory connectivity (MC) | The strength of the relationship between non-breeding site location and migration distance. Values approaching 1 indicate very strong leapfrog migration, and values approaching −1 indicate strong chain migration (Chapman et al., 2014). A value of 0 indicates no relationship between breeding and wintering locations (R = 0). For more information, see Supplement 1: Sub-models | All | Fitted | Continuous | Unitless | -1, 1
10 | Energy, max | The maximum energy reserves during migration and the starting energy at the onset of migration. Represented on the biologically relevant scale of number of kilometres birds can fly on energy reserves | Pr-M, Po-M | Fitted | Continuous | Kilometres | 0, Inf
11 | Recovery rate | Recovery rate of energy reserves during each night of stopover | Pr-M, Po-M | Fitted | Continuous | Proportion of energy max | 0, 1
12 | Transformed recovery rate | Recovery rate represented on the biologically meaningful scale of kilometres/day | Pr-M, Po-M | Derived | Continuous | Kilometres/day |
13 | Maximum migration duration | Maximum number of days before simulated birds are forced to exit migratory compartment, and initiate breeding/non-breeding | Pr-M, Po-M | Fitted | Discrete | Days | 0, Inf
14 | Goal location proximity | The distance from an individual's assigned breeding or non-breeding location that is adequate for the cessation of migratory behaviour | All | Fitted | Continuous | Kilometres | 0, Inf
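Table 2 distinguishes the fitted flight distance μ (parameter 4) from the derived estimated flight distance (parameter 6). One reason the two differ is truncation: the mean of a normal distribution truncated to values above 0 is larger than its μ parameter. The Python sketch below shows only that truncation effect, using the standard formula for the mean of a lower-truncated normal; it does not reproduce the additional energetic adjustment that the derived parameter also includes, and the function name is hypothetical.

```python
import math


def truncated_normal_mean(mu, sigma):
    """Mean of Normal(mu, sigma) truncated to values greater than 0."""
    alpha = (0.0 - mu) / sigma                                  # standardized lower bound
    pdf = math.exp(-0.5 * alpha * alpha) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(alpha / math.sqrt(2.0)))
    return mu + sigma * pdf / (1.0 - cdf)


# Illustrative values only: truncation matters most when mu is small relative to sigma.
wide = truncated_normal_mean(100.0, 150.0)
narrow = truncated_normal_mean(300.0, 50.0)
```

When μ is several standard deviations above 0, the truncated mean is essentially μ; when σ is large relative to μ, the realized average flight can be noticeably longer than the fitted μ.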

2.2.2 Determining priors

Species-specific model development starts with defining prior probabilities for each parameter in the IBM. We determined priors for each of the ten target species following literature reviews using the Web of Science. We searched for papers using both the scientific and common name of the species and keywords associated with migration or tracking studies using the following search terms:

(common name OR scientific name) AND ((migr* NOT migratorius) OR flight* OR telem* OR geolocator* OR gps OR “global positioning system” OR movement* OR track* OR transmitter* OR stopover*)

Of the ten species for which we built IBMs, eight had no information in the published literature that was useable to inform priors. When little or no information existed to inform priors for a given species, we used a set of broad priors informed by studies from all species. As with any literature search, it is possible to miss unpublished data, grey literature or information that might have otherwise helped to narrow prior ranges. The consequence of not including this information is that priors may be broader than they otherwise could be. The results of each literature search are detailed in Supplement 3.

2.2.3 Model training with ABC

After designing the IBM and determining prior parameter ranges for a given species, models are trained with a two-step process using ABC. ABC allows for an evaluation of parameter likelihood by comparing the outcomes of simulations with observed empirical data by identifying parameter sets that are most likely to be responsible for producing observed system-wide patterns (Figure 3).

FIGURE 3. Hypothetical demonstration of training using r-ABC for a single parameter. Each parameter in the model is defined with a prior probability range, with the maximum density potentially representing the best-estimate of the modeller (a). Simulations are run, each using a unique parameter value from the prior (vertical lines, b). Parameter estimates that lead to the best reproduction of empirical patterns (here, eBird ST) are accepted (green) while the rest are rejected (purple). The distributions of parameter values in accepted simulations are used to calculate posterior distributions (c, green).
Here, we evaluate MIMAS by comparing MIMAS-modelled relative abundance to relative abundance maps from eBird ST (Figure 4). eBird ST models use birdwatcher checklists and a suite of environmental covariates and statistical models to account for factors such as effort and site-selection bias in the underlying data (Fink, Auer, Johnston, Ruiz-Gutierrez, et al., 2020). MIMAS is trained on full-annual cycle eBird ST maps and is therefore limited to species for which this product exists. We evaluate each parameter set by first calculating the absolute difference ($\epsilon_{c,t}$) in simulated ($S_{c,t}$) and eBird ST-predicted ($P_{c,t}$) relative abundance (the percentage of the total population) in equal area hexagonal cells $c$ spaced approximately 165 km apart using the dggridR package (Barnes & Sahr, 2017) for each week $t$, excluding areas not modelled by eBird ST (Figure 4). We used 165 km hexagonal cells with the goal of representing migratory patterns at a relevant spatial scale, while avoiding penalizing the model for not representing abundance patterns related to smaller-scale habitat preferences.
$$\epsilon_{c,t} = \left| S_{c,t} - P_{c,t} \right|. \qquad (1.1)$$
We then calculated the sum of these errors for each week ($\epsilon_t$), weighting each week's error by the average of weekly uncertainty in eBird ST predictions across the species range, such that the error is less when eBird uncertainty in a given week ($\sigma_{P_t}$) is high.
$$\epsilon_t = \frac{\sum_{c} \epsilon_{c,t}}{\sigma_{P_t}}. \qquad (1.2)$$
We then sum weekly adjusted error terms to get the total error across the whole simulation ($\epsilon$).
FIGURE 4. Example of Models of Individual Movement of Avian Species (MIMAS) error, as determined by the difference in predicted versus simulated relative abundance, for a single week during spring migration. eBird predicted relative abundance (a), MIMAS-simulated abundance (b), the difference between the two (c), and the absolute difference (d, $\epsilon_{c,t}$) are shown for a single week during spring migration of a single simulation of the Wood Thrush individual-based model. Here, this particular simulation overestimates the number of birds in the Northeast and at the periphery of the species range (Central US), while underestimating the number of birds in the Southeast (c).
We calculate the error separately for the pre-breeding spring migration season ($\epsilon_{\text{spring}}$) and post-breeding fall migration ($\epsilon_{\text{fall}}$). To do this, we split the year into two periods, the 26 weeks prior to the midpoint of the breeding season ($\mu_{\text{breed}}$, as defined by eBird ST), and the 26 weeks immediately following the midpoint of the breeding season, respectively.
$$\epsilon_{\text{spring}} = \sum_{t = \mu_{\text{breed}} - 25}^{\mu_{\text{breed}}} \epsilon_t. \qquad (1.3)$$
$$\epsilon_{\text{fall}} = \sum_{t = \mu_{\text{breed}} + 1}^{\mu_{\text{breed}} + 26} \epsilon_t. \qquad (1.4)$$
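The error terms in Equations 1.1 to 1.4 can be sketched in a few lines of Python (MIMAS itself is written in R). The sketch rests on stated assumptions: weeks are indexed 0 to 51 and wrap around the calendar year, each week holds a list of per-cell relative abundances, and the weighting in Equation 1.2 is read as dividing the summed cell error by the weekly uncertainty, consistent with error being lower when uncertainty is high. Function names and the example data are hypothetical.

```python
def weekly_error(sim_cells, pred_cells, sigma_week):
    """Eq. 1.1 and 1.2: summed absolute cell differences, divided by weekly uncertainty."""
    cell_errors = [abs(s - p) for s, p in zip(sim_cells, pred_cells)]
    return sum(cell_errors) / sigma_week


def seasonal_errors(sim_by_week, pred_by_week, sigma_by_week, mu_breed):
    """Eq. 1.3 and 1.4: split weekly errors around the breeding-season midpoint."""
    n_weeks = len(sim_by_week)
    eps = [weekly_error(sim_by_week[t], pred_by_week[t], sigma_by_week[t])
           for t in range(n_weeks)]
    # Spring: the 26 weeks ending at mu_breed; fall: the 26 weeks after it.
    spring_weeks = [(mu_breed - k) % n_weeks for k in range(26)]
    fall_weeks = [(mu_breed + k) % n_weeks for k in range(1, 27)]
    eps_spring = sum(eps[t] for t in spring_weeks)
    eps_fall = sum(eps[t] for t in fall_weeks)
    return eps_spring, eps_fall, eps_spring + eps_fall


# Toy data: 52 weeks, 3 cells; the simulation differs from the prediction in week 20 only.
pred = [[0.5, 0.3, 0.2] for _ in range(52)]
sim = [list(week) for week in pred]
sim[20] = [0.6, 0.2, 0.2]
sigma = [2.0] * 52
eps_spring, eps_fall, eps_total = seasonal_errors(sim, pred, sigma, mu_breed=25)
```

With a 52-week year the two 26-week windows partition the calendar, so the seasonal errors sum to the total error.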
Our model encompasses both the fall and spring migratory seasons, and the error resulting during each migration season is largely dependent on season-specific parameter values. Thus, training is most efficient when parameter estimates are assessed according to the error in their associated season. To do this, we split parameters into groups that pertain to either the fall or the spring migratory season (Table 2) and then compared simulation error for the respective migratory season (Equation 1.2). The first step of our ABC process for each species model consists of running 100,000 simulations representing a full calendar year for 1000 individuals, including breeding and wintering periods, and then retaining parameter estimates that best replicate eBird ST abundance in spring and fall migration seasons separately. We chose the number of simulations based on computational limitations (see Section 10) and determined the number of individuals in each simulation as a balance between runtime and the inherent stochasticity of the error values across simulations. The choice of the number of simulations in the training process and the cutoff used in rejection-ABC is ultimately arbitrary, with the number of simulations constrained by available computational resources, and the cutoff percentile chosen to provide enough parameter sets in future analyses using bootstrapping. Similar cutoffs are used in other ecological applications of ABC to train IBMs (Chen et al., 2017; Hauenstein et al., 2019). We retained seasonal parameter sets from simulations within the 1st percentile of seasonal error values ($\epsilon_{\text{spring}}$, $\epsilon_{\text{fall}}$). We additionally retained parameter sets from any simulation where the error was within 5% of the 1st percentile error cutoff in order to account for known stochasticity across simulations with identical parameter sets and to include information from parameter sets with nearly identical error values to those within the cutoff.

Next, we ran 100,000 more simulations with 5000 individuals, now drawing parameter values from the posterior estimates derived in the first step. For parameters that impact the error in both migratory seasons (e.g. migratory connectivity), we randomly sample from the combined fall and spring sets. We include more individuals to minimize stochasticity but do so at the cost of higher computational resources. We retain the simulations in the 1st percentile of the lowest total error ($\epsilon$), as well as any simulation within 3% of the 1st percentile cutoff. We use a smaller threshold (3% vs. 5%) in the second step because of the reduced stochasticity resulting from the inclusion of more individuals. This leads to the retention of at least 1000 potential parameter sets to be used for posterior analysis and future applications.
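One way to picture the second-step draw is sketched below. The data layout (one dictionary per retained parameter set) and the function name are hypothetical, and the sketch assumes that season-specific parameters are drawn jointly from a single retained set per season, which the text does not specify. Shared parameters are drawn from the pooled spring and fall sets, as described above.

```python
import random


def draw_parameter_set(spring_sets, fall_sets, shared, rng):
    """Draw one parameter set for a second-step simulation.

    spring_sets, fall_sets: lists of dicts retained in step one; each dict
        holds that season's parameters plus the shared ones (assumed layout).
    shared: names of parameters that affect error in both seasons.
    """
    spring = rng.choice(spring_sets)
    fall = rng.choice(fall_sets)
    pooled = spring_sets + fall_sets

    draw = {}
    # Season-specific values come from one retained set per season.
    for source in (spring, fall):
        for name, value in source.items():
            if name not in shared:
                draw[name] = value
    # Shared values are sampled from the combined spring and fall sets.
    for name in shared:
        draw[name] = rng.choice(pooled)[name]
    return draw
```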

2.3 Assessment of ABC training

To assess the ABC training process, we measured three characteristics of the posterior estimates. First, we measured the ability of MIMAS to narrow parameter estimates from the prior to the posterior. To do this, we determined the prior-posterior overlap (PPO, Gimenez et al., 2009) for each parameter. A full description of these parameters is provided in Table 2. PPO estimates are generally used for evaluation in traditional Bayesian models, and give an indication of the identifiability of parameters, with lower percentages indicating stronger identifiability (Gimenez et al., 2009). PPO can also be influenced by how informed priors are, with more informed priors likely to result in higher PPOs. A failure of MIMAS to narrow posterior parameter ranges (resulting in a high PPO) could be indicative that population-level information from eBird ST is not informative on these parameters, or alternatively that the model may be too structurally complex.
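As a rough illustration of what a PPO value measures, the sketch below approximates the overlap between prior and posterior samples with shared histogram bins. This is a simplified stand-in, not the estimator used by the authors or by Gimenez et al. (2009); the function name and the number of bins are assumptions.

```python
def prior_posterior_overlap(prior_samples, posterior_samples, bins=50):
    """Histogram approximation of the overlap between two sample sets.

    Returns a value between 0 (no overlap) and 1 (identical distributions).
    """
    lo = min(min(prior_samples), min(posterior_samples))
    hi = max(max(prior_samples), max(posterior_samples))
    if hi == lo:
        return 1.0  # every sample is the same value
    width = (hi - lo) / bins

    def proportions(samples):
        counts = [0] * bins
        for x in samples:
            idx = min(int((x - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(samples) for c in counts]

    p = proportions(prior_samples)
    q = proportions(posterior_samples)
    # Overlap is the shared probability mass, bin by bin.
    return sum(min(a, b) for a, b in zip(p, q))
```

For a broad uniform prior and a posterior concentrated in one fifth of that range, this approximation returns a value near 0.2, which would correspond to a well-identified parameter.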

Next, we measured the ability of MIMAS to narrow posterior estimates under increasingly informed priors. To assess whether our model achieves this objective, we parameterized four different models with increasing prior information for the Wood Thrush (Hylocichla mustelina), a model migratory species that is well-studied. We conducted a comprehensive literature review to gather information that could inform prior parameter ranges for the species, and then trained four models (V.2021, V.2011, V.2001 and V.Null), each with separate prior parameter ranges based on the information available prior to each year, such that V.2021 represents the state of knowledge up to 2021, and V.2001 represents the state of knowledge during 2001. The V.Null model is trained with a parameter set generalized to be used with any species with no prior information (see Model parameterization). After training all four models, we assessed the similarity of posterior estimates, with the expectation that under increasingly informed prior scenarios, posterior estimates converge to similar ranges.

Lastly, we quantified the pairwise correlations (Pearson's r) between posterior parameter estimates within each species model. High correlation of ABC-estimated model posterior estimates can be useful in identifying model pathologies and also can be used to identify areas in which future empirical data collection efforts could most aid model development.
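The pairwise check can be sketched as follows. The layout of the posterior (a dictionary mapping each parameter name to its retained values) and the function names are hypothetical; the correlation itself is the standard Pearson formula.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(posterior):
    """Pearson's r for every pair of parameters in a posterior sample.

    posterior: dict mapping parameter name -> list of retained values,
        with values in the same simulation order for every parameter.
    """
    names = sorted(posterior)
    return {(a, b): pearson_r(posterior[a], posterior[b])
            for a, b in combinations(names, 2)}
```

Squaring each returned value gives the corresponding coefficient of determination.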

2.4 Model validation

To validate our model, we compared simulated migratory routes generated from posterior parameter sets to empirically measured locations of Wood Thrush collected from geolocator data (Stanley et al., 2021). We simulated fall migrations of 10 individuals from each of the five unique breeding locations using 100 unique parameter sets for a total of 5000 individuals. We then calculated the number of empirically estimated migratory locations that fell within a 90% kernel density estimate of the simulated migratory points to estimate coverage.
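The coverage calculation can be illustrated with a small pure-Python sketch. It is a simplified stand-in for the authors' procedure: it assumes planar coordinates, a fixed Gaussian bandwidth and a density threshold chosen so that roughly 90% of simulated points lie at or above it. All names and default values are hypothetical.

```python
import math


def kde_density(point, sample, bandwidth):
    """Unnormalized Gaussian kernel density at `point` given `sample`."""
    px, py = point
    total = 0.0
    for sx, sy in sample:
        d2 = (px - sx) ** 2 + (py - sy) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    return total / len(sample)


def coverage(empirical, simulated, bandwidth=1.0, level=0.90):
    """Fraction of empirical points inside the `level` density region.

    The region is the set of locations whose density is at least the
    value exceeded by about `level` of the simulated points themselves.
    """
    densities = sorted(kde_density(p, simulated, bandwidth) for p in simulated)
    threshold = densities[int((1.0 - level) * len(densities))]
    inside = sum(1 for p in empirical
                 if kde_density(p, simulated, bandwidth) >= threshold)
    return inside / len(empirical)
```

With geographic coordinates, points would first need projecting to a planar system, and the bandwidth would have to be chosen or estimated; both steps are omitted here.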

2.5 Applied model design

We built three theoretical models that can be used to augment MIMAS to simulate processes of interest. The structural design of applied models is summarized in full in Supplement 2. These models are all designed to represent hypothetical systems to serve as a foundation for future applied work. Our three independent models are built with increasing complexity to (1) observe and derive more complex individual-level features, specifically place-based patterns of migratory phenology, (2) simulate the bird-facilitated dispersal of organisms from a focal area and (3) represent pathogen spread.

2.6 Implementation and runtimes

MIMAS is implemented in R (R Core Team, 2021) and is available via GitHub (see Data Availability Statement). Using a single core on a Mac with a 2.6 GHz Intel i7 processor, each simulation takes roughly 30 s with 1000 individuals and 45 s with 5000 individuals. The full training process would take roughly 87 days for each species on a single device, and nearly 3 years to run all models in our analysis. Accordingly, we trained models using high-performance computing resources to run simulations in parallel. On a high-performance cluster using 250 cores, the training process for a single species takes approximately 12 h. We recognize that many potential users may not have access to these resources, and we provide instructions in the Data Availability Statement for requesting that the authors train a model for a species of interest.
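The single-device figure follows directly from the per-simulation runtimes and the two training steps of 100,000 simulations each, as the short calculation below shows. The function names are hypothetical, and the parallel estimate assumes perfect scaling with no scheduling or input/output overhead.

```python
SECONDS_PER_DAY = 86_400


def training_time_days(n_sims=100_000, secs_step1=30.0, secs_step2=45.0):
    """Serial runtime in days for both training steps of one species."""
    return n_sims * (secs_step1 + secs_step2) / SECONDS_PER_DAY


def ideal_parallel_hours(serial_days, cores):
    """Runtime in hours if the work divided perfectly across cores."""
    return serial_days * 24.0 / cores


serial = training_time_days()                # about 86.8 days
parallel = ideal_parallel_hours(serial, 250)  # about 8.3 hours
```

The ideal figure of roughly 8.3 h is a lower bound; the reported 12 h on 250 cores presumably reflects overhead and hardware differences that this calculation ignores.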

3 RESULTS

3.1 ABC

3.1.1 Narrowing of plausible parameter ranges

MIMAS reduced parameter ranges across all species despite the lack of informative priors for most species, overall reducing parameter space as measured by PPO (Figure 5). The overlap between prior and posterior distributions was variable within and across species (PPO range = 0.14 to 0.96), demonstrating the variable efficacy of ABC in narrowing parameter estimates. Broadly, the parameters governing the distribution of migration start dates were the most identifiable (median PPO = 0.28, range = 0.14 to 0.72, Figure 5), while energetic parameters (i.e. max energy, recovery rate) were less identifiable (median = 0.81, range = 0.36 to 0.95).

Cross-species reductions in parameter space (a) were measured with the prior-posterior overlap (PPO), with parameters grouped by category and numbered as they appear in Table 2. PPO values are coloured by season: pre-breeding spring migration (green circles), fall (orange diamonds) or both (blue triangles). Red bars indicate the species-wide mean PPO for each parameter. Lower values indicate the model was better at estimating posteriors, while values of 1 indicate posterior estimates were as broad as priors. Models of Individual Movement of Avian Species estimated more specific posteriors related to the timing of migration, while energetic posteriors were less specifically estimated. Model-estimated prior and posterior ranges are shown for three representative examples: mean departure date for Varied Thrush (Ixoreus naevius) (b), flight distance for Clay-coloured Sparrow (Spizella pallida) (c), and migratory connectivity for Hooded Warbler (Setophaga citrina) (d). Red bars represent the distribution of the prior, blue represents the posterior and purple indicates the overlap.

3.1.2 Posteriors under increasing information

We determined whether our implementation of MIMAS leads to similar posterior parameter estimates regardless of the strength of the initial priors. Using four increasingly informed models of the Wood Thrush (Hylocichla mustelina), we found that the posterior ranges of more informed models were similar to the posterior ranges of less-informed models (Supplement 4), suggesting that MIMAS performs well even when prior estimates are broad. Although the literature detailing the migratory behaviour of the Wood Thrush is vast relative to that for other species (Supplement 3), many prior and posterior ranges remain broad. Posterior estimates of those parameters remained almost identical across models (Supplement 4). The stability of posteriors suggests that the training process will arrive at similar estimates even when prior information is lacking. It is possible that strongly constraining priors for particular parameters where eBird ST is uninformative (e.g. energetics) may have a meaningful impact on posterior estimates of other parameters, particularly for species that have relatively little information available for parameterization.

3.1.3 Parameter correlation

For each of the ten species models, we assessed the pairwise correlations between posterior parameter estimates. Across trained species models, posterior parameter estimates were very weakly correlated (median R² < 0.001, range = 0–0.15), indicating the model training process did not show pathologies resulting from strong associations in posterior parameter estimates (Supplement 4).

3.2 Model validation

To validate the Wood Thrush model, we compared the positions of birds tracked using geolocators during fall migration (Stanley et al., 2021) to MIMAS-estimated migratory locations during the same timeframe. Migratory routes varied depending on breeding location in both empirical data and simulations (Figure 9). The vast majority (92%) of empirically derived migratory locations were within the 90% KDE of MIMAS-predicted migratory routes (Figure S20).

3.3 Applied models

We outline three potential use cases for MIMAS: an observational model built to expand on information output from the IBMs, and two sub-models used to augment MIMAS and simulate ecological interactions. These two sub-models are hypothetical representations of parasite dispersal and pathogen spread designed simply to demonstrate the capabilities of MIMAS rather than represent specific systems. We describe the design of each application in brief here and in detail in Supplement 2.

3.3.1 Migratory connectivity and phenology

We built an observational model to investigate model-estimated migratory phenology using a place-based approach (Figure 6). We use our model of Townsend's Warbler (Setophaga townsendi) to estimate the passage of migrants in three areas where distinguishing migrating individuals from wintering conspecifics is difficult—Mexico City, Los Angeles and Seattle. Our observational model estimates the distribution of passage dates of migrants through these areas during the spring (Figure 6b) and fall (Figure 6c) migration seasons by querying the status of all individuals within a representative hexagonal cell. Our model suggests that migration in the fall is more protracted compared to spring, and that migratory passage is less protracted in areas closer to the area from which birds are departing (Figure 6).

Models of Individual Movement of Avian Species (MIMAS) can be used to observe and estimate migratory attributes of interest, including phenology. By running many simulations (n = 100) from the MIMAS output for Townsend's Warbler (Setophaga townsendi) we can estimate the date of migratory passage of birds passing through areas (hexagonal cells 1–3, a) during the spring (b) and fall (c) migration seasons. All three cells are within the wintering range of the species (blue shading), making passage migrants difficult to distinguish from wintering birds. Estimates of migratory phenology suggest that the passage is earlier and less dispersed closer to the wintering grounds in spring (b) and breeding grounds in the fall (c), and generally more protracted in the fall (b, c).

3.3.2 Dispersal of ectoparasites

We simulate dispersal via migrating birds by augmenting our IBM to include new agents—ectoparasites—which occupy and attach to birds in a given geographical area. Here, we simulate the trans-continental bird-mediated dispersal of hypothetical ectoparasites from the Yucatan Peninsula, an important staging area for many migrant species (Bayly et al., 2018), with our Wood Thrush and Hooded Warbler IBMs. By using bootstrapping with the migratory IBM, we estimate the number of parasites dispersed with associated uncertainty. We then scale estimates to represent the total number dispersed by the whole population (Hooded Warbler, 5.4 million; Wood Thrush, 12 million). In this example, the spatial patterns of parasite dispersal are similar between the two species (Figure 7a,b), likely due to the species sharing similar ranges. However, the estimated magnitude of dispersal varies considerably between the two species, with the Wood Thrush predicted to disperse more parasites, potentially due to a later migration that overlaps more with ectoparasite questing phenology. Associated uncertainty around dispersal estimates is greater for the Wood Thrush (Figure 7c).
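The scaling and uncertainty steps can be sketched as below. The sketch assumes that each bootstrap replicate yields a count of parasites dispersed by a fixed number of simulated birds, that scaling is a simple ratio of population size to simulated birds, and that the figures in parentheses above are population sizes. The function names and the nearest-rank percentile rule are hypothetical choices, not the authors' implementation.

```python
import math


def scale_to_population(counts, n_simulated, population_size):
    """Scale per-simulation dispersal counts up to the whole population."""
    factor = population_size / n_simulated
    return [c * factor for c in counts]


def percentile_interval(values, lower=2.5, upper=97.5):
    """Nearest-rank percentile interval across bootstrap replicates."""
    ranked = sorted(values)

    def at(pct):
        rank = max(1, math.ceil(pct * len(ranked) / 100.0))
        return ranked[rank - 1]

    return at(lower), at(upper)
```

For example, counts from replicates of 1000 simulated Wood Thrush would be multiplied by 12,000 under the assumed population of 12 million, and the interval across replicates would then summarize uncertainty in the total.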

Models of Individual Movement of Avian Species can be used to estimate the relative patterns and magnitude of dispersal of organisms by multiple avian species. The predicted spatial distribution of dispersal of a hypothetical parasite dispersed by Wood Thrush (a, Hylocichla mustelina) and Hooded Warbler (b, Setophaga citrina) during the spring migratory season, and the estimated magnitude dispersed by either population (c). The predicted number of parasites dispersed is calculated as a posterior distribution derived from running many simulations for each species (here, n = 50). Spatial patterns of parasite dispersal are relatively similar between species, but the mean predicted magnitude, and associated uncertainty, is larger for the Wood Thrush.

3.3.3 Pathogen spread

To represent pathogen spread in a migratory population, we augmented our Clay-coloured Sparrow (Spizella pallida) IBM with a susceptible, infectious, recovered (SIR) sub-model. The SIR model simulates the spread of a pathogen through direct contact between infected and susceptible birds co-occurring on the same day within a specified distance. For details of the SIR model structure, parameters and starting conditions, see Supplement 2. Our MIMAS-SIR model was used to simulate and measure infection rates from a hypothetical pathogen based on date and migratory status (Figure 8b). This model can also be used to identify the location of probable infection sites (Figure 8c). For this species, our hypothetical SIR model shows an increased potential for epidemics to occur during the breeding and non-breeding seasons, and for infection rates to decline during migratory periods, likely due to reduced density of individuals and immunity conferred during preceding epidemics (Figure 8).
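A single daily update of such a contact-based SIR sub-model might look like the sketch below. It is not the structure documented in Supplement 2: the bird records, planar coordinates in kilometres, per-contact transmission probability and fixed infectious period are all hypothetical, and waning immunity is left out for brevity.

```python
import math
import random


def sir_step(birds, contact_km, p_transmit, infectious_days, rng):
    """Advance the infection state of every bird by one day, in place.

    birds: list of dicts with keys 'x', 'y' (planar km), 'state'
        ('S', 'I' or 'R') and 'days_infected' (assumed layout).
    """
    infectious = [b for b in birds if b["state"] == "I"]
    newly_infected = []
    for bird in birds:
        if bird["state"] != "S":
            continue
        for source in infectious:
            dist = math.hypot(bird["x"] - source["x"], bird["y"] - source["y"])
            # Transmission requires co-occurrence within the contact distance.
            if dist <= contact_km and rng.random() < p_transmit:
                newly_infected.append(bird)
                break
    # Birds infectious at the start of the day age and may recover.
    for bird in infectious:
        bird["days_infected"] += 1
        if bird["days_infected"] >= infectious_days:
            bird["state"] = "R"
    # New infections take effect only after the day's contacts are resolved.
    for bird in newly_infected:
        bird["state"] = "I"
        bird["days_infected"] = 0
```

In a coupled model, this update would run once per simulated day after bird locations have been moved by the migratory IBM.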

Models of Individual Movement of Avian Species (MIMAS) can be used to model the spread of infectious diseases in a spatiotemporally explicit framework. By using a susceptible-infectious-recovered model in combination with simulated locations of birds through two full-annual cycles, we used MIMAS to simulate the spread of a hypothetical infectious disease with direct transmission among Clay-coloured Sparrows (Spizella pallida). The locations of infection for a random sample (a, black diamonds, n = 10% of all simulated infections) indicate infections are most common where population densities are highest: in the core of the breeding (red shading) and non-breeding ranges (blue shading). The modelled disease confers immunity for a set period of time, leading to periodic outbreaks (b), with epidemics leading to steep changes in the proportion of infected individuals during the breeding and non-breeding periods (c, red line, right axis, 20-day rolling mean). Drops in the number of infectious individuals appear to coincide with periods when a high proportion of the population is migrating (c, black line).

4 DISCUSSION

We designed a model training framework, MIMAS, to use ABC to train IBMs of species-level movements and applied this model to 10 migratory landbirds. We demonstrate the usefulness of ABC as a model calibration technique for creating realistic IBMs of animal movement while accounting for uncertainty. Our IBMs are uniquely designed to simulate migrations rooted in individual behaviour and can recreate empirically observed migratory routes of individuals, making them particularly accessible to movement and migration ecologists. With the ballooning amount of citizen-science data detailing the movements of a variety of migratory organisms (including butterflies, whales and fish), IBMs trained with ABC could prove a viable methodology for building population-level models of movement for non-avian species. We demonstrate how ABC can be used to develop IBMs of the movement of understudied species both as a hypothesis-generating and hypothesis-testing tool.

4.1 Using MIMAS to estimate individual-level behaviours

Trained species-specific models can be used to investigate and approximate a number of individual-level behaviours of migrants. In addition to the parameters directly estimated (Table 2), IBMs can be used to estimate interspecific variation in behaviours like the number and length of stopovers taken by migrant species, or intraspecific relationships like that between migratory distance and total migration time. Because our methodology uses all available information to inform priors and train models, future empirical studies can both aid in validating these estimates and be used to improve model accuracy by incorporating this new information into priors. In addition, the relative uncertainty around parameter estimates can help guide empirical efforts by highlighting the behaviours that are least understood.

At present, our IBMs are not designed to include more complex factors that likely influence the behaviour of migrants, such as weather, habitat or differences in migratory strategy across subspecies. These factors may play an important role in the abundance of birds, particularly at finer scales, and also comprise potentially testable hypotheses in their own right. In the future, the underlying structure of the IBMs can be augmented to include some of these processes, such as differences in the number of migrants in a given night based on atmospheric conditions, or the fine-scaled tuning of migratory flights to end in favourable habitat.

4.2 MIMAS as a predictive tool

We believe that augmenting trained IBMs with additional submodels can be broadly useful in research areas in which the movement of a large number of individuals is critical to understanding the system of interest. For example, the dispersal of organisms, particularly propagules and parasites, is a well-known consequence of avian migration (Cohen et al., 2015; Kleyheeg et al., 2019). One example of the potential dispersal-related uses is predicting the dispersal and subsequent range expansion of ticks. MIMAS could be used to identify areas and periods during which these dispersal events are likely to happen and how these might change under different climate scenarios and resulting phenological shifts.

Relatedly, public health officials could use MIMAS to identify probable routes of spread of pathogens by migratory birds. These efforts could focus on predicting when and where during the annual cycle individuals might experience cross-species transmission events, and where infected individuals are likely to arrive from and travel to. Highly pathogenic avian influenza is one compelling system that could benefit from a model representing individual flight paths of infected birds. Further development of MIMAS to train models for waterfowl and shorebird species known to carry these viruses (Rappole & Hubálek, 2006) could help predict spatial patterns of pathogen transmission in wild populations and estimate risks to domestic poultry, livestock and human populations. Critically, the IBMs can be adapted to integrate the impacts of infection on bird behaviour (e.g. slowed or halted migration). More broadly, MIMAS can be used to investigate theoretical questions core to understanding the role of migration in the transmission of pathogens (Altizer et al., 2011).

4.3 Current limitations and future directions

Although MIMAS has clear applications in a number of settings, there are a few notable limitations and areas for future development. We employed ABC to improve the precision of posterior estimates, but uncertainty remains a factor limiting the usefulness of our models. Additionally, our model training process was limited by computational resources, and improvements that make training less computationally expensive could also improve our ability to narrow posterior ranges. Despite computational limitations, ABC reliably narrowed potential ranges of many parameters (Figure 5), particularly those governing the timing of migration. This was likely possible because eBird ST data is very informative for determining when migratory movements occur. In contrast, ABC struggled to narrow estimates for energetic parameters for most species—likely because the energetic parameters acting to limit the pace of migration (i.e. how far a bird flies before it stops to refuel and how fast it refuels) are less informed from eBird ST data. This uncertainty in parameter estimates has the downstream effect of limiting inference both at the level of individual behaviour and in applied predictive settings. Further empirical research detailing individual-level behaviours, especially related to the energetic limitations on individual migrants and the inclusion of a more realistic representation of flight costs in the IBMs (e.g. wind), may lead to more precise posterior estimates. As a result, the future improvement of our species-specific models is reliant on future field-based tracking studies and related efforts. As this new knowledge is integrated, MIMAS can be used to train more precise and accurate IBMs.

MIMAS uses modelled relative abundance information from eBird ST to train IBMs, and the accuracy and scope of this training data are important to consider as errors or biases in eBird ST are expected to propagate into parameter estimates. eBird ST is limited by the availability of data (i.e. birdwatcher checklists), which are spatially biased (e.g. relatively fewer checklists from Central America compared to the United States). The lack of data availability, compounded with the difficulty in detecting some species, may lead to an underestimation of relative abundance in certain areas. For example, eBird ST estimates virtually no overland migration through Mexico for Wood Thrush (compared to a trans-gulf route), despite geolocator evidence to the contrary (Stanley et al., 2021; Stutchbury et al., 2009). Our methodology could retain simulations that best recreate this eBird ST pattern (in this case, incorrectly penalizing simulations with overland migration), which in turn affects relevant parameter estimates. To address this, we provide code that can evaluate where eBird ST and MIMAS consistently disagree to identify whether MIMAS may be overfitting to eBird ST. Using this tool to analyse our Wood Thrush model, we find that our model simulates migration in certain areas predicted to be low density by eBird ST (e.g. Cuba and Mexican Gulf Coast; Supplement 5). In combination with validation from geolocator tracking data (Figure 9), it appears that the models are buffered against problematic overfitting. Although eBird ST remains the most sophisticated multi-species model of full-annual cycle abundance for birds, it may be especially important to consider how biases or errors in these products may affect model outputs. Clear communication of the potential biases and errors of eBird ST is critical to understanding potential downstream consequences to derived applications and products like MIMAS.

Models of Individual Movement of Avian Species replicates migratory routes of Wood Thrush estimated from geolocator tracking. Empirically derived locations of migrant birds (Stanley et al., 2021; triangles, n = 94), coloured by breeding origin (black circles; n = 5), fall within the simulated locations of migrants (small coloured circles) programmed to originate from the same breeding locations. Simulated migratory locations for each breeding location are shown for 1000 birds derived from 100 posterior parameter sets.

MIMAS is designed to be simple but augmentable, and we see a number of future directions that could drastically expand its scope, accuracy and applicability across a number of fields. First, with similar training data, like temporally-specific species distribution models, MIMAS could be used to simulate the movement of non-avian migratory species like butterflies (Kass et al., 2020). Within species, new information from genetic studies could be integrated to explicitly represent the subspecific identity of individuals and build in the importance of geographic variability of behaviour (Ruegg et al., 2014). Similarly, the model could be updated to represent demographic structure of populations more explicitly, assigning individuals to specific age- and sex-classes. Furthermore, these individuals could be tracked over multiple seasons with the addition of birth and death processes, and models could be updated to include interannual variation in phenology and species distributions. MIMAS could also be updated to simulate season-specific movement behaviours like post-breeding dispersal or winter nomadism. In all, these additions could help realistically simulate processes of interest at much longer time scales (potentially over thousands of years and individual migration seasons) and expand the usefulness of MIMAS to applications in conservation and evolution.

5 CONCLUSION

Bridging the gap between individual behaviour and population-level patterns of migration is a stubbornly complex task, but critical to a number of important questions in ecology and conservation. Training IBMs with ABC is a straightforward and tractable approach to help investigate previously inaccessible research questions at the intersection of individual behaviour and large-scale abundance patterns. Our model training framework can be used to design and train hundreds of species-specific null models of migratory movements and to update these models iteratively as more detailed individual-level behavioural information is gathered from tracking studies. Trained models can be used as null models to estimate individual-level behaviours, and with augmentation, can simulate important processes like dispersal and pathogen spread, potentially unlocking fruitful explorations of the consequences of bird migration at continental scales.

AUTHOR CONTRIBUTIONS

Benjamin A. Tonelli designed the IBM under the guidance of Donald C. Dearborn and Morgan W. Tingley. Alan E. Zelin conducted the literature review under the guidance of Benjamin A. Tonelli. Benjamin A. Tonelli and Morgan W. Tingley designed the model training pipeline and applied models. Benjamin A. Tonelli wrote the manuscript, and all authors contributed critically to the editing and gave final approval for publication.

ACKNOWLEDGEMENTS

We thank Jamie Lloyd-Smith and members of the Tingley Lab for helpful comments and suggestions on model design and the manuscript. B.A.T. was supported by the National Aeronautics and Space Administration under the FINESST fellowship grant 80NSSC22K1530 and National Science Foundation grant EF 2033263. M.T. was supported by National Science Foundation grants EF 1703048 and 2033263.

CONFLICT OF INTEREST STATEMENT

The authors have no conflicts to disclose.

PEER REVIEW

The peer review history for this article is available at https://www.webofscience.com/api/gateway/wos/peer-review/10.1111/2041-210X.14189.

DATA AVAILABILITY STATEMENT

The code to recreate all aspects of this work is available on GitHub via https://github.com/bentonelli/MIMAS, along with posterior estimates for species trained as a part of this project. We ask that issues and questions with the software be reported via GitHub. A form to request an IBM be trained for a specific species is also provided on GitHub. Data and code used in this study are archived on Zenodo: https://doi.org/10.5281/zenodo.8135918 (Tonelli et al., 2023).