Volume 12, Issue 2 p. 311-321
RESEARCH ARTICLE

Incorporating time into the traditional correlational distributional modelling framework: A proof-of-concept using the Wood Thrush Hylocichla mustelina

Kate Ingenloff

Corresponding Author

University of Kansas Biodiversity Institute, Lawrence, KS, USA

Correspondence

Kate Ingenloff

Email: [email protected]

Andrew T. Peterson

University of Kansas Biodiversity Institute, Lawrence, KS, USA

First published: 04 November 2020

Abstract

  1. Detailed spatio-temporal information about geographic distributions of species is critical for biodiversity analyses in conservation and planning. Traditional correlative modelling approaches use species observational data in model calibration and testing in a time-averaged framework. This method averages environmental values through time to yield a single environmental value for each location. Although valuable for exploring distributions of species at a broad level, this averaging is one of myriad factors impacting model quality and reliability.
  2. We sought to optimize traditional correlative niche model performance in distributional ecology contexts by incorporating time specificity into the existing modelling framework. We modified the existing framework to account for temporal dynamics in species' distributions to produce more robust, temporally explicit models. Using the Wood Thrush Hylocichla mustelina as our study species, we introduce a method of (a) deriving a temporally explicit pseudo-absence dataset using kernel density estimates to replicate relative sampling of sites through time, and (b) incorporating temporally explicit covariates in model calibration.
  3. By accounting for location, and month and year of primary data collection, the time-specific models successfully yielded dynamic predictions reflecting known distributional shifts in Hylocichla mustelina's annual movement pattern.
  4. The modified data preparation steps that we present incorporate temporal dimensions into traditional correlational modelling approaches, improving the predictive capacity and overall utility of these models for highly mobile, short-lived or behaviourally complex species. With the ability to estimate species' niches in greater detail, time-specific models will be able to address specific concerns of species-level management and policy development for highly mobile and/or migratory species, as well as disease vectors of public health interest.

1 INTRODUCTION

Understanding species' geographic distributions is critical for managing biodiversity. Correlative distributional modelling (a.k.a., species distributional modelling, ecological niche modelling) is a popular tool used for characterizing species' ecological niches in environmental space and projecting them into geographic dimensions (Peterson, 2006). By relating primary biodiversity data to biologically relevant environmental covariates to provide spatially explicit predictions of climatic suitability, such models can inform about where survey data are limited or where knowledge gaps may impede development of more detailed models. These simple but powerful tools render geographic dimensions of biodiversity more understandable and accessible to diverse stakeholders and have been liberally incorporated into a broad range of research questions relevant to biodiversity and conservation (Eaton et al., 2018; Franklin, 2013; Rodríguez et al., 2007), invasive species (Ingenloff et al., 2017), climate change (Beck, 2013; Pacifici et al., 2015; Searcy & Shaffer, 2016), phylogeography (Alvarado-Serrano & Knowles, 2014), and human health (Peterson, 2006, 2014; Rodríguez et al., 2007).

A notable limitation of current niche modelling methodologies is the temporal averaging of covariate data such that each unique geographic coordinate pair is assigned a single environmental value for the full study period. This averaging reduces predictive capacity for species that are highly mobile, behaviourally complex, or with a lifetime or life stages that are short with respect to the temporal span of environmental changes; although, using higher resolution weather data in place of long-term climate data has been shown to mitigate some of these impacts (Bateman et al., 2012; Feldmeier et al., 2018). Traditional approaches use covariates that are averaged temporally, effectively treating covariates as static values for the breadth of the study period. The result is a single, static view of predicted suitability for the study species, which has been the topic of discussion in light of species that switch among multiple niches between seasons (Martínez-Meyer et al., 2004). These approaches can result in over-generalization of estimates of ecological niches (Barve et al., 2014; Ingenloff, 2017; Peterson et al., 2005), particularly for migratory or behaviourally complex organisms (Ingenloff, 2017; Peterson et al., 2005). Modelling mobile species presents a particularly challenging situation because, to be meaningful, predictive models must capture both a seasonally dynamic landscape and associated species movements, which traditional methods are unable to account for (Elith et al., 2010). In light of anthropogenic climate change and other human impacts, garnering an understanding of species' distributional dynamics through time, rather than a simple snapshot of overall potential geographic distribution, is critical.

Unlike the field of movement ecology, where the pairing of covariate data contemporaneous with species observational data has been the standard for some time (Dodge et al., 2013), few correlative modelling studies in distributional ecology have incorporated time specificity. Most studies applied a ‘seasonal’ modelling approach, that is, modelling of a single facet of a species' life history (e.g. breeding or wintering) using time-averaged approaches (Laube et al., 2015; Skov et al., 2016; Soriano-Redondo et al., 2019). Fink et al. (2010) introduced spatiotemporal exploratory models (STEM) wherein an ensemble or mixture model is created from a suite of seasonally or behaviourally restricted distributional models to encompass the breadth of the study species' life history. Seasonal approaches, however, may be subject to reduced predictive capacity resulting from the need for user-designated subsetting of observational data and because they still involve considerable temporal averaging of environmental variation. Williams et al. (2017) explored a ‘full year’ modelling framework that evaluated each month averaged across years to characterize seasonal movements of cuckoos accurately. Other researchers overcame issues of over-generalization owing to time averaging with more distinctive approaches: for example, Barve et al. (2014) combined detailed physiological measurements with temporally specific summaries of weather and climate to understand geographic distributions of Spanish moss. However, incorporation of mechanistic approaches within correlative modelling frameworks is constrained by an overwhelming lack of detailed physiological data for the vast majority of species (Peterson et al., 2015). More recently, two studies incorporated time without excessive temporal averaging or incorporation of mechanistic methods: Welch et al. (2018) produced monthly distributional models for seven shark species over 10 years, yielding a dynamic view of monthly projected distributions for the study period; and Abrahms et al. (2019) used a multi-model ensemble approach to predict daily habitat suitability for blue whales. Still, explicit methodologies broadly accessible to the greater community of distributional ecology practitioners remain lacking.

Here we introduce several modifications to the input data preparation process for traditional niche modelling frameworks that incorporate temporal dimensions and produce dynamic niche predictions. We use a well-sampled migratory species, the Wood Thrush Hylocichla mustelina (Gmelin, 1789), to demonstrate three modifications to the data preparation process: generation of a weighted time-specific pseudo-absence dataset; time-specific covariate assignment, wherein covariate data are assigned to each occurrence corresponding to the place and time of collection; and spatiotemporal rarefication of presence and pseudo-absence data (Figure 1). These modifications account for spatial and temporal survey bias in openly accessible primary occurrence data and alleviate the problem of over-generalization in niche characterization resulting from temporal averaging of covariates. We provide a comparison of this time-explicit method with the traditional time-averaged approach and assess the ability of each to predict climatic suitability for the species across North and Central America.

Figure 1. Modified input data preparation workflow for temporally explicit correlative modelling. Blue dashed circles denote changes from traditional methods.

2 MATERIALS AND METHODS

To maximize reproducibility, we obtained all data from open-access sources and ran all processes using the open-source statistical analysis program R v3.5.2 (R Development Core Team, 2009). All Supporting Information (https://doi.org/10.6084/m9.figshare.8160290.v2) and relevant R scripts (https://doi.org/10.6084/m9.figshare.8160227.v1) are freely available. Figure 1 illustrates the modified data preparation workflow described here relative to traditional time-averaged approaches (Figure S1).

2.1 Study species

We selected the long-distance migrant Wood Thrush Hylocichla mustelina because distributional knowledge is effectively complete and data are abundant. Each year, H. mustelina travels between discrete breeding and wintering ranges. Breeding occurs during late spring and summer (mid-May into August) in the eastern United States and southeastern Canada in deciduous and mixed forests (BirdLife International, 2017; Collar, 2005). In early autumn, the birds begin a staggered migration from breeding to wintering grounds in southern Mexico and Central America, with more northerly populations beginning to migrate in mid-August and more southerly populations delaying migration until late September and early October (BirdLife International, 2017; Collar, 2005). Hylocichla mustelina remains on the wintering grounds until late March–April. Vagrants have been recorded in the Caribbean, northern South America, western United States and western Europe (Collar, 2005).

2.2 Input data

2.2.1 Occurrence data

We downloaded two sets of primary occurrence data from the Global Biodiversity Information Facility (GBIF). The first was that of our study species, Hylocichla mustelina (GBIF 26 March, 2018). The second, the reference group used to characterize the sampling process that produced the data (Anderson, 2003), included the entire family Turdidae (GBIF 2426 March, 2018). We constrained both searches to records obtained via human observation between 1980 and 2018 with no known spatial issues for all of continental North and Central America. Initial data calls returned 532,633 H. mustelina records from 19 institutions and 4,848,853 Turdidae records from 47 institutions.

Data were subjected to a sequence of quality control checks including visual inspections to detect obvious outliers/inaccuracies (e.g. wrong hemisphere, long-distance vagrants), and removal of records with imprecise (e.g. no decimal places) or missing geographic coordinates. Records collected after 2015 were removed owing to temporal limits of covariate data (see below). We delineated a model calibration region (accessible area for the species) based on the known natural history of H. mustelina, in which we identified the range of the core population including breeding and wintering locations, and a ~750 km buffer to account for their high mobility, but excluding known areas of vagrancy (Figure S2; Barve et al., 2011). This reduced data available to 433,648 H. mustelina and 3,011,848 Turdidae records. We intended to run analyses using all data (1980–2015); however, generating the time-specific pseudo-absence dataset (see below) was so cumbersome computationally that we stopped analyses after March 2010. This reduced our data to 134,293 H. mustelina and 828,267 Turdidae records. We set aside 2014–2015 H. mustelina data (149,340 records) for use as an additional model evaluation dataset.

2.2.2 Pseudo-absence data

Derivation of a pseudo-absence dataset from a reference group sampled in the same manner as the study species (Anderson, 2003) is a common practice for correlative models requiring presence-absence data when true absence data are unavailable (Kramer-Schadt et al., 2013). Sampling bias, however, is a universal characteristic of primary biodiversity data (Kadmon et al., 2004), and can be a significant problem in correlative modelling (Anderson et al., 2016; Phillips et al., 2009). While pseudo-absence data cannot correct for biases inherent in a presence dataset, they can ensure that background data used in model calibration reflect sampling biases in presence data.

To incorporate time, we generated a ‘bias cloud’: a time-specific pseudo-absence dataset reflective of sampling intensity through time for the duration of the study period (see dynamic pseudo-absence dataset in the Supporting Information). To this end, we first divided the study period into discrete time steps; we chose an intermediate temporal resolution (monthly), but we note that the method could be applied at any temporal resolution for which both occurrence and covariate data are available. For each time step, we subset reference group occurrence data (including the study species) sampled during that time step and generated a kernel density estimate (KDE) using the ‘npudensbw’ and ‘npudens’ functions in the np package (Hayfield & Racine, 2008). Kernel density bandwidth specifications were calculated using an Epanechnikov kernel, a least-squares cross-validation bandwidth selection method, and an adaptive nearest-neighbour continuous variable bandwidth type as a balance between producing detailed KDEs and computational feasibility. We then applied a 95% threshold (excluding the lowest 5%) to the resulting KDE and took a weighted sample wherein the kernel density estimate of each pixel functioned as the weight. The number of points extracted was proportional to the number of reference group observations in that time step relative to the overall dataset (see Supporting Information for additional detail). The collective sampling from all time steps yielded a pseudo-absence dataset reflective of spatial and temporal sampling patterns within the reference group data.
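
The bias-cloud procedure can be sketched in a few lines. The following is a minimal Python illustration, not the authors' R code: it substitutes a fixed-bandwidth product Epanechnikov kernel for the adaptive nearest-neighbour, cross-validated bandwidth computed with the np package; it interprets the 95% threshold as keeping the highest-density cells that together hold 95% of the total density; and it samples cells with replacement. All function names, the toy grid and the toy records are hypothetical.

```python
import random

def epanechnikov(u):
    """One-dimensional Epanechnikov kernel."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kde_on_grid(points, cells, bandwidth):
    """Fixed-bandwidth product-kernel density at each cell centre (assumption:
    the article used an adaptive, cross-validated bandwidth instead)."""
    norm = len(points) * bandwidth ** 2
    return [
        sum(epanechnikov((cx - px) / bandwidth) * epanechnikov((cy - py) / bandwidth)
            for px, py in points) / norm
        for cx, cy in cells
    ]

def keep_top_mass(cells, density, mass=0.95):
    """Keep the highest-density cells that together hold `mass` of total density
    (one possible reading of the 95% threshold)."""
    ranked = sorted(zip(density, cells), reverse=True)
    total = sum(density)
    kept, running = [], 0.0
    for d, cell in ranked:
        if d <= 0.0 or running >= mass * total:
            break
        kept.append((cell, d))
        running += d
    return kept

def sample_time_step(points, cells, bandwidth, n_draw, rng):
    """Draw n_draw pseudo-absence cells, weighted by thresholded density."""
    kept = keep_top_mass(cells, kde_on_grid(points, cells, bandwidth))
    return rng.choices([c for c, _ in kept], weights=[d for _, d in kept], k=n_draw)

def allocate_draws(counts_by_step, total_draws):
    """Split total_draws across time steps in proportion to reference-group records."""
    total = sum(counts_by_step.values())
    return {step: round(total_draws * n / total) for step, n in counts_by_step.items()}

# Toy example: a 10 x 10 grid and two monthly time steps of reference-group records.
rng = random.Random(42)
cells = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
reference = {
    (1980, 1): [(2.0, 2.0), (2.5, 2.2), (3.0, 2.8)],
    (1980, 2): [(7.0, 7.0)],
}
draws = allocate_draws({step: len(pts) for step, pts in reference.items()}, total_draws=8)
bias_cloud = {
    step: sample_time_step(pts, cells, 2.0, draws[step], rng)
    for step, pts in reference.items()
}
```

Because each time step is rounded independently, the allocated draws may not sum exactly to the requested total in larger examples; a production version would need to reconcile the remainder.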

Although we intended to produce a pseudo-absence dataset through 2015, we halted the process at March 2010 because sampling intensity increased sharply through time, greatly lengthening processing time for heavily sampled time steps (reaching nearly a week on a powerful lab desktop; Figures S3 and S4). The pseudo-absence dataset for the amended study period (January 1980 – March 2010) totalled 241,958 pseudo-absences for the 363 time steps, approximately double the number of presence points. Although no H. mustelina observation data existed for 13 months during the study period (Table S1), the process produced pseudo-absence data for all time steps because reference group observations existed in all time steps; these mismatches between occurrence data and pseudo-absence data function in effect as absence information in model calibration.
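
As a quick arithmetic check on the figures in this paragraph, the short Python sketch below enumerates the monthly time steps in the amended study period and computes the ratio of pseudo-absences to the 134,293 H. mustelina records reported above for the same period. The helper name is hypothetical.

```python
def monthly_steps(start, end):
    """List (year, month) pairs from start to end, inclusive."""
    year, month = start
    steps = []
    while (year, month) <= end:
        steps.append((year, month))
        year, month = (year, month + 1) if month < 12 else (year + 1, 1)
    return steps

steps = monthly_steps((1980, 1), (2010, 3))
n_steps = len(steps)                    # 30 full years plus 3 months
ratio = round(241958 / 134293, 2)       # pseudo-absences per presence record

print(n_steps)  # 363
print(ratio)    # 1.8
```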

2.2.3 Time-averaged and time-specific datasets

To address sampling bias further, we rarefied H. mustelina data and pseudo-absences to a single point per pixel relative to the spatial resolution of the covariate data (Kramer-Schadt et al., 2013; Phillips et al., 2009). To create the time-averaged datasets, we spatially rarefied the original data to a single point per pixel. For time-specific datasets, we rarefied the original data spatially and temporally to one point per pixel per time step. The rarefication process reduced data to 34,004 (1980–2010) and 36,436 (2014–2015) H. mustelina records and 205,837 pseudo-absences for time-averaged analyses, and 76,119 (1980–2010) and 61,479 (2014–2015) H. mustelina records and 241,958 pseudo-absences for time-specific analyses. We ensured that temporal information (e.g. month and year) remained associated with all data for use in model evaluation.
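
The difference between the two rarefication schemes comes down to the key used to identify duplicates. The Python sketch below is illustrative only: the grid origin, the 1/24-degree cell size, the rule of keeping the first record encountered, and all names are assumptions rather than details taken from the authors' scripts.

```python
def pixel_of(lon, lat, cell_size, origin=(-180.0, 90.0)):
    """Row and column of the grid cell containing a coordinate (origin at top-left)."""
    col = int((lon - origin[0]) // cell_size)
    row = int((origin[1] - lat) // cell_size)
    return row, col

def rarefy(records, cell_size, by_time_step=False):
    """Keep the first record per pixel, or per pixel and month when by_time_step is True."""
    seen, kept = set(), []
    for rec in records:
        key = pixel_of(rec["lon"], rec["lat"], cell_size)
        if by_time_step:
            key = key + (rec["year"], rec["month"])
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept

CELL = 1.0 / 24.0  # assumed cell size in degrees
records = [
    {"lon": -95.01, "lat": 39.01, "year": 1980, "month": 6},
    {"lon": -95.02, "lat": 39.02, "year": 1980, "month": 6},  # same cell, same month
    {"lon": -95.01, "lat": 39.01, "year": 1981, "month": 6},  # same cell, later year
    {"lon": -90.01, "lat": 35.01, "year": 1980, "month": 6},  # different cell
]
time_averaged = rarefy(records, CELL)                      # one point per pixel
time_specific = rarefy(records, CELL, by_time_step=True)   # one per pixel per time step
```

In this toy case the spatial scheme keeps two records and the spatiotemporal scheme keeps three, which mirrors the higher data retention reported for the time-specific datasets.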

2.2.4 Covariate data

For simplicity, we used three monthly covariates from TerraClimate—precipitation, and minimum and maximum temperature—as these data cover a broad temporal range (1958–2015) at monthly resolution (Abatzoglou et al., 2018). Data were cropped to the study region (Figure S2) and left at their native 4.6 km resolution. Covariate data were available through 2015 only, limiting the overall study period to 1980–2015. We extracted covariate data to all rarefied time-specific occurrence and pseudo-absence records described above such that each point was associated with the climatic information specific to the place and point in time (month) of observation.
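
Time-specific extraction amounts to looking up each record in the covariate layer for its own month and year rather than in a single averaged layer. The sketch below uses plain Python dictionaries in place of raster layers; the data structure, the pixel keys, the covariate values and the function name are all invented for illustration.

```python
def extract_time_specific(records, monthly_layers):
    """Attach covariates from the layer matching each record's year and month.

    Records with no covariate value at their pixel and time step are returned
    separately so that they can be inspected or dropped.
    """
    matched, unmatched = [], []
    for rec in records:
        layer = monthly_layers.get((rec["year"], rec["month"]), {})
        values = layer.get(rec["pixel"])
        if values is None:
            unmatched.append(rec)
        else:
            matched.append({**rec, **values})
    return matched, unmatched

# Invented values: one pixel observed in a winter and a summer month.
monthly_layers = {
    (1980, 1): {(10, 20): {"ppt": 35.0, "tmin": -6.0, "tmax": 4.0}},
    (1980, 7): {(10, 20): {"ppt": 90.0, "tmin": 18.0, "tmax": 31.0}},
}
records = [
    {"id": "a", "pixel": (10, 20), "year": 1980, "month": 1},
    {"id": "b", "pixel": (10, 20), "year": 1980, "month": 7},
    {"id": "c", "pixel": (99, 99), "year": 1980, "month": 7},  # outside the grid
]
matched, unmatched = extract_time_specific(records, monthly_layers)
```

Records "a" and "b" share a location but receive different climate values, which is the property that a time-averaged extraction cannot provide.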

For time-averaged analyses, we derived a dataset of six summary layers that included mean and range for each of the three covariates for January 1980–March 2010. Summary data were extracted to each occurrence and pseudo-absence record in the rarefied time-averaged datasets. We created a second set of summary covariates for 2014–2015 and extracted these data to the rarefied 2014–2015 time-averaged H. mustelina data.
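
For comparison, the time-averaged summaries collapse each pixel's monthly series into two numbers per covariate. The sketch below assumes that "range" means maximum minus minimum over the period, which the text does not state explicitly; the dictionary-based layers, values and names are hypothetical.

```python
def time_averaged_layers(monthly_layers, covariates=("ppt", "tmin", "tmax")):
    """Mean and range (max minus min, an assumed definition) per pixel and covariate."""
    series = {}
    for step in sorted(monthly_layers):
        for pixel, values in monthly_layers[step].items():
            for name in covariates:
                series.setdefault((pixel, name), []).append(values[name])
    summary = {}
    for (pixel, name), vals in series.items():
        layer = summary.setdefault(pixel, {})
        layer[name + "_mean"] = sum(vals) / len(vals)
        layer[name + "_range"] = max(vals) - min(vals)
    return summary

# Invented values for a single pixel over two months.
monthly_layers = {
    (1980, 1): {(10, 20): {"ppt": 35.0, "tmin": -6.0, "tmax": 4.0}},
    (1980, 7): {(10, 20): {"ppt": 90.0, "tmin": 18.0, "tmax": 31.0}},
}
summary = time_averaged_layers(monthly_layers)
```

Each pixel ends up with six values (three covariates, two statistics each), matching the six summary layers described above.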

After removing two records with no covariate values because they fell marginally outside the climate data grid, we randomly divided each of the 1980–2010 datasets 50–50 for use in model calibration/selection and evaluation (Table S2). The 2014–2015 H. mustelina data were set aside for final model evaluation.

2.3 Correlational niche modelling

Following Qiao et al. (2015), we explored a suite of model calibration scenarios for three commonly used presence-absence algorithms—generalized linear models (GLM), generalized additive models (GAM), and boosted regression trees (BRT)—to identify the parameterization yielding the best time-averaged and time-specific model for each algorithm. For each algorithm, we explored a suite of parameter settings, such that we generated large numbers of candidate models, and then selected a final model among them using criteria of predictive ability and simplicity.

2.3.1 Model calibration

We calibrated GLMs with both main effects and pairwise interactions via an exhaustive search using the ‘glmulti’ function (Calcagno, 2013). We used the ‘gam’ function (Wood, 2011) to calibrate GAMs, assessing four smoothers (cubic splines, thin plate splines, P splines, and adaptive splines), two smoother basis dimensions (default, k = 25), two smoothing parameters (default, restricted maximum likelihood), and covariate interactions ranging from no interaction to full interactions. We visually assessed covariate responses for GLM and GAM calibrations using the ‘response.plot2’ function (Thuiller et al., 2016). Finally, we calibrated BRTs using the ‘gbm.step’ function (Hijmans et al., 2017), evaluating four levels of learning rate (default, 0.005, 0.0025, 0.001), two bag fraction levels (default, 0.6), and tree complexity from 0 up to three (time-specific) and four (time-averaged).
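
To give a sense of how quickly candidate models multiply, the sketch below enumerates the GAM settings listed above, excluding the covariate interaction structures because the text does not say how many were tried. It is written in Python purely for illustration; the labels are descriptive strings, not arguments to any fitting function.

```python
from itertools import product

smoothers = ["cubic spline", "thin plate spline", "P-spline", "adaptive spline"]
basis_dimensions = ["default", "k = 25"]
smoothing_methods = ["default", "restricted maximum likelihood"]

# Every combination of smoother, basis dimension and smoothing parameter method.
gam_grid = list(product(smoothers, basis_dimensions, smoothing_methods))
n_candidates = len(gam_grid)

print(n_candidates)  # 16
```

Each of these 16 combinations would then be crossed with the interaction structures under consideration, so the number of fitted candidates per dataset is a multiple of 16.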

2.3.2 Model selection

We used algorithm-appropriate metrics to select the best time-averaged and time-specific calibration (parameter settings) for each algorithm. We used the Akaike Information Criterion (AIC) for within-algorithm model selection of GLMs and GAMs (Warren & Seifert, 2011). However, as AIC is inappropriate for tree-based algorithms, we used mean squared error (MSE) estimates for training and test data, together with test data omission rate, for BRTs. MSE values were calculated using the ‘MSE’ function (Signorell et al., 2019). Potential discrepancies involved in comparing AIC to cross-validation results were not a concern because we were not using these statistics for cross-algorithm comparisons.
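
The two selection metrics follow standard definitions: AIC = 2k − 2 ln L for a model with k parameters and maximized likelihood L, and MSE as the mean of squared differences between observed and predicted values. The Python sketch below applies them to invented numbers; the candidate names, log-likelihoods and predictions are hypothetical and do not come from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def mse(observed, predicted):
    """Mean squared error between observed (0/1) and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical GLM candidates: (maximized log-likelihood, number of parameters).
candidates = {"main effects": (-130.0, 4), "with interactions": (-120.5, 7)}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)

# Hypothetical BRT predictions for four test records (1 = presence, 0 = pseudo-absence).
test_mse = mse([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.1])
```

Here the interaction model is preferred despite its three extra parameters because its gain in log-likelihood outweighs the penalty.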

2.3.3 Model transfer

The six models selected (three time-averaged and three time-specific) were transferred across the study region for both study periods (1980–2010 and 2014–2015). Time-specific models were projected to each time step for both evaluation periods. Time-averaged models were projected to both sets of time-averaged covariate data. We thresholded model outputs to the minimum presence training value adjusted to permit 1% omission error (E = 1%) to allow for some error in the data (Pearson et al., 2007). To allow comparisons with time-specific outputs, we plotted time-averaged test data for each time step onto static model outputs. The resulting monthly projections were then aggregated into image sequences in graphics interchange format (GIF) to produce dynamic visualizations of predicted climatic suitability through time using the R packages magick and gifski (Ooms, 2018a, 2018b).
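
One way to compute a minimum training presence threshold that tolerates a fixed omission error is to rank the suitability scores at training presences and skip the lowest E fraction. The Python sketch below follows that reading; the floor-based rounding convention, the function name and the synthetic scores are assumptions, not details confirmed by the text.

```python
import math

def omission_threshold(train_scores, e=0.01):
    """Lowest training-presence score remaining after discarding the bottom
    fraction `e` of scores (rounded down, an assumed convention)."""
    ranked = sorted(train_scores)
    n_omit = math.floor(e * len(ranked))
    return ranked[n_omit]

# Synthetic suitability scores at 200 training presences.
train_scores = [i / 200 for i in range(1, 201)]
threshold = omission_threshold(train_scores, e=0.01)
n_omitted = sum(score < threshold for score in train_scores)

# A cell is treated as suitable when its predicted score meets the threshold.
suitable = [score >= threshold for score in (0.01, 0.015, 0.5)]
```

With 200 training presences and E = 1%, two records fall below the threshold and the third-lowest score becomes the cut-off.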

2.3.4 Model evaluation

We evaluated thresholded model projections using the temporally corresponding evaluation datasets. Specifically, we looked at model omission rates (how well test data were predicted by the model) and proportion of the study region predicted suitable. Because H. mustelina exhibits a predictable movement pattern between distinct breeding and wintering sites during the year, we also sought to assess model performance within these broader periods. Thus, for assessment purposes only, we designated three ‘seasons’ based on behaviour: breeding (June–August), wintering (October–April) and migratory (May and September).
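
The two evaluation measures and the season assignment described above can be expressed compactly. In the Python sketch below, the season boundaries follow the text, while the pooling of monthly test scores within a season, the threshold, the scores and the function names are illustrative assumptions.

```python
def season_of(month):
    """Behavioural season used for assessment only (boundaries from the text)."""
    if month in (6, 7, 8):
        return "breeding"
    if month in (5, 9):
        return "migratory"
    return "wintering"

def omission_rate(test_scores, threshold):
    """Fraction of test presences predicted unsuitable."""
    return sum(s < threshold for s in test_scores) / len(test_scores)

def proportion_suitable(cell_scores, threshold):
    """Fraction of study-region cells predicted suitable."""
    return sum(s >= threshold for s in cell_scores) / len(cell_scores)

def omission_by_season(test_by_month, threshold):
    """Pool monthly test scores by season, then compute the omission rate of each."""
    pooled = {}
    for month, scores in test_by_month.items():
        pooled.setdefault(season_of(month), []).extend(scores)
    return {season: omission_rate(scores, threshold) for season, scores in pooled.items()}

# Invented suitability scores at test presences, keyed by month.
test_by_month = {1: [0.2, 0.6, 0.7, 0.9], 7: [0.8, 0.9, 0.95, 0.7], 9: [0.4, 0.6]}
by_season = omission_by_season(test_by_month, threshold=0.5)
area = proportion_suitable([0.1, 0.6, 0.7, 0.2], threshold=0.5)
```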

3 RESULTS

The model selection process yielded six models for evaluation: three time-averaged and three time-specific (details in Tables S3 and S4). Figure 2 provides a snapshot of time-specific model results for all three algorithms; time-averaged model results are presented in Figure 3. GIFs providing a side-by-side comparison of the thresholded time-averaged and time-specific model projections for each algorithm for the 1980–2010 primary study period and 2014–2015 supplemental evaluation period are available in the Supporting Information (https://doi.org/10.6084/m9.figshare.8160290.v2).

Figure 2. Snapshot of thresholded (E = 1%) time-specific boosted regression trees (BRT; left column), generalized additive models (GAM; centre column), and generalized linear models (GLM; right column) projections for three individual time steps from 1980 (January, top row; April, centre row; August, bottom row). Green regions indicate areas predicted climatically suitable; tan denotes areas predicted unsuitable; black triangles denote Hylocichla mustelina test data.

Figure 3. Thresholded (E = 1%) time-averaged 1980–2010 boosted regression trees (BRT; left), generalized additive models (GAM; centre), and generalized linear models (GLM; right) model projections. Green regions indicate areas predicted climatically suitable; tan denotes areas predicted unsuitable; black triangles denote Hylocichla mustelina test data.

3.1 Time-specific models

All three time-specific models adequately predicted both the area of the core Hylocichla mustelina population and beyond to include areas of known vagrancy. On average, they predicted greater proportions of the study area climatically suitable for both evaluation periods than time-averaged models (Table S5). GAM and GLM models had the lowest overall mean omission rates during the 1980–2010 study period (GAM 0.036; GLM 0.036; BRT 0.210; Table S6); however, overall omission rate for 2014–2015 was roughly equivalent for all three algorithms, with all three models performing well during model transfer (BRT 0.026, GAM 0.027, GLM 0.025).

Variability in model performance (omission rate) was greatest during the wintering months for GAM and GLM (1980–2010), and for all three 2014–2015 projections (Figures S5–S7; Table S7). Mean monthly area predicted suitable was most restricted during the wintering period for the GAM (12.4%–41.8%) and GLM (14.6%–46.4%) models, and greatest during the breeding period (GAM 80.1%–81.5%; GLM 89.7%–91.6%) for the primary study period (Figures S8–S10; Table S8). This same trend was evident in all three 2014–2015 time-specific model projections. The 1980–2010 BRT model had noticeably elevated omission rates during April (0.591) and May (0.401), coinciding with a tendency towards underpredicting the northernmost distributional extent of H. mustelina. While all three time-specific models failed to predict the full northern extent of the species, the area predicted suitable by the BRT model was particularly low (April—19.7%; May—30.0%) relative to the GAM (April—30.5%; May—56.7%) and GLM (April—36.3%; May—60.7%) models.

3.2 Time-averaged models

All three time-averaged models fit the core distribution of Hylocichla mustelina, but failed to predict into areas of known vagrancy, and tended to underpredict during wintering months, with patchy areas of predicted suitability in Central America and southeastern Mexico (Figure 3). Model projections into 2014–2015 predicted more area climatically suitable (BRT 36.9%, GAM 42.3%, GLM 46.3%) than for 1980–2010 (BRT 32.8%, GAM 36.6%, GLM 39.3%; Figures S8–S10; Table S5). Overall omission rates were effectively the same for GAM (1980–2010: 0.029; 2014–2015: 0.020) and GLM (1980–2010: 0.030; 2014–2015: 0.020) models, and slightly elevated in the BRT model (1980–2010: 0.037; 2014–2015: 0.039; Figures S5–S7; Table S6). Model variability was greatest during the winter for both time periods, with area predicted suitable patchier in Central America and southeastern Mexico. Omission rates ranged 0.092–0.159 (BRT), 0.074–0.178 (GAM) and 0.087–0.283 (GLM) during 1980–2010 and 0.087–0.177 (BRT), 0.049–0.158 (GAM) and 0.045–0.157 (GLM) during 2014–2015. Model variability was lowest for all three models during the breeding season (0.025–0.326 for BRT, 0.018–0.044 for GAM and 0.008–0.026 for GLM) for 1980–2010, as well as for GAM (0.007–0.023) and GLM (0.008–0.023) in 2014–2015.

4 DISCUSSION

All three time-specific models successfully yielded dynamic (monthly) predictions reflecting known distributional shifts in Hylocichla mustelina's annual cycle (BirdLife International, 2017; Collar, 2005). On average, the majority of the study region (>75%) was predicted climatically suitable during the breeding season (June–August) and included areas of known vagrancy in the central United States; a moderate proportion of the study region (60%–75%) was predicted suitable during migration (May, September); and areas of bioclimatic suitability during the wintering period (October–April) were restricted to the southeastern United States, eastern Mexico, and Central America. In contrast, the time-averaged models predicted 32%–39% (1980–2010) and 37%–46% (2014–2015) of the study region (including the eastern United States, southeastern Canada and Central America) climatically suitable, successfully capturing the geographic breadth of H. mustelina's core population; but the static view of predicted climatic suitability failed to reflect the dynamic nature of H. mustelina's annual distribution. Both time-averaged and time-specific models exhibited increased model variability (elevated omission rates) during the wintering period, potentially as a reflection of strong temporal bias in the primary observation data (Figure S4), and omission rates were notably elevated in April and May (1980–2010) for time-specific BRT and GLM models.

Temporally explicit approaches to correlative niche modelling methods have been at the core of movement ecology analyses for some time (Gschweng et al., 2012), and yet the distributional ecology community has yet to adopt a similar approach. Indeed, despite long-standing understanding that traditional (time-averaged) correlative modelling approaches lead to over-generalization of climatic niches (Barve et al., 2014; Ingenloff, 2017; Peterson et al., 2005), efforts to incorporate time-specificity into the modelling framework have a fairly punctuated history. Seasonal modelling has been the gold standard for some time (Laube et al., 2015; Skov et al., 2016; Soriano-Redondo et al., 2019). However, because these methods subset data, resulting models often provide a less than comprehensive overview of the study species' full niche. Methods introduced to account for these gaps include the STEM approach (stacking of seasonally time-averaged models; Fink et al., 2010) and the ‘full year’ modelling framework (assessing time-averaged monthly intervals; Williams et al., 2017). Despite the well-documented need for a temporally inclusive approach to modelling that avoids time-averaging across study periods (Peterson et al., 2005), techniques did not appear in the literature until 2018 (Abrahms et al., 2019; Welch et al., 2018). Our contribution establishes (and makes more broadly accessible) a set of temporally explicit input data preparation techniques that improves the overall utility of the traditional correlative niche modelling framework for non-sedentary species. We note that this approach can be applied at many temporal resolutions, depending on the questions being asked and the data availability: centuries for species responding to broad, historical climate shifts (e.g. in the Pleistocene); years for long-lived species that respond to environmental changes very generally (e.g. El Niño events); months for behaviourally complex species or seasonal migrants; or even days for short-lived species (e.g. mosquitoes).

The workflow presented here builds upon the traditional modelling framework to improve ability to characterize species' niches for any situation in which a species' distribution may respond to changing environmental conditions, and considers the full range of a species' distributional dynamics relative to climatic suitability in a single modelling endeavour. It also incorporates the full suite of available observational data without subjective subsetting of data or running multiple series of model calibrations to capture bioclimatic suitability of a species for individual time steps (Peterson et al., 2005). Furthermore, where the traditional framework requires averaging of environmental covariates across large timeframes (long-term climate means), rendering some climatic covariates useless (e.g. variables with large variances), time-specific modelling readily allows for the incorporation of higher resolution weather and remotely sensed covariates which have already shown improved performance in time-averaged modelling applications (Bateman et al., 2012; Feldmeier et al., 2018). Finally, derivation of a time-specific pseudo-absence dataset, or ‘bias cloud’, provides a dataset reflective of both spatial and temporal facets of relative sampling effort. These improvements can provide a significant advantage over previous methods, such as for application to conservation or management of highly mobile species, assessments of species with short life spans and spatiotemporally variable populations (e.g. mosquito populations), and species responding to large-scale climate variation.

As with any modelling effort, several limitations are associated with time-explicit modelling. First, the derivation of temporally explicit pseudo-absence datasets can be computationally demanding. This limitation ultimately resulted in our abridging the study period from 2015 to 2010, although relatively few species will have such enormous quantities of distributional data available. Furthermore, this approach still results in some degree of temporal averaging due to the limitations of available primary species observation data and relevant covariate data. This issue can be alleviated with improved data precision and quality, perhaps from high-resolution weather and remotely sensed data. Finally, these methods are not necessary or appropriate for all species or modelling applications. Rather, we recommend application either (a) for species where traditional methods fail to capture underlying distributional dynamics or (b) when the questions underpinning the modelling require more insight into a species' distribution through time. Indeed, both algorithm selection and parameterization, and the decision to engage in time-averaged, seasonal, or time-specific modelling should be approached as an iterative, hypothesis- or question-driven process. In spite of these limitations, this approach provides a critical template for capturing the distributional dynamics of highly mobile or behaviourally complex species.

Future research should assess the utility of these methods for niche-tracking species that move in geographic space in concert with changing bioclimatic conditions (Tingley et al., 2009) versus niche-shifting species (Nakazawa et al., 2004), particularly as regards the need for separate models in cases in which qualitatively distinct niches are used in different parts of the year (Batalden et al., 2007). These methods could readily be adapted into a hypothesis-testing framework, similar to tests developed by Warren et al. (2008). Further application has potential to elucidate drivers of movement patterns and improve our understanding of migratory connectivity, a critical component of effective conservation plans for mobile species (Runge et al., 2014).

4.1 Sample size

Correlational distribution modelling methods assume systematic sampling of the full model calibration region, but this rarely happens in practice. In time-averaged approaches, small sample sizes are associated with increased model variability and decreased model accuracy (Stockwell & Peterson, 2002; Wisz et al., 2008). We purposely chose a study species for which available data were plentiful and broadly representative of the species' realized niche, but availability of such robust datasets is limited to relatively few taxonomic groups. Indeed, correlative modelling is often used to help fill gaps where distributional knowledge is limited and sampling incomplete, and small sample sizes often correspond to species of increased conservation concern (Gaubert et al., 2006). Furthermore, data available for mobile species are often biased towards particular seasons or behaviours, as seen in our H. mustelina example, where the overwhelming majority of available data were collected in the United States during breeding or migration (Figure S4). By associating sample data with spatially and temporally relevant covariate data, a time-specific approach could maximize limited data through increased retention during the data cleaning process (e.g. data with duplicate locality but different time of collection) to decrease overall model variability. These issues of information content and retention warrant further exploration.
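The retention argument can be illustrated with a short Python sketch that contrasts duplicate removal keyed on locality alone with removal keyed on locality plus month and year of collection. The function name `deduplicate`, the record layout, and the toy coordinates are hypothetical; exact coordinate matching is also an assumption, since real workflows often thin records to one per grid cell instead.

```python
def deduplicate(records, time_specific):
    """Keep the first record for each unique key.

    records: iterable of (lon, lat, year, month) tuples (assumed layout).
    time_specific=False keys on locality only, as in a time-averaged workflow;
    time_specific=True keys on locality plus year and month of collection.
    """
    seen = set()
    kept = []
    for lon, lat, year, month in records:
        key = (lon, lat, year, month) if time_specific else (lon, lat)
        if key not in seen:
            seen.add(key)
            kept.append((lon, lat, year, month))
    return kept


records = [
    (-86.5, 39.2, 2007, 5),
    (-86.5, 39.2, 2007, 6),  # same site, different month
    (-86.5, 39.2, 2008, 5),  # same site, different year
    (-86.5, 39.2, 2008, 5),  # true duplicate in space and time
    (-88.9, 17.3, 2008, 1),
]

print(len(deduplicate(records, time_specific=False)))  # 2
print(len(deduplicate(records, time_specific=True)))   # 4
```

Here the time-averaged key retains two of five records, whereas the time-specific key retains four, discarding only the record that duplicates another in both space and time.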

4.2 Temporal sampling bias

Spatial and temporal biases are inherent in open-access primary occurrence data. Myriad studies illustrate strong links between spatial bias in primary species observation data and environmental bias in resulting distributional models (Beck et al., 2014; Phillips et al., 2009), but few assess impacts of temporal bias. Environmental bias may be introduced into models as a result of uneven sampling in geographic space, and it stands to reason that we also risk inserting bias into models where strong biases exist in the temporal capture of data. Further research should examine the relationship between temporal bias and model accuracy. See Supporting Information for further discussion.

Our ability to gain improved insight into the spatiotemporal dynamics of species distributions via temporally explicit approaches can positively impact analyses in biodiversity management and conservation, as well as in public health. Consideration of the complexity involved in conserving migratory species is a relatively recent addition to conservation planning, and can be critical in ensuring that species which engage in long-distance movement patterns are protected adequately (Fink et al., 2010; Jetz et al., 2019; Runge et al., 2014, 2016). In particular, such approaches may be useful in identifying marine areas of conservation interest (Nur et al., 2011; Skov et al., 2016) or in other dynamic management applications such as establishing or evaluating marine time-area closures (Abrahms et al., 2019; Lascelles et al., 2014). Similarly, the ability to produce time-specific distributional models can also help inform decision-making and control measures for current and emerging zoonotic and vector-borne diseases when populations of species respond to environmental changes (Clements & Pfeiffer, 2009; Giles et al., 2014; Parra-Henao et al., 2016; Ramsey et al., 2015).

ACKNOWLEDGEMENTS

We are grateful for all the open access data providers sharing their data. We would also like to thank the KU ENM Group for insightful discussions, and Christopher Hensz and Marlon Cobos for help with programming hiccups. We thank two reviewers for thoughtful critiques of the original manuscript.

AUTHORS' CONTRIBUTIONS

K.I. and A.T.P. conceived the ideas and designed methodology; K.I. led analyses and writing of the manuscript. Both authors contributed critically to the drafts and gave final approval for publication.

DATA AVAILABILITY STATEMENT

The data used as the basis of analysis in this study are available from the Global Biodiversity Information Facility at the following DOIs: Hylocichla mustelina point occurrence data (https://doi.org/10.15468/dl.hdg0e2); Turdidae point occurrence data (https://doi.org/10.15468/dl.usjwfr, https://doi.org/10.15468/dl.mzlu54, https://doi.org/10.15468/dl.7tyi73, https://doi.org/10.15468/dl.wbehpg, https://doi.org/10.15468/dl.ewipqa, https://doi.org/10.15468/dl.tdz842, https://doi.org/10.15468/dl.3klver, and https://doi.org/10.15468/dl.jwmwmw). Data supporting the findings of this study are available on FigShare in the Supporting Information (https://doi.org/10.6084/m9.figshare.8160290.v2) alongside relevant R scripts (https://doi.org/10.6084/m9.figshare.8160227.v1).