Volume 13, Issue 8, p. 1790-1804
RESEARCH ARTICLE

Ignoring species availability biases occupancy estimates in single-scale occupancy models

Graziella V. DiRenzo (U.S. Geological Survey, Massachusetts Cooperative Fish and Wildlife Research Unit, University of Massachusetts, Amherst, MA, USA)
David A. W. Miller (Department of Ecosystem Science and Management, Pennsylvania State University, University Park, PA, USA)
Evan H. C. Grant (U.S. Geological Survey, Eastern Ecological Research Center, Laurel, MD, USA)

Correspondence: Graziella V. DiRenzo, Email: [email protected]

First published: 04 May 2022

Handling Editor: Chris Sutherland

Abstract

  1. Most applications of single-scale occupancy models do not differentiate between availability and detectability, even though species availability is rarely equal to one. Species availability can be estimated using multi-scale occupancy models; however, for the practical application of multi-scale occupancy models, it can be unclear what a robust sampling design looks like and what the statistical properties of the multi-scale and single-scale occupancy models are when availability is less than one.
  2. Using simulations, we explore the following common questions asked by ecologists during the design phase of a field study: (Q1) what is a robust sampling design for the multi-scale occupancy model when there are a priori expectations of parameter estimates? (Q2) what is a robust sampling design when we have no expectations of parameter estimates? and (Q3) can a single-scale occupancy model with a random effects term adequately absorb the extra heterogeneity produced when availability is less than one and provide reliable estimates of occupancy probability?
  3. Our results show that there is a tradeoff between the number of sites and surveys needed to achieve a specified level of acceptable error for occupancy estimates using the multi-scale occupancy model. We also document that when species availability is low (<0.40 on the probability scale), single-scale occupancy models underestimate occupancy by as much as 0.40 on the probability scale, produce overly precise estimates, and provide poor parameter coverage. This pattern was observed whether or not a random-effects term was included in the single-scale occupancy model, suggesting that adding a random-effects term does not adequately absorb the extra heterogeneity produced by the availability process. In contrast, when species availability was high (>0.60), single-scale occupancy models performed similarly to the multi-scale occupancy model.
  4. Users can further explore our results and sampling designs across a number of different scenarios using the RShiny app https://gdirenzo.shinyapps.io/multi-scale-occ/. Our results suggest that unaccounted-for availability can lead to underestimating species distributions when using single-scale occupancy models, which can have large implications for inference and prediction, especially in the fields of invasion ecology, disease emergence and species conservation.

1 INTRODUCTION

Single-scale occupancy models allow biologists to disentangle the ecological and sampling processes that generate observed data (Kéry & Royle, 2016, 2021; MacKenzie et al., 2002, 2003; Tyre et al., 2003). A key assumption of these models is that populations are closed (i.e. no births, deaths, emigration or immigration) among replicate surveys at a site within a season. This closure assumption reflects the idea that if a site is occupied during at least one survey, then it is assumed to be occupied during all surveys, and any non-detections can be interpreted as 'false negatives'. However, when the closure assumption is violated, the occupancy probability parameter, which describes the probability that a species occupies a site, needs to be re-interpreted (Grant, 2015; Kendall et al., 2013). Most commonly, occupancy probability is re-interpreted as 'habitat use', in the sense that the species occurs in the area for some portion of time (and is unavailable or fails to use the habitat for the remainder of the time). Because biologists are typically more interested in estimating occupancy probability rather than the probability of habitat use, they may decide to use a multi-scale occupancy model instead of the single-scale occupancy model (Aing et al., 2011; Mordecai et al., 2011; Nichols et al., 2008). Although a great deal of effort has been dedicated to understanding how violations of the closure assumption affect the estimation of the occupancy parameter in single-scale occupancy models (e.g. Aing et al., 2011; Chandler et al., 2015; Kendall et al., 2013; Mordecai et al., 2011; Rota et al., 2009; Valente et al., 2017), it is less clear what a robust sampling design accounting for species availability looks like and what the statistical properties of the multi-scale and single-scale occupancy models are when availability is less than one.

Species availability will tend to be less than one under at least three scenarios, which can act singly or in combination (Figure 1a). First, species may move between available and unavailable states within their territory, which is also known as temporary emigration (Chandler et al., 2011; Efford & Dawson, 2012; Kendall & Nichols, 1995; Nichols et al., 2009). Temporary emigration is especially relevant when surveying episodic or mobile species that may enter or leave sites over the course of sampling, violating the geographic closure assumption (Figure 1a; Hayes & Monfils, 2015). Examples of temporary emigration include: aquatic fauna being submerged and not surface-active during an aerial survey (Marsh & Sinclair, 1989), a mouse entering torpor on cold nights (Kendall et al., 1997), salamanders retreating to underground burrows to prevent desiccation when surface conditions are too dry (Connette et al., 2015; O'Donnell & Semlitsch, 2015), and plant dormancy or other plant-specific phenological traits (such as budding, flowering, etc.) that make plants unavailable for sampling some portion of the time (Bornand et al., 2014; Kéry & Gregg, 2004). Second, availability can be less than one when an individual's home range (for mobile species) or species occurrence (for sessile species) only partially overlaps a sampling unit (Figure 1a; Nichols et al., 2009; Pavlacky et al., 2012). In the case of mobile species, availability corresponds to the extent that an individual's home range or territory at least partially overlaps a sampling unit, which could be interpreted as a coverage probability or statistic (Nichols et al., 2009). In the case of sessile species, the organism may be unavailable for sampling at a 'site' (i.e. spatial replicate in our study design) if species presence is not constant within a site, which would be the case if there are unoccupied microhabitats within a site (Gray et al., 2013). Third, availability can be less than one when the species is present at a site but is not available for detection because the species is not eliciting a behaviour that makes it detectable (Figure 1a). For example, during a point count survey, a bird may be present within the radius of the observer conducting the survey, but the bird may be unavailable for detection if it is not actively singing during the survey (Figure 1a). Across all three scenarios, one or more processes can be operating to affect species availability during a survey. For instance, returning to the point count example, it is easy to imagine that bird availability can be subject to both partial overlap of the home range with the sampling unit and lack of availability related to singing behaviour. Therefore, it is up to the biologist to think critically about the components of the sampling process that may affect statistical inference.

FIGURE 1. Graphical depiction of three ecological conditions which may lead to species unavailability (a), along with tree diagrams representing data generation, latent states and associated parameters for the multi-scale occupancy model (b) and the single-scale occupancy model (c). In panel (a), species unavailability may result from (1) temporary emigration (e.g. a salamander moving underground), (2a) for a mobile species, when the organism's home range partially overlaps a sampling unit, (2b) for a sessile organism, when the probability of occupancy is not uniform across a sampling unit ('site'), or (3) when the species is not eliciting a behaviour that would make it available for detection (e.g. when a bird is present at a site but does not sing during a point count survey). In panels (b) and (c), Ψ is occupancy probability, θ is availability, and p is detection probability (see Section 2.1 Multi-scale occupancy model in the Methods for model explanation).

In real field situations, we expect both that species will be unavailable and that their availability will be non-random, leading to a correlation in detection across repeated surveys of a site. For example, in the case of the point count survey, the issue of species availability becomes important if the observer surveys a site and collects data for three sampling events in a single morning on which a bird is inactive (i.e. the species is unavailable, and its availability is non-random). Alternatively, if the observer sampled a site on three different mornings, then we might expect that by random chance the bird would be active during some mornings and not others. This process of random availability is then absorbed into the detection model (when using a single-scale occupancy model) and does not lead to excess heterogeneity. Put differently, it is the relationship between the temporal scale at which the availability process operates and the temporal scale at which repeated sampling events occur that determines whether species availability needs to be explicitly considered in occupancy models.

Methods to account for species availability were originally developed for capture-mark-recapture models (e.g. Kendall & Nichols, 1995), but these methods have since been used to accommodate a number of other dependence structures in a variety of modelling frameworks, such as estimating species availability in multi-scale occupancy models (e.g. Aing et al., 2011; Mordecai et al., 2011), accounting for multiple sources of imperfect pathogen sampling (e.g. Colvin et al., 2015; McClintock et al., 2010), and accommodating for spatially nested sampling units (e.g. Chelgren et al., 2011; Nichols et al., 2008). Under each application, the manner in which data are collected dictates how parameters are interpreted. For example, if data are collected with spatial replicates at multiple scales (i.e. ponds within multiple refuges), then a multi-scale occupancy model can estimate occupancy probability at both the local (e.g. across ponds within a refuge) and regional (e.g. across refuges) spatial scales (Nichols et al., 2008). Alternatively, when data are collected with temporal replicates at multiple scales (i.e. secondary and tertiary surveys), then the multi-scale occupancy model can disentangle species availability and detection (Green et al., 2019; Kendall & White, 2009). And, in a third example, when disease ecologists collect multiple samples from a single individual and then perform multiple PCR assays per sample (DiRenzo et al., 2019; Mosher et al., 2017), then the multi-scale occupancy model allows an understanding of imperfect pathogen detection during different phases of the pathogen sampling process (i.e. collecting the sample in the field vs. analysing the sample in the lab). In each of these cases, the multi-scale occupancy model is applied to a unique dataset requiring thoughtful consideration of how to interpret the parameter estimates.

To obtain robust parameter estimates of biological interest (e.g. occupancy, colonization and extinction), biologists either design their monitoring programs to survey for species when they experience their highest availability or they run a power analysis for the multi-scale occupancy model. Power analyses range from very simple to very complicated, and provide insights related to 'how much sampling is enough?' (e.g. Bailey et al., 2007; MacKenzie & Royle, 2005). However, power analyses can be time consuming and impractical to run under time constraints, forcing biologists to make decisions related to allocating valuable resources with limited information.

Here, we answer the following three common questions posed by biologists during the design phase of a study: (Q1) what is a robust sampling design for the multi-scale occupancy model when there are a priori expectations of parameter estimates? (Q2) what is a robust sampling design when we have no expectations of parameter estimates? and (Q3) can a single-scale occupancy model with a random effects term adequately absorb the extra heterogeneity produced when availability is less than one and provide reliable estimates of occupancy probability? To answer (Q1) and (Q2), we simulated data assuming that species availability was constant across sites (but less than one), and we used the multi-scale occupancy model to analyse the data. To answer (Q3), we simulated data under three additional scenarios: species availability is heterogeneous across sites, species availability is heterogeneous across multi-year data, and species availability is correlated with detection probability across multi-year data. We analysed the simulated data using each of the following four models: (i) a constant single-scale occupancy model, (ii) a constant multi-scale occupancy model, (iii) a single-scale occupancy model with a random-effects term on detection and (iv) a multi-scale occupancy model with a random-effects term on availability. To adequately address (Q3), we compared the performance of the multi-scale and the single-scale occupancy models with and without random-effects terms, thus producing the list of four models.

We expected to find that a robust sampling design for the multi-scale occupancy model would include tradeoffs in the number of sites and surveys performed. For example, if more sites are surveyed, then fewer tertiary surveys are required to achieve a specified level of acceptable error. We also expected to find that the single-scale occupancy model would produce biased estimates of occupancy with low coverage across all simulated scenarios, regardless of the model parameterization used to analyse the data and the true values of availability and detection probability, given the results from previous simulation studies examining violations of the closure assumption (e.g. Rota et al., 2009; Valente et al., 2017). In an effort to make our results more accessible to others looking to employ these methods and explore the results of our simulations further, we also provide an RShiny app as a companion to this paper: https://gdirenzo.shinyapps.io/multi-scale-occ/. Our results serve as a guide to biologists looking to produce robust statistical inference on species distributions when availability is suspected to be variable and detection is imperfect.

2 METHODS

2.1 Multi-scale occupancy model

We formulated a single-season multi-scale occupancy model as in Nichols et al. (2008), Aing et al. (2011) and Mordecai et al. (2011). The data for the multi-scale occupancy model consist of species detection/non-detection data collected from site i during secondary survey j and tertiary survey k. Across secondary surveys, the site is closed to open population dynamics (i.e. birth, death, immigration and emigration), which provides an opportunity to estimate changes in availability (e.g. temporary emigration; Figure 1a), whereas the tertiary surveys are closed to both changes in availability and open population dynamics (Green et al., 2019). A simple survey design that emulates this type of data collection would be to have multiple observers independently collect data (constituting the tertiary surveys) repeatedly over a few days (constituting the secondary surveys) at a number of sites (representing the sampling units). Note that secondary surveys are performed on a per-site basis (e.g. two secondary surveys per site), and tertiary surveys are performed on a per-secondary-survey basis (e.g. two tertiary surveys per secondary survey).

In the multi-scale occupancy model, first, we define the occupancy of site i as a Bernoulli trial, where
$$z_i \sim \mathrm{Bernoulli}(\Psi).$$

z_i is a latent state variable: z_i = 1 if site i is occupied, and z_i = 0 otherwise. Ψ is defined as the occupancy probability (i.e. the probability that a site is occupied by the focal species).

Next, we consider the sampling process composed of two parts: (1) species availability and (2) species detectability. We define species availability at site i during secondary survey j as a Bernoulli trial, such that
$$w_{ij} \sim \mathrm{Bernoulli}(\theta \times z_i).$$

w_ij is a latent state variable: w_ij = 1 if the species occupies site i and is available during secondary survey j, and w_ij = 0 otherwise. θ is defined as the probability that the species is available for sampling given that the site is occupied. We multiply θ by z_i because species are unavailable at sites where they do not occur; in this way, z_i acts as an on-off switch for estimating θ.

Finally, we define species detectability at site i during secondary survey j and tertiary survey k as a Bernoulli trial, where
$$y_{ijk} \sim \mathrm{Bernoulli}(p \times w_{ij}).$$

y_ijk is the observed detection/non-detection datum for the species at site i during secondary survey j and tertiary survey k: y_ijk = 1 if the species is detected, and y_ijk = 0 otherwise. p is defined as the probability that the species is detected given that the site is occupied (z_i = 1) and the species is available (w_ij = 1). Similarly, we multiply p by w_ij because the species cannot be detected at sites where it does not occur or is unavailable.
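For concreteness, a minimal JAGS specification of this constant multi-scale occupancy model might look as follows. This is a sketch rather than the authors' published code; the logit-scale Normal(0, 0.368) priors follow Section 2.2, and the data dimensions (n.sites, n.secondary, n.tertiary) are assumed names.

```r
# A minimal sketch of the constant multi-scale occupancy model in JAGS.
# Indices: i = site, j = secondary survey, k = tertiary survey.
msocc_model <- "
model {
  # Vague priors on the logit scale (mean 0, precision 0.368; Section 2.2)
  logit.psi   ~ dnorm(0, 0.368)
  logit.theta ~ dnorm(0, 0.368)
  logit.p     ~ dnorm(0, 0.368)
  psi   <- ilogit(logit.psi)    # occupancy probability
  theta <- ilogit(logit.theta)  # availability probability
  p     <- ilogit(logit.p)      # detection probability

  for (i in 1:n.sites) {
    z[i] ~ dbern(psi)                     # latent occupancy state
    for (j in 1:n.secondary) {
      w[i, j] ~ dbern(theta * z[i])       # latent availability given occupancy
      for (k in 1:n.tertiary) {
        y[i, j, k] ~ dbern(p * w[i, j])   # detection given availability
      }
    }
  }
}
"
writeLines(msocc_model, "msocc.txt")
```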

2.2 Simulation settings

We developed a series of simulations to answer the three questions outlined in the Introduction. We simulated data across a wide range of parameter values and study designs (i.e. number of sites, secondary surveys and tertiary surveys) to explore the performance of the multi-scale and single-scale occupancy models. In all cases, we assumed that observations were independent and that sites contained closed populations during secondary survey periods (i.e. no birth, death, immigration or emigration; Green et al., 2019). We also assumed independence and closure to both changes in availability and open population dynamics during tertiary surveys (Green et al., 2019; MacKenzie et al., 2002, 2003). For each of the parameters that we varied in our simulations, we selected values independently from a pre-specified range. To ensure that we sampled the parameter space evenly, we used a Latin hypercube sampler with function lhs() in package lhs (Carnell, 2019) in the program R (R Core Team, 2019). A Latin hypercube sampler is a method used to generate a near-random sample of values from multi-dimensional distributions. Traditional random sampling methods do not guarantee that a set of random numbers adequately represents their covariance, whereas the orthogonal sampling underlying the Latin hypercube sampler does.
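As an illustration, the Latin hypercube draws could be generated as in the sketch below; it uses lhs::randomLHS() (one of the package's samplers) with the logit-scale parameter bounds and design ranges described in Section 2.3.

```r
# A sketch of Latin hypercube sampling over parameters and design variables.
library(lhs)

set.seed(1)
n.sims <- 10000
u <- randomLHS(n.sims, 6)  # n.sims draws in [0, 1] across 6 dimensions

rescale <- function(x, lo, hi) lo + x * (hi - lo)

sims <- data.frame(
  logit.psi   = rescale(u[, 1], -3, 3),          # Uniform(-3, 3), logit scale
  logit.theta = rescale(u[, 2], -3, 3),
  logit.p     = rescale(u[, 3], -3, 3),
  n.sites     = round(rescale(u[, 4], 5, 500)),  # discrete design variables
  n.secondary = round(rescale(u[, 5], 2, 8)),
  n.tertiary  = round(rescale(u[, 6], 2, 8))
)
```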

We analysed simulated datasets using a Bayesian approach with Markov chain Monte Carlo in the programs R (R Core Team, 2019) and JAGS (Plummer, 2003). We specified vague priors for all parameters on the logit scale using a normal distribution with mean 0 and precision 0.368, following Lunn et al. (2013). We initiated model runs with three chains, an adaptation period of 10,000 iterations, a burn-in period of 5,000 iterations, and thinning by 10. We used the function autojags() in package jagsUI (Kellner, 2016) to update the model until convergence (i.e. R̂ < 1.1). The maximum allowed number of iterations was 1 × 10^6. Model runs that did not converge by 1 × 10^6 iterations were discarded. All simulations were run on the Yeti supercomputer provided by the Science Analytics and Synthesis (SAS) group at the U.S. Geological Survey Advanced Research Computing (USGS ARC).
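The fitting call with jagsUI::autojags() might then look like the following sketch, using the MCMC settings reported above (data object names are illustrative):

```r
# A sketch of fitting the multi-scale occupancy model with jagsUI::autojags().
library(jagsUI)

jags.data <- list(y = y, n.sites = n.sites,
                  n.secondary = n.secondary, n.tertiary = n.tertiary)

# Initialize latent states at 1 so initial values are consistent with the data
inits <- function() list(z = rep(1, n.sites),
                         w = matrix(1, n.sites, n.secondary))

fit <- autojags(
  data = jags.data, inits = inits,
  parameters.to.save = c("psi", "theta", "p"),
  model.file = "msocc.txt",
  n.chains = 3, n.adapt = 10000, n.burnin = 5000, n.thin = 10,
  Rhat.limit = 1.1,  # update until all R-hat values fall below 1.1
  max.iter = 1e6     # runs not converged by 1e6 iterations are discarded
)
```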

In the next few sections, we provide more details about how data were simulated and processed to answer each question. A directory containing the information to reproduce all of the analyses, tables and figures is provided in Table S1.

2.3 (Q1) What is a robust sampling design for the multi-scale occupancy model when there are a priori expectations of parameter estimates?

To provide sampling design guidelines when there are a priori expectations of parameter estimates, we examined the accuracy (i.e. how close are mean parameter estimates to the true parameter values?), estimated precision (i.e. how large is the 95% credible interval?), bias (i.e. what is the magnitude of parameter over- or under-estimation?), and coverage (i.e. what proportion of times does the true parameter value fall within the estimated 95% CI?) of the multi-scale occupancy model over a range of parameter values and sampling designs. We simulated parameter values on the logit scale using the following distributions: Ψ ~ Uniform(−3, 3), θ ~ Uniform(−3, 3), p ~ Uniform(−3, 3). These bounds represent a range of approximately 0.05–0.95 on the probability scale. We chose discrete values for sampling design variables from distributions as follows: sites ~ Uniform(5, 500), secondary surveys ~ Uniform(2, 8), and tertiary surveys ~ Uniform(2, 8). All datasets were simulated assuming availability was less than one. We simulated 10,000 datasets and analysed them using the multi-scale occupancy model.
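For reference, a single dataset under these settings could be simulated as in the sketch below (simulate_msocc is an illustrative helper; sims is the Latin hypercube table from the sketch in Section 2.2):

```r
# A sketch of simulating one detection/non-detection array y[i, j, k]
# under the multi-scale occupancy model.
simulate_msocc <- function(psi, theta, p, n.sites, n.secondary, n.tertiary) {
  z <- rbinom(n.sites, 1, psi)                        # occupancy states
  w <- matrix(rbinom(n.sites * n.secondary, 1, theta),
              n.sites, n.secondary) * z               # availability; 0 where z = 0
  y <- array(NA, dim = c(n.sites, n.secondary, n.tertiary))
  for (k in 1:n.tertiary) {
    y[, , k] <- matrix(rbinom(n.sites * n.secondary, 1, p),
                       n.sites, n.secondary) * w      # detections; 0 where w = 0
  }
  y
}

y <- with(sims[1, ],
          simulate_msocc(plogis(logit.psi), plogis(logit.theta), plogis(logit.p),
                         n.sites, n.secondary, n.tertiary))
```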

Although parameter values were chosen from continuous distributions, we assigned each simulated dataset to one of the eight discrete groupings depending on their parameter combinations (i.e. do parameters take on high [probability scale >0.60] or low [probability scale <0.40] values?) for the interpretation of results. Group assignment was determined by a combination of the true values of Ψ, 𝜃, and p (see Appendix S1 for more details).

Next, for each of the eight discrete groupings, we determined how the performance of the occupancy estimator was affected by sampling effort by fitting four post-hoc generalized linear models, one each for accuracy, precision, bias and coverage. In each of the four post-hoc models, we used a measure of accuracy, precision, bias or coverage as the response variable. For accuracy, we used the log absolute error between the true occupancy value and the posterior mean occupancy estimate from each model run: log(|truth − estimate|). For precision, we used the width of the occupancy estimate's 95% credible interval (CI; i.e. the upper 95% CI estimate minus the lower 95% CI estimate). For bias, we used the difference between the true occupancy value and the posterior mean occupancy estimate from each model run: Bias = estimate − truth. In this case, negative values represent model underestimates and positive values represent model overestimates. Finally, for assessing parameter coverage, we recorded a value of 1 for each simulated dataset when the true occupancy value fell within the 95% CI of the model run, and 0 otherwise. In each of the four post-hoc generalized linear models, we specified the log(number of sites), log(number of secondary surveys), and log(number of tertiary surveys) as the explanatory variables. We used a normal distribution and identity link function for accuracy and bias, and we used a binomial distribution and logit link function for precision and coverage (see Appendices S2, S3, S4 and S5 for more details).
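As a sketch, the four metrics for a single model run, and the form of two of the four post-hoc regressions, might be computed as follows (truth and metrics are illustrative objects; analogous models are fit for bias and precision):

```r
# A sketch of computing the four performance metrics for one model run.
post <- fit$sims.list$psi                 # posterior draws of occupancy
ci   <- quantile(post, c(0.025, 0.975))   # 95% credible interval

run.metrics <- data.frame(
  accuracy  = log(abs(truth - mean(post))),   # log absolute error
  precision = unname(ci[2] - ci[1]),          # width of the 95% CI
  bias      = mean(post) - truth,             # estimate - truth
  coverage  = as.numeric(truth >= ci[1] & truth <= ci[2])
)

# Post-hoc regressions across all runs in a grouping (rows of 'metrics'):
m.acc <- glm(accuracy ~ log(n.sites) + log(n.secondary) + log(n.tertiary),
             family = gaussian, data = metrics)   # identity link
m.cov <- glm(coverage ~ log(n.sites) + log(n.secondary) + log(n.tertiary),
             family = binomial, data = metrics)   # logit link
```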

Then, using the coefficient values obtained from each of the four post-hoc generalized linear models for each of the eight discrete groupings, we calculated the predicted average accuracy, precision, bias, and coverage under a variety of different sampling designs. Specifically, we varied the number of sites from 5 to 500, the number of secondary surveys from 2 to 8, and the number of tertiary surveys as either 2 or 4.

2.4 (Q2) What is a robust sampling design when we have no expectations of parameter estimates?

To provide general sampling design guidelines, we started by constructing a single post-hoc linear model fit to the same simulated datasets from (Q1) analysed with the multi-scale occupancy model. Here, we retained all simulated datasets, and we did not assign each simulated dataset to one of the eight discrete groupings as we did for (Q1). We fit a post-hoc linear model with the log absolute error of occupancy as the response variable (Appendix S2), and the log(number of sites), log(number of secondary surveys), and log(number of tertiary surveys) as fixed effects using the lm() function in R. From this post-hoc model, we used the resulting coefficient estimates to determine the sampling effort needed to achieve three thresholds of acceptable error for the occupancy estimate on the probability scale, representing a low (0.01), medium (0.05), and high value (0.10). To do this, we indicated the number of sites sampled as: 20, 60, 80, or 100; and, the number of secondary surveys as: 2, 3, or 4. For each of the 36 combinations (acceptable error, number of sites, number of secondary surveys), we then solved the following equation to determine the number of tertiary surveys required for sampling:
$$\text{Acceptable error} = \alpha + \beta_1 \log(\text{number of sites}) + \beta_2 \log(\text{number of secondary surveys}) + \beta_3 \log(\text{number of tertiary surveys})$$
Here, α is the intercept term, and the β coefficients are the slopes estimated from the post-hoc linear model described above. We rounded the number of tertiary surveys up to the nearest whole integer, and we replaced values of tertiary surveys less than two with a value of two, since auxiliary information is required to estimate the parameters θ and p.
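Solving this equation for the number of tertiary surveys might look like the sketch below; the coefficient values are placeholders, and because the fitted response was the log absolute error, the target error is assumed to enter on the log scale:

```r
# A sketch of solving the post-hoc regression for the tertiary surveys needed.
tertiary_needed <- function(target.error, n.sites, n.secondary,
                            alpha, b1, b2, b3) {
  k <- exp((log(target.error) - alpha -
              b1 * log(n.sites) - b2 * log(n.secondary)) / b3)
  max(2, ceiling(k))  # round up; at least two tertiary surveys are required
}

# Example (placeholder coefficients taken from a fitted model 'm'):
# cf <- coef(m)
# tertiary_needed(0.05, n.sites = 60, n.secondary = 3,
#                 alpha = cf[1], b1 = cf[2], b2 = cf[3], b3 = cf[4])
```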

At this point, we have solved the equation for the number of tertiary surveys needed to achieve an average acceptable level of error. Based on these values, we then calculated the expected width of the 95% CI, bias, and coverage for the occupancy estimate using similar methods described for fitting post-hoc generalized linear models for occupancy precision, bias, and coverage (Appendix S6).

2.5 (Q3) Can a single-scale occupancy model with a random effects term adequately absorb the extra heterogeneity produced when availability is less than one and provide reliable estimates of occupancy probability?

To investigate (Q3), we simulated and analysed the following number of datasets under each of the four scenarios:

Scenario 1 (n = 10,000): Species availability is constant across sites (but less than one).

Scenario 2 (n = 9,358): Species availability is heterogeneous across sites.

Scenario 3 (n = 2,815): Species availability is heterogeneous across multi-year data.

Scenario 4 (n = 5,942): Species availability is correlated with detection probability across multi-year data.

Note that, for Scenario 1, we used the same 10,000 datasets that were generated to answer (Q1) and (Q2); we generated between 2,800 and 9,360 datasets for Scenarios 2–4 because of varying model run times and wall time limits on the USGS Yeti supercomputer.

Next, for each scenario except the first, we analysed the data using four different models:

(i) constant multi-scale occupancy model,

(ii) multi-scale occupancy model with a random-effects term on availability,

(iii) constant single-scale occupancy model and

(iv) single-scale occupancy model with a random-effects term on detection.

Note that the formulation of the random-effects terms included in the models mimicked the way that data were simulated (e.g. if species availability was heterogeneous across sites, then a site random-effects term was used; see the JAGS sketch below). The first scenario was analysed using only models (i) and (iii). For simplicity, we refer to models (i) and (iii) as 'constant' models and models (ii) and (iv) as 'random-effects' models. For more details on how data were simulated and analysed, see Appendix S7.
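As an illustration, the site random-effects term on availability (model ii under Scenario 2) might replace the constant θ in the earlier JAGS sketch as follows; the prior on the random-effect standard deviation is an assumption:

```r
# A sketch of the availability random-effects fragment for JAGS.
ranef_fragment <- "
  mu.theta ~ dnorm(0, 0.368)   # mean availability on the logit scale
  sd.theta ~ dunif(0, 5)       # assumed prior on the random-effect SD
  tau.theta <- pow(sd.theta, -2)

  for (i in 1:n.sites) {
    eps[i] ~ dnorm(0, tau.theta)            # site-level deviation
    logit(theta[i]) <- mu.theta + eps[i]    # heterogeneous availability
    for (j in 1:n.secondary) {
      w[i, j] ~ dbern(theta[i] * z[i])
    }
  }
"
```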

Then, to compare the performance of each model in each scenario, we examined how well each model predicted true occupancy in terms of accuracy, precision, bias and coverage (Appendix S8). Given the large quantity of simulated data, we took an approach similar to answering (Q1), and we assigned each simulated dataset to one of the eight discrete groupings depending on the true values of Ψ, 𝜃 and p (Appendix S8). Then, as we did before, we calculated model performance metrics (i.e. accuracy, precision, bias and coverage), and we summarized mean and standard error values across model types, parameterizations and parameter combinations (Appendix S8).

3 RESULTS

3.1 (Q1) What is a robust sampling design for the multi-scale occupancy model when there are a priori expectations of parameter estimates?

Our simulations show that the ability of the multi-scale occupancy model to recover unbiased and precise occupancy estimates with high estimated coverage depends on the true parameter values of Ψ, θ, and p and the amount of available data (Figures S1–S4). Parameter estimates generally had high accuracy (log absolute difference between model estimated mean and truth = −3.28 ± 0.01 [mean ± SE]), low bias (estimate − truth = 0.01 ± 0.001 logit units), and high coverage (0.95 ± 0.002; expected coverage for a 95% CI is 0.95), but the occupancy estimates typically had low precision (mean width of 95% CI = 0.36 ± 0.002 on the probability scale).

We also found that mean accuracy and precision were lowest when few sites were sampled and either occupancy or availability was low (Figure S1; ParamCombos 3, 4, 5 and 6). In addition, we found that precision was influenced by the number of secondary and tertiary surveys (Figure S2), where a higher number of secondary and tertiary surveys led to greater precision. Last, mean bias and coverage were largely determined by whether occupancy and availability were high or low, and were less sensitive to detection, the number of surveys and the number of sites (Figures S3 and S4).

3.2 (Q2) What is a robust sampling design when we have no expectations of parameter estimates?

We found that when sampling between 20 and 100 sites with two to four secondary surveys per site, the observer should consider performing between two and 14 tertiary surveys per secondary survey to achieve a 0.01 to 0.10 level of error (Table 1). With these proposed sampling designs, the expected average estimated coverage is close to nominal (0.93–0.96), the expected average width of the 95% CI is moderately wide (0.31–0.72), and the expected average bias is low (−0.008 to 0.022; Table 1).

TABLE 1. General guidelines on sampling design (i.e. number of sites, secondary surveys and tertiary surveys) to achieve an acceptable level of error (i.e. absolute difference between the true value and model estimate) for occupancy probability using the multi-scale occupancy model. Note that the number of tertiary surveys is per secondary survey, and values were rounded up to the nearest whole number. Using the specified sampling designs, we then calculated the expected average width of the 95% CI, bias, and coverage.
| Acceptable level of error | Number of sites | Number of secondary surveys | Number of tertiary surveys | Expected average width of 95% CI (probability scale) | Expected average bias (probability scale) | Expected average coverage |
|---|---|---|---|---|---|---|
| 0.01 | 20 | 2 | 14 | 0.599 | −0.008 | 0.935 |
| 0.01 | 20 | 3 | 13 | 0.537 | −0.006 | 0.938 |
| 0.01 | 20 | 4 | 13 | 0.488 | −0.006 | 0.939 |
| 0.01 | 60 | 2 | 12 | 0.465 | −0.004 | 0.935 |
| 0.01 | 60 | 3 | 11 | 0.403 | −0.001 | 0.937 |
| 0.01 | 60 | 4 | 10 | 0.364 | 0.001 | 0.94 |
| 0.01 | 80 | 2 | 11 | 0.433 | −0.002 | 0.935 |
| 0.01 | 80 | 3 | 10 | 0.373 | 0.001 | 0.938 |
| 0.01 | 80 | 4 | 10 | 0.329 | 0.001 | 0.939 |
| 0.01 | 100 | 2 | 11 | 0.404 | −0.001 | 0.934 |
| 0.01 | 100 | 3 | 10 | 0.346 | 0.001 | 0.937 |
| 0.01 | 100 | 4 | 9 | 0.31 | 0.003 | 0.94 |
| 0.05 | 20 | 2 | 5 | 0.67 | 0.004 | 0.946 |
| 0.05 | 20 | 3 | 4 | 0.623 | 0.008 | 0.95 |
| 0.05 | 20 | 4 | 4 | 0.575 | 0.009 | 0.951 |
| 0.05 | 60 | 2 | 3 | 0.568 | 0.014 | 0.949 |
| 0.05 | 60 | 3 | 2 | 0.53 | 0.02 | 0.954 |
| 0.05 | 60 | 4 | 2 | 0.481 | 0.021 | 0.955 |
| 0.05 | 80 | 2 | 3 | 0.53 | 0.015 | 0.949 |
| 0.05 | 80 | 3 | 2 | 0.491 | 0.021 | 0.954 |
| 0.05 | 80 | 4 | 2 | 0.442 | 0.022 | 0.955 |
| 0.05 | 100 | 2 | 2 | 0.53 | 0.02 | 0.952 |
| 0.05 | 100 | 3 | 2 | 0.461 | 0.021 | 0.953 |
| 0.05 | 100 | 4 | 2 | 0.413 | 0.022 | 0.954 |
| 0.1 | 20 | 2 | 2 | 0.728 | 0.016 | 0.955 |
| 0.1 | 20 | 3 | 2 | 0.67 | 0.017 | 0.956 |
| 0.1 | 20 | 4 | 2 | 0.625 | 0.018 | 0.957 |
| 0.1 | 60 | 2 | 2 | 0.598 | 0.019 | 0.953 |
| 0.1 | 60 | 3 | 2 | 0.53 | 0.02 | 0.954 |
| 0.1 | 60 | 4 | 2 | 0.481 | 0.021 | 0.955 |
| 0.1 | 80 | 2 | 2 | 0.56 | 0.02 | 0.952 |
| 0.1 | 80 | 3 | 2 | 0.491 | 0.021 | 0.954 |
| 0.1 | 80 | 4 | 2 | 0.442 | 0.022 | 0.955 |
| 0.1 | 100 | 2 | 2 | 0.53 | 0.02 | 0.952 |
| 0.1 | 100 | 3 | 2 | 0.461 | 0.021 | 0.953 |
| 0.1 | 100 | 4 | 2 | 0.413 | 0.022 | 0.954 |

3.3 (Q3) Can a single-scale occupancy model with a random effects term adequately absorb the extra heterogeneity produced when availability is less than one and provide reliable estimates of occupancy probability?

We found that the biggest differences in estimated occupancy accuracy, precision, bias and coverage between the multi-scale and single-scale occupancy models occurred when availability was low, regardless of the true occupancy or detection values (Figures 2-5). Results were qualitatively similar across detection levels (Figures S5–S8). There were a few instances, when true occupancy and availability were low, in which the multi-scale occupancy model and the single-scale occupancy model performed similarly, but this behaviour was only detected for parameter accuracy (Figure 2b).

FIGURE 2. Comparison of model performance related to accuracy of occupancy estimates for single-scale versus multi-scale occupancy models, constant versus random parameterizations, and four simulated scenarios. Panel (a) corresponds to all high occupancy scenarios (>0.60 on the probability scale), and panel (b) corresponds to all low occupancy scenarios (<0.40 on the probability scale). Asterisks highlight the largest differences between single-scale and multi-scale occupancy model performance. 'NR' represents models/parameterizations/scenarios not run. See Methods for value cutoffs of parameter combinations along the x-axis.
FIGURE 3. Comparison of model performance related to precision of occupancy estimates for single-scale versus multi-scale occupancy models, constant versus random parameterizations, and four simulated scenarios. Panel (a) corresponds to all high occupancy scenarios (>0.60 on the probability scale), and panel (b) corresponds to all low occupancy scenarios (<0.40 on the probability scale). Asterisks highlight the largest differences between single-scale and multi-scale occupancy model performance. 'NR' represents models/parameterizations/scenarios not run. See Methods for value cutoffs of parameter combinations along the x-axis.
FIGURE 4. Comparison of model performance related to bias of occupancy estimates for single-scale versus multi-scale occupancy models, constant versus random parameterizations, and four simulated scenarios. Panel (a) corresponds to all high occupancy scenarios (>0.60 on the probability scale), and panel (b) corresponds to all low occupancy scenarios (<0.40 on the probability scale). Asterisks highlight the largest differences between single-scale and multi-scale occupancy model performance. 'NR' represents models/parameterizations/scenarios not run. See Methods for value cutoffs of parameter combinations along the x-axis.
FIGURE 5. Comparison of model performance related to coverage of occupancy estimates for single-scale versus multi-scale occupancy models, constant versus random parameterizations, and four simulated scenarios. Panel (a) corresponds to all high occupancy scenarios (>0.60 on the probability scale), and panel (b) corresponds to all low occupancy scenarios (<0.40 on the probability scale). Asterisks highlight the largest differences between single-scale and multi-scale occupancy model performance. 'NR' represents models/parameterizations/scenarios not run. See Methods for value cutoffs of parameter combinations along the x-axis.

Although the single-scale occupancy model produced more precise occupancy estimates (Figure 3), it tended to underestimate occupancy probability (Figure 4) and suffered low estimated coverage (Figure 5). The single-scale occupancy model underestimated occupancy probability by as much as 0.40 on the probability scale, with an average bias of −0.10 ± 0.02 (mean ± SE; Figure 4) and an average coverage of 0.56 ± 0.04 (mean ± SE; Figure 5). Interestingly, when true availability was high (>0.60), the single-scale occupancy model produced occupancy probability estimates comparable to the multi-scale occupancy model in terms of accuracy, precision, bias and coverage (Figures 2-5).

Finally, we found that adding a random-effects term to the single-scale occupancy model does not adequately absorb the extra heterogeneity produced by the availability process, where the single-scale occupancy model performed similarly under each scenario using either a constant or random-effects model parameterization.

3.4 RShiny app

In an effort to encourage the further exploration of our simulation results, we created an RShiny app to accompany this paper https://gdirenzo.shinyapps.io/multi-scale-occ/. The RShiny app parallels the structure and information presented in this paper, but it also provides more practical guidance for those looking for survey design assistance.

4 DISCUSSION

Using simulations, we show that under a range of scenarios when species availability is low, the single-scale occupancy model consistently underestimates occupancy probability, producing overly precise occupancy estimates with low coverage. This likely occurs because when species have consistently low availability, they have a higher probability of not being observed across multiple secondary surveys, and because the single-scale occupancy model does not explicitly accommodate availability, it routinely underestimates occupancy. Other studies have documented the same pattern of biased occupancy estimates using the single-scale occupancy model when either detection is low (Emmet et al., 2021; MacKenzie et al., 2002) or when heterogeneity in detection is not accounted for (Otto et al., 2013). The same negative bias also occurs in mark-recapture estimates of abundance when individual heterogeneity in detection arises from the same underlying cause (Otis et al., 1978). The underestimation of the number of occupied sites or the number of individuals in a population occurs because heterogeneity leads to more all-zero encounter histories than expected at random (i.e. the species is never detected across all repeated surveys at a site that is truly occupied, or an individual is never captured across all of the capture events). We also show that adding a random-effects term to the single-scale occupancy model does not adequately absorb the extra heterogeneity produced by the availability process, regardless of true parameter values or simulation scenario. Accommodating the availability process by using a multi-scale occupancy model improves parameter estimation and ecological inference, but comes at an additional cost, requiring extra data collection. In the following sections, we explore the practical application of the multi-scale occupancy model.

4.1 How can biologists adjust their sampling design to accommodate the multi-scale occupancy model?

The design phase of a study is the most appropriate place to consider accounting for the ecological and sampling processes that influence robust statistical inference. Note that there is no free lunch: to accommodate availability, practitioners will need many tertiary surveys for some parameter combinations and desired levels of precision (Table 1). There are some potential ways around this, such as adjusting the definition of a 'site' (e.g. adjusting the size and spacing of the spatial subunits according to the expected size of a home range or species movement), adjusting the timing of surveys (e.g. targeting periods when species availability is expected to be constant or relatively high), or accepting reduced performance in terms of occupancy accuracy, precision, bias and coverage when availability is heterogeneous and not explicitly modelled.

One fairly easy modification for some sampling designs that would allow for the use of the multi-scale occupancy model is to have multiple observers (independently and simultaneously) conduct repeated surveys at sites. In this way, detection probability can be estimated from the multiple observers (whose surveys are conducted over a period of time when the site is closed to both open population dynamics and changes in availability; referred to as tertiary surveys), and availability can be estimated from the repeated surveys (which are conducted over a period of time when the site is closed to open population dynamics but captures changes in availability; referred to as secondary surveys). We urge readers to think carefully about the timescales over which open population dynamics and changes in availability occur in their study system when designing surveys.

There may be cases where researchers are interested in accounting for species availability, but one of several constraints applies: the data are already collected, the researcher has no control over the sampling design, or the researcher cannot easily adjust their sampling design for the additional data collection required by a multi-scale occupancy model. In these cases, we point the reader to two approaches. First, the staggered entry model by Kendall et al. (2013) uses the same sampling design as the single-scale occupancy model (i.e. sites are repeatedly surveyed; no tertiary surveys required), but relaxes the closure assumption within a season by permitting staggered entry and exit times for the species of interest at each site. We note, though, that these models only allow a single entry and exit event, which may not be appropriate for some species or study systems. Second, to accommodate heterogeneity in species unavailability, it might be worth pursuing a dynamic single-scale occupancy model and shortening the 'seasons' to sample periods to account for changes in species availability through time within a season (Otto et al., 2013). This approach changes the meaning of the dynamic parameters and introduces bias in their estimation (see Valente et al., 2017 for a discussion of the impact of temporary emigration on the estimation of dynamic parameters).

4.2 How do our results compare to previous assessments of closure assumption violations using the single-scale occupancy model?

Overall, our results are consistent with the patterns documented by others exploring the effects of violating the closure assumption in the single-scale occupancy model (e.g. MacKenzie et al., 2002; Mordecai et al., 2011; Rota et al., 2009; Valente et al., 2017). As mentioned before, when availability is less than one and not accounted for, the detection probability estimated by the single-scale occupancy model is the product of the detection and availability probabilities. In this paper, we only evaluated the impacts of violating the closure assumption around availability (leaving detection fixed as we varied availability). Since detection was fixed, the product of detection and availability essentially mimics the availability process in the single-scale occupancy model. Future explorations of the single-scale occupancy model should consider assessing violations of the closure assumption around both detection and availability simultaneously to tease apart their contributions to model performance. This should be done for situations expected in real field systems to make the results most useful for assessing the potential for misleading inference in these cases.

4.3 What is the past and future of the multi-scale occupancy model?

We foresee the multi-scale occupancy model being used widely across ecological disciplines and accommodating different types of data. Future applications of the multi-scale occupancy model may include spatial distribution modelling (Jiménez et al., 2016), estimating species-environment relationships (Harju & Cambrin, 2019), species co-occurrence (Green et al., 2020), dynamic species distributions (Green et al., 2019), pathogen detection (Abad-Franch, 2020), estimating species richness or biodiversity metrics (Zamora-Marín et al., 2021), species distributions and adaptive sampling of eDNA (Davis et al., 2018), predator–prey dynamics (Rehman et al., 2021) and estimating nested networks (e.g. community of microbes on a community of hosts), among other applications.

Another ripe area of research in this arena is the development of a multi-scale time-to-detection model (for more on time-to-detection models, see Garrard et al., 2008; Halstead et al., 2018, 2021). Time-to-detection models estimate detection probabilities based on a single site visit by one observer, rather than on multiple surveys at a single site, as is the case for single-scale occupancy models. The multi-scale time-to-detection model would combine these two approaches: use the time between the initiation of a survey and the time at which the first individual of a species is detected to estimate the detection rate, and use multiple surveys (conducted over a period of time when the site is closed to open population dynamics but captures changes in availability) to estimate availability. We note, though, that time-to-detection models perform well for widespread, common species (i.e. high occupancy probability) with high detection probabilities (Halstead et al., 2021), so a multi-scale time-to-detection model may likewise only perform well for species with high availability. To our knowledge, such a model does not exist but may be worth further study if it can accommodate the availability process with two or a few repeated surveys per site.

5 CONCLUSIONS

We show that unaccounted-for low species availability can lead to biased estimates of occupancy probability when using the single-scale occupancy model. One way that ecologists can account for heterogeneity in species availability is by using the multi-scale occupancy model, in which sites are surveyed repeatedly both over a period of time that is open to changes in availability (but closed to open population dynamics) and over a period of time that is closed to both changes in availability and open population dynamics. Collectively, our results show that future users of the multi-scale occupancy model should keep in mind the tradeoffs between the number of sites and surveys, and that species availability can affect estimated occupancy probabilities, especially when availability is low. However, purely random changes in availability will simply be absorbed into the detection component of the single-scale occupancy model, and the occupancy parameter (Ψ) should then be interpreted as 'habitat use'. The implications of unaccounted-for availability for statistical inference extend broadly across ecological disciplines.

ACKNOWLEDGEMENTS

We thank L. M. Browne, B. Halstead, M. Kéry and an anonymous reviewer for their comments and suggestions on previous versions of the manuscript. Funding was provided by the USGS Amphibian Research Monitoring Initiative (ARMI). Any use of trade, firm or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. We thank the USGS Advanced Research Computing, USGS Yeti Supercomputer: U.S. Geological survey (https://doi.org/10.5066/F7D798MJ). This is ARMI contribution #835.

CONFLICT OF INTEREST

We have no conflicts of interest.

AUTHORS' CONTRIBUTIONS

G.V.D. contributed to project development, wrote the model, simulated and analysed data, and wrote the first draft of the paper; D.A.W.M. contributed to project and model development; E.H.C.G. contributed to project and model development. All co-authors edited the manuscript.

PEER REVIEW

The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13881.

DATA AVAILABILITY STATEMENT

All data used in this paper are available at the Dryad Data repository (DiRenzo et al., 2022a): https://doi.org/10.5061/dryad.fxpnvx0rv. All code used in this paper is available at the Zenodo repository (DiRenzo et al., 2022b): https://doi.org/10.5281/zenodo.6214643. The RShiny app that accompanies this manuscript can be accessed at https://gdirenzo.shinyapps.io/multi-scale-occ/.